Require integrity protection for MRTR requestState#3032
Conversation
requestState round-trips through the client, so the spec requires servers to treat the echo as attacker-controlled and reject verification failures whenever the state can influence authorization, resources, or business logic. Add a boundary middleware that seals every outgoing requestState into an authenticated-encrypted claims envelope (TTL, audience, principal, and originating-request binding) and verifies every echo before handlers run; handlers read and write plaintext on every surface. MCPServer requires a protection choice at construction whenever a registration can mint requestState — shared keys, an ephemeral process key, or an explicit opt-out that resolver tools refuse; the low-level tier opts in with one appended middleware. The built-in codec is AES-256-GCM under an HKDF-derived key ring with a key-id lookup, strict token canonicality, and a documented three-phase rotation. Every verification failure answers one frozen -32602 with the real reason in server logs only. The everything-server fixture drops its hand-rolled sealing and passes the tampered-state conformance scenario through the real code path.
Resolver state entries (now v2) record a digest of the exact rendered elicitation per outcome — accepts, declines, and cancels alike. A stored answer is honored only while the live question renders byte-identically; a redeploy that rewords a question or changes its schema re-asks instead of silently reusing a stale answer, and a recorded decline cannot suppress a reworded question. Cryptographic failure stays a wire error; a digest mismatch is a quiet re-ask, since the seal already proved the client honest and the server is what changed.
The startup requirement now applies exactly where the SDK authors the requestState content itself: tools with Resolve(...) parameters, whose state carries elicited answers the server later trusts. Hand-built requestState — a tool, prompt, or resource template returning InputRequiredResult directly — is user-authored, so nothing is required there: an unconfigured server passes it through verbatim, and the docs make protection a clear recommendation instead. Configuring request_state_security= still seals hand-built state on the three carrier methods with no code changes. Consequences: the boundary middleware is installed only when a policy is supplied (no more unconfigured rejection paths), the unprotected() opt-out is deleted (not configuring is the unprotected posture), and the boundary scopes strictly to the carrier methods — requestState- shaped members on custom methods belong to their own protocols and are neither sealed nor policed. All cryptographic hardening, the claims envelope, and resolver question-pinning are unchanged.
📚 Documentation preview
|
There was a problem hiding this comment.
2 issues found across 32 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/mcp/server/mcpserver/resolve.py">
<violation number="1" location="src/mcp/server/mcpserver/resolve.py:533">
P1: Fresh `input_responses` are not checked against the question digest, so an answer to a pre-deploy question can be accepted as answering the changed question. Store the pending question digest in request_state and verify it before accepting/persisting the response.</violation>
</file>
<file name="docs_src/mrtr/tutorial005.py">
<violation number="1" location="docs_src/mrtr/tutorial005.py:27">
P2: Require the `kms1.` prefix before decoding; otherwise deleting the format prefix from a valid token is accepted instead of treated as tampering.</violation>
</file>
Tip: cubic can generate docs of your entire codebase and keep them up to date. Try it here.
Fix all with cubic | Re-trigger cubic
| # Persist the exact wire content that just passed validation - never the | ||
| # model - so restoring next round revalidates the same bytes the client sent. | ||
| res.persist[key] = _StateEntry(action="accept", data=answer.content) | ||
| res.persist[key] = _StateEntry(action="accept", data=answer.content, q=q) |
There was a problem hiding this comment.
P1: Fresh input_responses are not checked against the question digest, so an answer to a pre-deploy question can be accepted as answering the changed question. Store the pending question digest in request_state and verify it before accepting/persisting the response.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/mcp/server/mcpserver/resolve.py, line 533:
<comment>Fresh `input_responses` are not checked against the question digest, so an answer to a pre-deploy question can be accepted as answering the changed question. Store the pending question digest in request_state and verify it before accepting/persisting the response.</comment>
<file context>
@@ -521,12 +530,12 @@ async def _elicit(elicit: Elicit[Any], key: str, res: _Resolution) -> Elicitatio
# Persist the exact wire content that just passed validation - never the
# model - so restoring next round revalidates the same bytes the client sent.
- res.persist[key] = _StateEntry(action="accept", data=answer.content)
+ res.persist[key] = _StateEntry(action="accept", data=answer.content, q=q)
return AcceptedElicitation(data=data)
if answer.action == "decline":
</file context>
|
|
||
| def unseal(self, token: str) -> bytes: | ||
| try: | ||
| raw = bytes.fromhex(token.removeprefix(PREFIX)) |
There was a problem hiding this comment.
P2: Require the kms1. prefix before decoding; otherwise deleting the format prefix from a valid token is accepted instead of treated as tampering.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At docs_src/mrtr/tutorial005.py, line 27:
<comment>Require the `kms1.` prefix before decoding; otherwise deleting the format prefix from a valid token is accepted instead of treated as tampering.</comment>
<file context>
@@ -0,0 +1,33 @@
+
+ def unseal(self, token: str) -> bytes:
+ try:
+ raw = bytes.fromhex(token.removeprefix(PREFIX))
+ return self._aesgcm.decrypt(raw[:12], raw[12:], PREFIX.encode())
+ except (ValueError, InvalidTag) as exc:
</file context>
Comments and docstrings in the new code carried far more prose than they earned. Cut the comment and docstring volume by more than half: docstrings now lead with a single-sentence summary and keep only contracts a reader cannot infer (codec implementer requirements, the key rotation procedure, Raises sections), inline comments are one line and exist only where the code cannot speak for itself, and development narration is gone. Typographic characters in the touched prose were replaced by restructuring the sentences. No executable code changed.
There was a problem hiding this comment.
Additional findings (outside current diff — PR may have been updated during review):
-
🔴
src/mcp/server/request_state.py:477-502— RequestStateBoundary._seal calls security.codec.seal() with no try/except, while every other user-supplied callable on the boundary (codec.unseal, bind_principal on both paths) is wrapped in deny-on-error handling — so an exception from a custom codec during seal escapes the middleware chain and JSONRPCDispatcher writes the raw exception text (which for the documented KMS-backed codec can carry key ARNs, endpoints, request IDs) onto the wire on stdio/loop transports. Wrap codec.seal with the same handling as bind_principal on this path (log the traceback, raise MCPError(INTERNAL_ERROR, "Internal error")), and add a seal-side counterpart to the existing ExplodingCodec unseal test.Extended reasoning...
The bug.
RequestStateBoundary._seal(src/mcp/server/request_state.py:502) ends withreturn security.codec.seal(json.dumps(claims, ...).encode())with no exception handling. Every other user-supplied callable on this boundary is deliberately wrapped in deny-on-error handling: the inbound path wrapscodec.unsealin bothexcept InvalidRequestStateand a broadexcept Exception("deny-on-error: a buggy custom codec must fail closed"), andbind_principalis wrapped on both the unseal path (collapses to the frozen-32602error) and the seal path (lines 481-485, converts toMCPError(INTERNAL_ERROR, "Internal error")with the traceback log-only).codec.sealis the single user callable left unwrapped.The code path.
RequestStateSecurity(codec=...)is the documented bring-your-own-crypto surface, and the docs' canonical example is a KMS-backed envelope codec (docs_src/mrtr/tutorial005.py). If such a codec raises duringseal(KMS/network failure, expired credentials, key-policy error — these exception messages routinely contain key ARNs, endpoints, and request IDs), the exception propagates:_seal→_seal_result→RequestStateBoundary.__call__→ the runner's_on_requestmiddleware chain (src/mcp/server/runner.py:217 has no surrounding try/except) →JSONRPCDispatcher's catch-all (src/mcp/shared/jsonrpc_dispatcher.py:714-722), wherehandler_exception_to_error_datareturnsNonefor anything that isn'tMCPError/ValidationError, and the dispatcher writesErrorData(code=0, message=str(e))to the client. On stdio and loop-mode streamable-HTTP transports the raw exception text therefore reaches the wire (the modern HTTP entry sanitizes to "Internal server error", so behavior is also transport-inconsistent).Why nothing else prevents it. The boundary middleware runs outside MCPServer's tool-error-to-CallToolResult conversion, and the runner has no generic catch around the chain, so there is no later sanitization layer. Nothing in the new module catches a seal-side codec failure.
Why this matters for this PR. The PR introduces this call site and explicitly promises the opposite contract — the module docstring,
docs/advanced/multi-round-trip.md("the real reason goes to the server log"), and the PR description ("a buggy or throwing custom codec ... answers the same wire error") all state that codec/principal failures fail closed with the reason logged server-side only. The test suite pins this for the unseal direction (test_a_codec_that_raises_unexpectedly_fails_closed_with_the_frozen_erroruses anExplodingCodecwhoseunsealraises) and for a raisingbind_principalduring seal, but there is no test for a codec whosesealraises — confirming the gap is an oversight rather than a deliberate posture.Step-by-step proof. (1) Construct
MCPServer("deployer", request_state_security=RequestStateSecurity(codec=KmsCodec()))whereKmsCodec.sealcalls KMS per token. (2) A client calls a manual MRTR tool; the handler returnsInputRequiredResult(request_state="round-1"). (3) The boundary's_seal_resultsees the input_required result and calls_seal, which builds the claims envelope and callssecurity.codec.seal(...). (4) KMS is unreachable and the codec raisesbotocore.exceptions.ClientError("An error occurred (AccessDeniedException) when calling the Decrypt operation on key arn:aws:kms:us-east-1:123456789012:key/..."). (5) The exception leavesRequestStateBoundary.__call__, leaves the runner's middleware chain, and lands in the dispatcher's catch-all, which serializesErrorData(code=0, message=str(e))— the ARN-bearing message — into the JSON-RPC error response sent to the client over stdio. Compare the same failure inbind_principalon the same path: it is logged and replaced with a bareMCPError(INTERNAL_ERROR, "Internal error").Fix. Wrap the
security.codec.seal(...)call in_sealwith the same handling already used forbind_principalon this path:logger.exception(...)thenraise MCPError(code=INTERNAL_ERROR, message="Internal error") from None. Add a seal-side test mirroring the existingExplodingCodecunseal test (codec whosesealraises → bare "Internal error" on the wire, traceback only in the server log).
There was a problem hiding this comment.
I didn't find any bugs in this change, but it introduces a new cryptographic surface (AES-GCM request-state sealing, key rotation, principal/audience binding), a construction-time breaking change for resolver tools, and a sizable public API addition — this needs a careful human review rather than auto-approval.
Extended reasoning...
Overview
This PR adds a complete integrity-protection surface for the MRTR requestState: a new ~400-line module src/mcp/server/request_state.py (policy object, AES-256-GCM codec with HKDF key derivation and a rotation ring, claims envelope with TTL/request/principal/audience binding, and a RequestStateBoundary middleware), a construction-time gate in MCPServer requiring request_state_security= for any tool with Resolve(...) parameters, question-digest pinning in the resolver state envelope (v2), new mcp-types helpers, plus extensive docs, examples, and ~1,500 lines of new tests across three suites.
Security risks
The change is itself security machinery, so the risk profile is inherent: token format and canonicality, nonce handling, key-ring lookup via derived fingerprints, claims verification ordering, the deny-on-error broad exception catches around user-supplied codecs and principal callbacks, and the single frozen wire error designed to avoid oracle behavior. The test coverage is unusually thorough (tamper/canonicality/rotation/principal/audience/expiry cases), and I did not find concrete defects, but cryptographic protocol design and the decision to gate only resolver tools (leaving hand-built state unprotected by default) are exactly the kind of choices a maintainer should sign off on rather than a bot.
Level of scrutiny
High. This touches security-critical production code paths (tools/call, prompts/get, resources/read middleware), introduces a breaking change to the public MCPServer constructor contract, and adds several new public names (RequestStateSecurity, RequestStateCodec, AESGCMRequestStateCodec, RequestStateBoundary, InvalidRequestState, authenticated_principal) that will be hard to change later. The opinionated divergence from the typescript-sdk approach (encrypting codec vs sign-only HMAC) is a cross-SDK design decision that warrants maintainer input.
Other factors
There are two unresolved comments from another automated reviewer (a question-digest gap for fresh input_responses, and a prefix-handling nit in the docs example codec) that the author has not yet responded to. The PR is brand new with no human review yet. Given the size, the breaking change, and the security-sensitive nature, deferring to a human is the right call.
The MRTR
requestStateblob round-trips through the client, and the spec requires servers to treat the echoed value as attacker-controlled: integrity-protect it (HMAC or AEAD) and reject verification failures whenever it can influence authorization, resource access, or business logic (basic/patterns/mrtr, server requirements 4 and 5). Today the SDK round-trips it as plaintext (the resolver layer's own comment marks integrity sealing as "a follow-up"), and the conformance fixture passes the tampered-state scenario with hand-rolled fixture crypto rather than anything the SDK ships.This PR makes protection a first-class surface. One deliberate product decision up front: the spec's integrity MUST is conditional, and this SDK is stricter exactly where it authors the state itself. Registering a
Resolve(...)tool refuses to construct until you choose a protection posture, because resolver state carries elicited answers the server will later trust. Hand-builtrequestState(a tool, prompt, or resource template returningInputRequiredResultdirectly) is not gated: the state is user-authored, the docs make protection a clear SHOULD, and configuringrequest_state_security=seals it with the same machinery and zero code changes.What's here
A boundary middleware owns the wire field in both directions. On a configured server,
RequestStateBoundaryacts on the three carrier methods (tools/call,prompts/get,resources/read): it seals every outgoingrequestStateminted there (resolver-produced and hand-built alike) and verifies every inbound echo before any handler or extension interceptor runs. Handlers only ever read and write plaintext (ctx.request_statereturns what they minted).The SDK owns the claims envelope, for every codec. TTL (re-stamped per round, bounding think time), an audience claim (the server name by default, so a token minted by a different service that happens to share a secret is rejected), principal binding (the authenticated OAuth
client_idwhen the SDK validated one; pluggable viabind_principal=), and originating-request binding (method + tool/prompt name or resource URI + an argument digest). A custom codec's only obligations are integrity and, ideally, confidentiality; it structurally cannot lose the replay bounds, which tracks the spec's SHOULD list (principal, short expiry, request identifier) by default.The built-in codec is authenticated encryption with a rotation story. AES-256-GCM under HKDF-derived keys (
cryptographyis already a dependency), a 4-byte derived key fingerprint for O(1) ring lookup, the format prefix and key id bound under the GCM tag (the token never names its algorithm), strict base64url canonicality (every byte of the wire form is authenticated, so there is no trailing-character malleability), andkeys=[...]as the rotation ring:keys[0]seals, every key verifies, with a documented three-phase zero-downtime rotation. This deliberately diverges from typescript-sdk's sign-only HMAC codec:requestStatenever legitimately crosses SDK implementations, encryption removes the permanent "don't put secrets in it" caveat, and their codec ships without rotation.Construction-time gate with a teaching error. Registering a
Resolve(...)tool withoutrequest_state_security=raises at the registration call, before any client connects, with an error that lays out the choices:The gate covers every tool-registration funnel, including constructor-supplied
tools=[...]and hand-builtToolobjects. There is nounprotected()opt-out and no gate on manual flows: not configuring is the unprotected posture, which the spec permits only when tampering can cause nothing worse than a failed request. The docs spell out that running there means satisfying the spec's integrity requirement yourself. The low-levelServerkeeps its no-batteries contract: nothing is required, and one appended middleware line provides identical enforcement.Resolver answers are pinned to the exact question asked. The resolver state envelope (now v2) records a digest of the rendered elicitation per outcome: accepts, declines, and cancels. A redeploy that rewords a question or changes its schema re-asks instead of silently reusing a stale answer. The failure split is deliberate: cryptographic failure is the frozen wire error (the echo was forged or expired), digest mismatch is a quiet re-ask (the seal already proved the client honest; the server is what changed).
One frozen failure signature. Every inbound verification failure (tampered, expired, future-minted, replayed cross-request/cross-principal/cross-service, unknown key, malformed, a buggy or throwing custom codec or principal callback) answers the same wire error (
-32602,"Invalid or expired requestState",data.reason "invalid_request_state", byte-matching typescript-sdk's), with the real reason in server logs only. The error shape never reveals which check failed.Motivation and context
Without this, every resolver flow trusts client-echoed state for replay across users, tools, and calls (nothing binds it to a session, principal, request, or time), and hand-built MRTR state is unprotected unless each author rolls their own crypto. The resolver-tool startup requirement is the part that goes beyond the spec text; everything inside the envelope is the spec's own recommended replay-prevention list, on by default.
How has this been tested?
TDD throughout: the security suite (80 tests at first writing; 104 in the final suite) was written and verified red before the module existed, and caught three defects in the design before implementation (an unseal floor off-by-one, a self-contradictory expiry comparison, sub-second TTL truncation). The result was then put through an adversarial review (six attacker lenses plus an eight-angle code review, every finding refute-verified with runnable probes), and the confirmed findings are fixed and pinned in tests, including: strict token canonicality (63-variant flip tests), non-finite TTL rejection, key-material type enforcement, explicit-null
requestStatetreated as absent (spec-preferred re-elicit recovery), non-string claims collapsing to the frozen error, throwingbind_principalfailing closed, and audience separation between services.tests/server/test_request_state.py,test_request_state_boundary.py,test_request_state_gate.py: codec properties, claims-envelope behavior per field (drift in either direction fails closed), wire-level seal/restore on both server tiers, the gate across every tool-registration funnel including the constructortools=[...]bypass and hand-builtToolobjects.tests/server/mcpserver/test_resolve.py: question-digest pinning (including the forged-state regression tests upgraded to prove a stronger property: a boundary-authenticated, decodable forged entry is still never consulted over a resolver's own computation).input-required-result-tampered-statethrough the production boundary; its hand-rolled HMAC helpers are deleted.mrtrexample story demonstrates the full surface live, including a tamper round asserting the frozen error on the wire while the reason stays in the server log.Breaking changes
Yes, pre-1.0/v2-line breaks, both with teaching errors and migration docs:
Resolve(...)parameters requirerequest_state_security=at construction.requestStateis an opaque sealed token;ctx.request_statestill returns the handler's plaintext.docs/migration.mdcovers both;docs/advanced/multi-round-trip.mdgains the full security section (choices, rotation, custom codecs, failure behavior).Types of changes
Checklist
Additional context
RequestStateSecurity(the policy: which codec, what TTL, which principal, which audience),RequestStateCodec(a two-method protocol:seal/unsealover opaque bytes),AESGCMRequestStateCodec(the built-in),RequestStateBoundary(the middleware, public so the low-level tier composes the identical object),InvalidRequestState(the codec failure exception),authenticated_principal(the default principal source). Re-exported frommcp.server.mcpserverbesideResourceSecurity.INPUT_REQUIRED_METHODSand theis_input_requiredpredicate live inmcp-typesand are derived from the method/result tables (with a weld test), so "which methods carryinput_required" has one source of truth: a future spec method joining the set is sealed automatically rather than failing open.test_request_state*files rather than one (the suite is 104 tests; the shared prefix keeps the source-tree mirror discoverable), and the boundary's deny-on-error around user-supplied codec/principal callables uses a broad exception catch. That catch is the trust boundary the spec's reject-MUST requires to fail closed, and it logs the real failure rather than swallowing it.bind_principal=supplies an identity; elicitation messages must render deterministically across body re-runs or recorded answers will always look stale (documented in the resolver tutorial); hand-built state on unconfigured servers is intentionally untouched (the spec's conditional MUST is the operator's to satisfy there, and the docs say so).AI Disclaimer