Skip to content

Require integrity protection for MRTR requestState#3032

Open
maxisbey wants to merge 4 commits into
mainfrom
request-state-signing
Open

Require integrity protection for MRTR requestState#3032
maxisbey wants to merge 4 commits into
mainfrom
request-state-signing

Conversation

@maxisbey

@maxisbey maxisbey commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

The MRTR requestState blob round-trips through the client, and the spec requires servers to treat the echoed value as attacker-controlled: integrity-protect it (HMAC or AEAD) and reject verification failures whenever it can influence authorization, resource access, or business logic (basic/patterns/mrtr, server requirements 4 and 5). Today the SDK round-trips it as plaintext (the resolver layer's own comment marks integrity sealing as "a follow-up"), and the conformance fixture passes the tampered-state scenario with hand-rolled fixture crypto rather than anything the SDK ships.

This PR makes protection a first-class surface. One deliberate product decision up front: the spec's integrity MUST is conditional, and this SDK is stricter exactly where it authors the state itself. Registering a Resolve(...) tool refuses to construct until you choose a protection posture, because resolver state carries elicited answers the server will later trust. Hand-built requestState (a tool, prompt, or resource template returning InputRequiredResult directly) is not gated: the state is user-authored, the docs make protection a clear SHOULD, and configuring request_state_security= seals it with the same machinery and zero code changes.

What's here

A boundary middleware owns the wire field in both directions. On a configured server, RequestStateBoundary acts on the three carrier methods (tools/call, prompts/get, resources/read): it seals every outgoing requestState minted there (resolver-produced and hand-built alike) and verifies every inbound echo before any handler or extension interceptor runs. Handlers only ever read and write plaintext (ctx.request_state returns what they minted).

The SDK owns the claims envelope, for every codec. TTL (re-stamped per round, bounding think time), an audience claim (the server name by default, so a token minted by a different service that happens to share a secret is rejected), principal binding (the authenticated OAuth client_id when the SDK validated one; pluggable via bind_principal=), and originating-request binding (method + tool/prompt name or resource URI + an argument digest). A custom codec's only obligations are integrity and, ideally, confidentiality; it structurally cannot lose the replay bounds, which tracks the spec's SHOULD list (principal, short expiry, request identifier) by default.

The built-in codec is authenticated encryption with a rotation story. AES-256-GCM under HKDF-derived keys (cryptography is already a dependency), a 4-byte derived key fingerprint for O(1) ring lookup, the format prefix and key id bound under the GCM tag (the token never names its algorithm), strict base64url canonicality (every byte of the wire form is authenticated, so there is no trailing-character malleability), and keys=[...] as the rotation ring: keys[0] seals, every key verifies, with a documented three-phase zero-downtime rotation. This deliberately diverges from typescript-sdk's sign-only HMAC codec: requestState never legitimately crosses SDK implementations, encryption removes the permanent "don't put secrets in it" caveat, and their codec ships without rotation.

Construction-time gate with a teaching error. Registering a Resolve(...) tool without request_state_security= raises at the registration call, before any client connects, with an error that lays out the choices:

MCPServer(..., request_state_security=RequestStateSecurity(keys=[key]))   # shared-key fleet
MCPServer(..., request_state_security=RequestStateSecurity.ephemeral())   # single process

The gate covers every tool-registration funnel, including constructor-supplied tools=[...] and hand-built Tool objects. There is no unprotected() opt-out and no gate on manual flows: not configuring is the unprotected posture, which the spec permits only when tampering can cause nothing worse than a failed request. The docs spell out that running there means satisfying the spec's integrity requirement yourself. The low-level Server keeps its no-batteries contract: nothing is required, and one appended middleware line provides identical enforcement.

Resolver answers are pinned to the exact question asked. The resolver state envelope (now v2) records a digest of the rendered elicitation per outcome: accepts, declines, and cancels. A redeploy that rewords a question or changes its schema re-asks instead of silently reusing a stale answer. The failure split is deliberate: cryptographic failure is the frozen wire error (the echo was forged or expired), digest mismatch is a quiet re-ask (the seal already proved the client honest; the server is what changed).

One frozen failure signature. Every inbound verification failure (tampered, expired, future-minted, replayed cross-request/cross-principal/cross-service, unknown key, malformed, a buggy or throwing custom codec or principal callback) answers the same wire error (-32602, "Invalid or expired requestState", data.reason "invalid_request_state", byte-matching typescript-sdk's), with the real reason in server logs only. The error shape never reveals which check failed.

Motivation and context

Without this, every resolver flow trusts client-echoed state for replay across users, tools, and calls (nothing binds it to a session, principal, request, or time), and hand-built MRTR state is unprotected unless each author rolls their own crypto. The resolver-tool startup requirement is the part that goes beyond the spec text; everything inside the envelope is the spec's own recommended replay-prevention list, on by default.

How has this been tested?

TDD throughout: the security suite (80 tests at first writing; 104 in the final suite) was written and verified red before the module existed, and caught three defects in the design before implementation (an unseal floor off-by-one, a self-contradictory expiry comparison, sub-second TTL truncation). The result was then put through an adversarial review (six attacker lenses plus an eight-angle code review, every finding refute-verified with runnable probes), and the confirmed findings are fixed and pinned in tests, including: strict token canonicality (63-variant flip tests), non-finite TTL rejection, key-material type enforcement, explicit-null requestState treated as absent (spec-preferred re-elicit recovery), non-string claims collapsing to the frozen error, throwing bind_principal failing closed, and audience separation between services.

  • tests/server/test_request_state.py, test_request_state_boundary.py, test_request_state_gate.py: codec properties, claims-envelope behavior per field (drift in either direction fails closed), wire-level seal/restore on both server tiers, the gate across every tool-registration funnel including the constructor tools=[...] bypass and hand-built Tool objects.
  • tests/server/mcpserver/test_resolve.py: question-digest pinning (including the forged-state regression tests upgraded to prove a stronger property: a boundary-authenticated, decodable forged entry is still never consulted over a resolver's own computation).
  • Touched modules at 100% line+branch coverage with no new suppressions; full suite, pyright strict, and ruff green.
  • All six conformance legs pass at the existing baselines, and the everything-server fixture now passes input-required-result-tampered-state through the production boundary; its hand-rolled HMAC helpers are deleted.
  • The mrtr example story demonstrates the full surface live, including a tamper round asserting the frozen error on the wire while the reason stays in the server log.

Breaking changes

Yes, pre-1.0/v2-line breaks, both with teaching errors and migration docs:

  • Tools with Resolve(...) parameters require request_state_security= at construction.
  • On protected servers the wire requestState is an opaque sealed token; ctx.request_state still returns the handler's plaintext.

docs/migration.md covers both; docs/advanced/multi-round-trip.md gains the full security section (choices, rotation, custom codecs, failure behavior).

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

  • Every public name pays rent: RequestStateSecurity (the policy: which codec, what TTL, which principal, which audience), RequestStateCodec (a two-method protocol: seal/unseal over opaque bytes), AESGCMRequestStateCodec (the built-in), RequestStateBoundary (the middleware, public so the low-level tier composes the identical object), InvalidRequestState (the codec failure exception), authenticated_principal (the default principal source). Re-exported from mcp.server.mcpserver beside ResourceSecurity.
  • INPUT_REQUIRED_METHODS and the is_input_required predicate live in mcp-types and are derived from the method/result tables (with a weld test), so "which methods carry input_required" has one source of truth: a future spec method joining the set is sealed automatically rather than failing open.
  • Two deviations from repo conventions, made deliberately: the tests for the new module span three test_request_state* files rather than one (the suite is 104 tests; the shared prefix keeps the source-tree mirror discoverable), and the boundary's deny-on-error around user-supplied codec/principal callables uses a broad exception catch. That catch is the trust boundary the spec's reject-MUST requires to fail closed, and it logs the real failure rather than swallowing it.
  • Known limits, documented rather than hidden: replay of an unexpired token against the byte-identical original request is bounded, not prevented (per spec, at-most-once semantics remain a server-side concern; the resolver layer is replay-safe by construction since a resolver's own computation always wins); TTL re-stamps each round, so it bounds per-round think time, not total flow duration; principal binding is inert when authentication terminates outside the SDK unless bind_principal= supplies an identity; elicitation messages must render deterministically across body re-runs or recorded answers will always look stale (documented in the resolver tutorial); hand-built state on unconfigured servers is intentionally untouched (the spec's conditional MUST is the operator's to satisfy there, and the docs say so).

AI Disclaimer

maxisbey added 3 commits June 30, 2026 15:19
requestState round-trips through the client, so the spec requires servers
to treat the echo as attacker-controlled and reject verification failures
whenever the state can influence authorization, resources, or business
logic. Add a boundary middleware that seals every outgoing requestState
into an authenticated-encrypted claims envelope (TTL, audience, principal,
and originating-request binding) and verifies every echo before handlers
run; handlers read and write plaintext on every surface. MCPServer
requires a protection choice at construction whenever a registration can
mint requestState — shared keys, an ephemeral process key, or an explicit
opt-out that resolver tools refuse; the low-level tier opts in with one
appended middleware. The built-in codec is AES-256-GCM under an
HKDF-derived key ring with a key-id lookup, strict token canonicality,
and a documented three-phase rotation. Every verification failure answers
one frozen -32602 with the real reason in server logs only. The
everything-server fixture drops its hand-rolled sealing and passes the
tampered-state conformance scenario through the real code path.
Resolver state entries (now v2) record a digest of the exact rendered
elicitation per outcome — accepts, declines, and cancels alike. A stored
answer is honored only while the live question renders byte-identically;
a redeploy that rewords a question or changes its schema re-asks instead
of silently reusing a stale answer, and a recorded decline cannot
suppress a reworded question. Cryptographic failure stays a wire error;
a digest mismatch is a quiet re-ask, since the seal already proved the
client honest and the server is what changed.
The startup requirement now applies exactly where the SDK authors the
requestState content itself: tools with Resolve(...) parameters, whose
state carries elicited answers the server later trusts. Hand-built
requestState — a tool, prompt, or resource template returning
InputRequiredResult directly — is user-authored, so nothing is required
there: an unconfigured server passes it through verbatim, and the docs
make protection a clear recommendation instead. Configuring
request_state_security= still seals hand-built state on the three
carrier methods with no code changes.

Consequences: the boundary middleware is installed only when a policy
is supplied (no more unconfigured rejection paths), the unprotected()
opt-out is deleted (not configuring is the unprotected posture), and
the boundary scopes strictly to the carrier methods — requestState-
shaped members on custom methods belong to their own protocols and are
neither sealed nor policed. All cryptographic hardening, the claims
envelope, and resolver question-pinning are unchanged.
@maxisbey maxisbey marked this pull request as ready for review June 30, 2026 16:00
@github-actions

github-actions Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

📚 Documentation preview

Preview https://pr-3032.mcp-python-docs.pages.dev
Deployment https://4862ad0e.mcp-python-docs.pages.dev
Commit 6c0b2ab
Triggered by @maxisbey
Updated 2026-06-30 16:17:11 UTC

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 32 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/mcp/server/mcpserver/resolve.py">

<violation number="1" location="src/mcp/server/mcpserver/resolve.py:533">
P1: Fresh `input_responses` are not checked against the question digest, so an answer to a pre-deploy question can be accepted as answering the changed question. Store the pending question digest in request_state and verify it before accepting/persisting the response.</violation>
</file>

<file name="docs_src/mrtr/tutorial005.py">

<violation number="1" location="docs_src/mrtr/tutorial005.py:27">
P2: Require the `kms1.` prefix before decoding; otherwise deleting the format prefix from a valid token is accepted instead of treated as tampering.</violation>
</file>

Tip: cubic can generate docs of your entire codebase and keep them up to date. Try it here.

Fix all with cubic | Re-trigger cubic

# Persist the exact wire content that just passed validation - never the
# model - so restoring next round revalidates the same bytes the client sent.
res.persist[key] = _StateEntry(action="accept", data=answer.content)
res.persist[key] = _StateEntry(action="accept", data=answer.content, q=q)

@cubic-dev-ai cubic-dev-ai Bot Jun 30, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Fresh input_responses are not checked against the question digest, so an answer to a pre-deploy question can be accepted as answering the changed question. Store the pending question digest in request_state and verify it before accepting/persisting the response.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/mcp/server/mcpserver/resolve.py, line 533:

<comment>Fresh `input_responses` are not checked against the question digest, so an answer to a pre-deploy question can be accepted as answering the changed question. Store the pending question digest in request_state and verify it before accepting/persisting the response.</comment>

<file context>
@@ -521,12 +530,12 @@ async def _elicit(elicit: Elicit[Any], key: str, res: _Resolution) -> Elicitatio
         # Persist the exact wire content that just passed validation - never the
         # model - so restoring next round revalidates the same bytes the client sent.
-        res.persist[key] = _StateEntry(action="accept", data=answer.content)
+        res.persist[key] = _StateEntry(action="accept", data=answer.content, q=q)
         return AcceptedElicitation(data=data)
     if answer.action == "decline":
</file context>
Fix with cubic


def unseal(self, token: str) -> bytes:
try:
raw = bytes.fromhex(token.removeprefix(PREFIX))

@cubic-dev-ai cubic-dev-ai Bot Jun 30, 2026

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Require the kms1. prefix before decoding; otherwise deleting the format prefix from a valid token is accepted instead of treated as tampering.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At docs_src/mrtr/tutorial005.py, line 27:

<comment>Require the `kms1.` prefix before decoding; otherwise deleting the format prefix from a valid token is accepted instead of treated as tampering.</comment>

<file context>
@@ -0,0 +1,33 @@
+
+    def unseal(self, token: str) -> bytes:
+        try:
+            raw = bytes.fromhex(token.removeprefix(PREFIX))
+            return self._aesgcm.decrypt(raw[:12], raw[12:], PREFIX.encode())
+        except (ValueError, InvalidTag) as exc:
</file context>
Fix with cubic

Comments and docstrings in the new code carried far more prose than
they earned. Cut the comment and docstring volume by more than half:
docstrings now lead with a single-sentence summary and keep only
contracts a reader cannot infer (codec implementer requirements, the
key rotation procedure, Raises sections), inline comments are one line
and exist only where the code cannot speak for itself, and development
narration is gone. Typographic characters in the touched prose were
replaced by restructuring the sentences. No executable code changed.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional findings (outside current diff — PR may have been updated during review):

  • 🔴 src/mcp/server/request_state.py:477-502 — RequestStateBoundary._seal calls security.codec.seal() with no try/except, while every other user-supplied callable on the boundary (codec.unseal, bind_principal on both paths) is wrapped in deny-on-error handling — so an exception from a custom codec during seal escapes the middleware chain and JSONRPCDispatcher writes the raw exception text (which for the documented KMS-backed codec can carry key ARNs, endpoints, request IDs) onto the wire on stdio/loop transports. Wrap codec.seal with the same handling as bind_principal on this path (log the traceback, raise MCPError(INTERNAL_ERROR, "Internal error")), and add a seal-side counterpart to the existing ExplodingCodec unseal test.

    Extended reasoning...

    The bug. RequestStateBoundary._seal (src/mcp/server/request_state.py:502) ends with return security.codec.seal(json.dumps(claims, ...).encode()) with no exception handling. Every other user-supplied callable on this boundary is deliberately wrapped in deny-on-error handling: the inbound path wraps codec.unseal in both except InvalidRequestState and a broad except Exception ("deny-on-error: a buggy custom codec must fail closed"), and bind_principal is wrapped on both the unseal path (collapses to the frozen -32602 error) and the seal path (lines 481-485, converts to MCPError(INTERNAL_ERROR, "Internal error") with the traceback log-only). codec.seal is the single user callable left unwrapped.

    The code path. RequestStateSecurity(codec=...) is the documented bring-your-own-crypto surface, and the docs' canonical example is a KMS-backed envelope codec (docs_src/mrtr/tutorial005.py). If such a codec raises during seal (KMS/network failure, expired credentials, key-policy error — these exception messages routinely contain key ARNs, endpoints, and request IDs), the exception propagates: _seal_seal_resultRequestStateBoundary.__call__ → the runner's _on_request middleware chain (src/mcp/server/runner.py:217 has no surrounding try/except) → JSONRPCDispatcher's catch-all (src/mcp/shared/jsonrpc_dispatcher.py:714-722), where handler_exception_to_error_data returns None for anything that isn't MCPError/ValidationError, and the dispatcher writes ErrorData(code=0, message=str(e)) to the client. On stdio and loop-mode streamable-HTTP transports the raw exception text therefore reaches the wire (the modern HTTP entry sanitizes to "Internal server error", so behavior is also transport-inconsistent).

    Why nothing else prevents it. The boundary middleware runs outside MCPServer's tool-error-to-CallToolResult conversion, and the runner has no generic catch around the chain, so there is no later sanitization layer. Nothing in the new module catches a seal-side codec failure.

    Why this matters for this PR. The PR introduces this call site and explicitly promises the opposite contract — the module docstring, docs/advanced/multi-round-trip.md ("the real reason goes to the server log"), and the PR description ("a buggy or throwing custom codec ... answers the same wire error") all state that codec/principal failures fail closed with the reason logged server-side only. The test suite pins this for the unseal direction (test_a_codec_that_raises_unexpectedly_fails_closed_with_the_frozen_error uses an ExplodingCodec whose unseal raises) and for a raising bind_principal during seal, but there is no test for a codec whose seal raises — confirming the gap is an oversight rather than a deliberate posture.

    Step-by-step proof. (1) Construct MCPServer("deployer", request_state_security=RequestStateSecurity(codec=KmsCodec())) where KmsCodec.seal calls KMS per token. (2) A client calls a manual MRTR tool; the handler returns InputRequiredResult(request_state="round-1"). (3) The boundary's _seal_result sees the input_required result and calls _seal, which builds the claims envelope and calls security.codec.seal(...). (4) KMS is unreachable and the codec raises botocore.exceptions.ClientError("An error occurred (AccessDeniedException) when calling the Decrypt operation on key arn:aws:kms:us-east-1:123456789012:key/..."). (5) The exception leaves RequestStateBoundary.__call__, leaves the runner's middleware chain, and lands in the dispatcher's catch-all, which serializes ErrorData(code=0, message=str(e)) — the ARN-bearing message — into the JSON-RPC error response sent to the client over stdio. Compare the same failure in bind_principal on the same path: it is logged and replaced with a bare MCPError(INTERNAL_ERROR, "Internal error").

    Fix. Wrap the security.codec.seal(...) call in _seal with the same handling already used for bind_principal on this path: logger.exception(...) then raise MCPError(code=INTERNAL_ERROR, message="Internal error") from None. Add a seal-side test mirroring the existing ExplodingCodec unseal test (codec whose seal raises → bare "Internal error" on the wire, traceback only in the server log).

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any bugs in this change, but it introduces a new cryptographic surface (AES-GCM request-state sealing, key rotation, principal/audience binding), a construction-time breaking change for resolver tools, and a sizable public API addition — this needs a careful human review rather than auto-approval.

Extended reasoning...

Overview

This PR adds a complete integrity-protection surface for the MRTR requestState: a new ~400-line module src/mcp/server/request_state.py (policy object, AES-256-GCM codec with HKDF key derivation and a rotation ring, claims envelope with TTL/request/principal/audience binding, and a RequestStateBoundary middleware), a construction-time gate in MCPServer requiring request_state_security= for any tool with Resolve(...) parameters, question-digest pinning in the resolver state envelope (v2), new mcp-types helpers, plus extensive docs, examples, and ~1,500 lines of new tests across three suites.

Security risks

The change is itself security machinery, so the risk profile is inherent: token format and canonicality, nonce handling, key-ring lookup via derived fingerprints, claims verification ordering, the deny-on-error broad exception catches around user-supplied codecs and principal callbacks, and the single frozen wire error designed to avoid oracle behavior. The test coverage is unusually thorough (tamper/canonicality/rotation/principal/audience/expiry cases), and I did not find concrete defects, but cryptographic protocol design and the decision to gate only resolver tools (leaving hand-built state unprotected by default) are exactly the kind of choices a maintainer should sign off on rather than a bot.

Level of scrutiny

High. This touches security-critical production code paths (tools/call, prompts/get, resources/read middleware), introduces a breaking change to the public MCPServer constructor contract, and adds several new public names (RequestStateSecurity, RequestStateCodec, AESGCMRequestStateCodec, RequestStateBoundary, InvalidRequestState, authenticated_principal) that will be hard to change later. The opinionated divergence from the typescript-sdk approach (encrypting codec vs sign-only HMAC) is a cross-SDK design decision that warrants maintainer input.

Other factors

There are two unresolved comments from another automated reviewer (a question-digest gap for fresh input_responses, and a prefix-handling nit in the docs example codec) that the author has not yet responded to. The PR is brand new with no human review yet. Given the size, the breaking change, and the security-sensitive nature, deferring to a human is the right call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant