Truncate untrusted peer-controlled values before logging/raising#2238
Draft
Truncate untrusted peer-controlled values before logging/raising#2238
Conversation
Follow-up to 3041fd0 which truncated a client-controlled session ID. A sweep of the codebase found ~30 more locations where values controlled by the other side of the connection are interpolated unbounded into log messages, exception messages, or HTTP error responses. Server-side (client-controlled input): - sse.py: session_id query param (direct analog of 3041fd0) - transport_security.py: Host/Origin headers at WARNING - mcpserver: tool names, prompt names, resource URIs in 'Unknown X' errors and logger.exception calls - streamable_http.py: mcp-protocol-version header echoed in 400 body - auth handlers: client_id, scope, redirect_uri echoed in error responses - task handlers: task_id, pagination cursor in error messages Client-side (server-controlled input): - streamable_http.py: mcp-session-id response header, content-type header, raw InitializeResult dict - session.py: protocol_version from InitializeResult - session_group.py: MCPError str (= server's error.message verbatim) - sse.py/streamable_http.py: SSE event names Protocol-level (shared/session.py): - error.message from null-ID JSONRPCError - Response ID in 'cannot be normalized' warning - Unknown-request-ID error: was logging entire SessionMessage repr (full wire payload), now logs just the truncated ID - Dropped full JSONRPCNotification repr from validation-failure warning Also removed three debug logs in server/sse.py that stringified the full request body / parsed message on every POST via eager f-string evaluation. Truncation lengths: 32 for version strings, 64 for IDs/tokens, 128 for names/headers, 256 for URIs/messages, 512 for result dicts.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Follow-up sweep after 3041fd0 (which truncated a client-controlled session ID before logging). Found ~30 more locations where values controlled by the peer are interpolated unbounded into log messages, exception messages, or HTTP error responses.
Motivation and Context
A peer (client→server or server→client) can send an arbitrarily large value in a header, query param, JSON-RPC field, or OAuth parameter. When that value lands in a
logger.warning(f"...{value}")orraise Error(f"...{value}"), it can:Example:
POST /messages/?session_id=<5MB of non-hex garbage>→UUID()raisesValueError→logger.warning(f"Received invalid session ID: {session_id_param}")writes 5MB to logs.What's changed
Server-side (client-controlled):
sse.pysession_id query param ·transport_security.pyHost/Origin headers ·mcpservertool/prompt names + resource URIs in "Unknown X" errors ·streamable_http.pyprotocol-version header · auth handlersclient_id/scope/redirect_uri· task handlerstask_id/cursorClient-side (server-controlled):
streamable_http.pysession-id response header, content-type, raw result dict ·session.pyprotocol_version ·session_group.pyMCPError str · SSE event namesProtocol-level (
shared/session.py): null-ID error.message · unnormalizable response ID · unknown-request-ID error (was dumping entireSessionMessagerepr = full wire payload, now logs just the ID) · dropped fullJSONRPCNotificationrepr from validation-failure warningAlso removed three debug logs in
server/sse.pythat stringified the full POST body/parsed message on every request via eager f-string eval (CPU waste even with DEBUG off).Truncation lengths: 32 for version strings, 64 for IDs/tokens, 128 for names/headers, 256 for URIs/messages, 512 for result dicts.
How Has This Been Tested?
Existing test suite passes (1126 tests). No new tests added — truncation doesn't introduce new branches and the limits are well above any legitimate value.
Breaking Changes
None. Error/log messages change slightly (truncated), but no API surface changes.
Types of changes
Checklist
Additional context
Deferred for separate review (require design decisions, not mechanical truncation):
shared/session.py:361exc_info=True→ PydanticValidationErrortraceback echoesinput_value=. Can't truncate a traceback.server/auth/errors.py:5stringify_pydantic_errorusese['msg']— for discriminated-union errors, Pydantic puts the full tag verbatim inmsg(verified: 1000-chargrant_type→ 1116-char error). Shared helper.authorize.pystatestatereturned unchanged. Fix belongs in a Pydanticmax_lengthfield constraint, not response truncation.client/auth/utils.py:225,oauth2.py:410response.text/body inOAuthRegistrationError/OAuthTokenError. Only debug info user gets on failed token exchange — worth a length discussion.client/session.py:345,347jsonschema.ValidationError.__str__includes full failing instance. Could switch toe.message(one-liner).MCPError.__str__,SessionMessage.__repr__,RequestIdmax_lengthMCPError.__str__returns wireerror.messageverbatim,SessionMessagedataclass repr includes full nested payload,RequestId = ...|strhas no length cap. Behavior/schema changes.