Python: Normalize OpenAI function-call arguments at parse time to prevent uni…#4831
Open
0x7c13 wants to merge 1 commit into microsoft:main from
@microsoft-github-policy-service agree
Python: Normalize OpenAI function-call arguments at parse time to prevent unicode escape corruption
Problem
When an LLM-powered agent edits source files containing Python/JavaScript unicode escape sequences like `\u2192`, the OpenAI code path corrupts these sequences due to double JSON parsing.

Root cause
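As a minimal illustration of the effect (a standalone sketch using only the standard library, not the framework's actual code path; the `new_text` key is a hypothetical argument name), each JSON decoding pass consumes one level of escaping. One pass more than the producer encoded for turns the literal escape into the character it names:

```python
import json

# The text an agent wants to write into a source file:
# six characters (backslash, 'u', '2', '1', '9', '2').
intended = "\\u2192"

# Encoded once as JSON, the way a provider might return tool arguments.
# "new_text" is a hypothetical argument name used only for this sketch.
arguments_json = json.dumps({"new_text": intended})

# A single parse recovers the intended six-character escape.
parsed_once = json.loads(arguments_json)

# An extra decoding pass over the already-decoded value treats \u2192
# as a JSON unicode escape and collapses it to one character.
decoded_twice = json.loads('"' + parsed_once["new_text"] + '"')

print(len(parsed_once["new_text"]), len(decoded_twice))
```

The first parse yields the six characters the agent intended; the extra pass produces the single character U+2192, which no longer matches the file content.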
The Anthropic and OpenAI backends handle function-call arguments differently:

- Anthropic returns `content_block.input` as a parsed dict. Stored directly; `parse_arguments()` returns it as-is. 1 JSON parse total.
- OpenAI returns `tool.function.arguments` as a raw JSON string. Stored as a string, then `parse_arguments()` calls `json.loads()` again. 2 JSON parses total.

The second `json.loads()` re-interprets `\uXXXX` sequences as JSON unicode escapes, corrupting the original intent. The same model output that works correctly on Anthropic produces a corrupted value on OpenAI. The `\u2192` (literal 6-char Python escape) becomes `→` (a single Unicode character), causing `edit_file` to either fail to match or write incorrect content.

Impact
This affects any tool that reads/writes source code containing `\uXXXX` escape sequences (Python, JavaScript, Java, C#, JSON). In practice, agents enter retry loops (10+ failed `edit_file` attempts observed) trying different escaping levels, wasting tokens and often ultimately writing corrupted code.

What changed
- Added a `normalize_function_call_arguments()` helper in `_types.py` that eagerly parses JSON-string arguments into dicts at the provider-parsing layer
- Applied the helper in `OpenAIChatClient._parse_tool_calls_from_openai()` and three non-streaming parse sites in `OpenAIResponsesClient`
- Updated `_prepare_content_for_openai()` in the responses client to re-serialize dict arguments back to JSON strings when sending to the API (the chat client already handled this at line 704)

Streaming deltas (`response.function_call_arguments.delta`) are intentionally not normalized since they contain partial JSON fragments.

Validation

    uv run python -m pytest packages/core/tests/openai/test_openai_chat_client.py \
        packages/core/tests/openai/test_openai_responses_client.py \
        -m "not integration" -q

All 183 tests pass.
Before / After comparison
The fix makes the OpenAI path behave identically to the Anthropic path: arguments are parsed once and stored as a dict.
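The parse-once behavior can be sketched roughly as follows. This is not the PR's implementation: `normalize_arguments_sketch` and `serialize_arguments_sketch` are hypothetical stand-ins, the `path` and `new_text` keys are made up, and passing unparseable strings through unchanged is an assumption made here for illustration.

```python
import json
from typing import Any


def normalize_arguments_sketch(arguments: Any) -> Any:
    """Parse a JSON-string argument payload into a dict exactly once."""
    if not isinstance(arguments, str):
        return arguments  # already parsed (or absent): nothing to do
    try:
        parsed = json.loads(arguments)
    except json.JSONDecodeError:
        return arguments  # assumption: leave unparseable strings untouched
    return parsed if isinstance(parsed, dict) else arguments


def serialize_arguments_sketch(arguments: Any) -> str:
    """Re-serialize dict arguments to a JSON string for an outgoing request."""
    return arguments if isinstance(arguments, str) else json.dumps(arguments)


# Raw JSON text whose "new_text" value is the six-character literal \u2192.
raw = r'{"path": "a.py", "new_text": "\\u2192"}'

stored = normalize_arguments_sketch(raw)          # parsed once, kept as a dict
again = normalize_arguments_sketch(stored)        # a dict passes through as-is
round_trip = json.loads(serialize_arguments_sketch(stored))
```

Because the stored value is already a dict, later consumers have nothing left to decode, and re-serializing for the outgoing request round-trips to the same dict.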
`parse_arguments()` returns the dict directly without a second `json.loads()` call.

Related