
Python: Normalize OpenAI function-call arguments at parse time to prevent uni…#4831

Open
0x7c13 wants to merge 1 commit into microsoft:main from 0x7c13:dev/0x7c13/unicode_corruption_fix


0x7c13 commented Mar 22, 2026

Python: Normalize OpenAI function-call arguments at parse time to prevent unicode escape corruption

Problem

When an LLM-powered agent edits source files containing Python/JavaScript unicode escape sequences like \u2192, the OpenAI code path corrupts these sequences due to double JSON parsing.

Root cause

The Anthropic and OpenAI backends handle function-call arguments differently:

  • Anthropic: Returns content_block.input as a parsed dict. Stored directly — parse_arguments() returns it as-is. 1 JSON parse total.
  • OpenAI: Returns tool.function.arguments as a raw JSON string. Stored as a string, then parse_arguments() calls json.loads() again. 2 JSON parses total.

The second json.loads() re-interprets \uXXXX sequences as JSON unicode escapes, corrupting the original intent:

# A source file contains the Python escape: \u2192
# The model correctly generates \\u2192 in its JSON arguments

# Anthropic path (1 parse):
content_block.input = {"old_string": "\\u2192"}  # SDK parsed → \u2192 ✓

# OpenAI path (2 parses):
tool.function.arguments = '{"old_string": "\\u2192"}'  # stored as string
json.loads(tool.function.arguments)  # → {"old_string": "→"}: \u2192 interpreted as unicode escape ✗

The same model output that works correctly on Anthropic produces a corrupted value on OpenAI. The literal 6-character Python escape \u2192 becomes → (a single Unicode character), causing edit_file to either fail to match or write incorrect content.
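
To make the mechanism concrete, here is a minimal sketch of how a second decode can turn the escape into the arrow. It uses only the standard json module. Treating the already-decoded value as JSON string content a second time is an assumption about how the extra parse arises; it is not code taken from the framework.

```python
import json

# Raw JSON text as the model emits it: the escape is protected by a doubled backslash.
wire = r'{"old_string": "\\u2192"}'

# One decode yields the literal 6-character text: backslash, u, 2, 1, 9, 2.
once = json.loads(wire)["old_string"]
assert once == "\\u2192" and len(once) == 6

# Assumption: the decoded value is handled as JSON string content again.
# The backslash is now unprotected, so the decoder reads \u2192 as an escape.
twice = json.loads('"' + once + '"')
assert twice == "\u2192" and len(twice) == 1
```

Each decode strips one level of escaping, which is why the number of parses has to be the same on every provider path.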

Impact

This affects any tool that reads/writes source code containing \uXXXX escape sequences (Python, JavaScript, Java, C#, JSON). In practice, agents enter retry loops (10+ failed edit_file attempts observed) trying different escaping levels, wasting tokens and often ultimately writing corrupted code.

What changed

  • Added normalize_function_call_arguments() helper in _types.py that eagerly parses JSON-string arguments into dicts at the provider-parsing layer
  • Applied normalization in OpenAIChatClient._parse_tool_calls_from_openai() and three non-streaming parse sites in OpenAIResponsesClient
  • Updated _prepare_content_for_openai() in the responses client to re-serialize dict arguments back to JSON strings when sending to the API (the chat client already handled this at line 704)
  • Updated 2 test assertions that expected raw string arguments to expect parsed dicts

Streaming deltas (response.function_call_arguments.delta) are intentionally not normalized since they contain partial JSON fragments.
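
As a rough illustration of the normalization step described above, the following sketch parses a JSON-string argument payload into a dict once and passes everything else through. The name normalize_arguments_sketch and its fallback behavior are assumptions made for this example; the actual normalize_function_call_arguments() in _types.py may differ.

```python
import json
from collections.abc import Mapping
from typing import Any


def normalize_arguments_sketch(arguments: Any) -> Any:
    """Hypothetical helper: parse JSON-string arguments into a dict exactly once."""
    if isinstance(arguments, Mapping):
        return arguments  # already parsed, nothing to do
    if isinstance(arguments, str):
        try:
            parsed = json.loads(arguments)
        except json.JSONDecodeError:
            return arguments  # partial or invalid JSON stays a string
        if isinstance(parsed, dict):
            return parsed
    return arguments  # non-object JSON, None, and other types pass through


complete = normalize_arguments_sketch(r'{"old_string": "\\u2192"}')
fragment = normalize_arguments_sketch('{"old_str')
```

In this sketch a complete payload becomes a dict, while an incomplete fragment such as a streaming delta is returned unchanged, which mirrors the decision to leave deltas alone.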

Validation

uv run python -m pytest packages/core/tests/openai/test_openai_chat_client.py \
  packages/core/tests/openai/test_openai_responses_client.py \
  -m "not integration" -q

All 183 tests pass.

Before / After comparison

from agent_framework._types import normalize_function_call_arguments

# Model generates \\u2192 in its JSON output — the correct escaping for literal \u2192
args = '{"old_string": "\\\\u2192"}'

# BEFORE: stored as string, then double-parsed
import json
json.loads(args)["old_string"]   # → '\\u2192' (repr shows 2 backslashes; the value holds one)

# AFTER: normalized once at parse time, parse_arguments() returns dict directly
normalize_function_call_arguments(args)["old_string"]  # → '\\u2192' (same parse)
# Then parse_arguments() sees a Mapping and returns it — no second json.loads

The fix makes the OpenAI path behave identically to the Anthropic path: arguments are parsed once and stored as a dict. parse_arguments() returns the dict directly without a second json.loads() call.
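
The parse-once and re-serialize-on-send behavior can be sketched with a small stand-in class. FunctionCallSketch, its fields, and to_wire() are hypothetical names invented for this example; they are not the framework's types, and only the general shape (a Mapping is returned untouched, a dict is dumped back to JSON for the API) follows the description above.

```python
import json
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any


@dataclass
class FunctionCallSketch:
    """Hypothetical stand-in for a stored function call."""

    name: str
    arguments: Any  # a dict after normalization, or a raw JSON string

    def parse_arguments(self) -> Any:
        # A Mapping is returned untouched, so no second json.loads() runs.
        if isinstance(self.arguments, Mapping):
            return self.arguments
        return json.loads(self.arguments)

    def to_wire(self) -> str:
        # Dict arguments are re-serialized when the call is sent back to the API.
        if isinstance(self.arguments, Mapping):
            return json.dumps(dict(self.arguments))
        return self.arguments


raw = r'{"old_string": "\\u2192"}'
call = FunctionCallSketch(name="edit_file", arguments=json.loads(raw))
```

Because json.dumps() re-escapes the backslash, decoding call.to_wire() gives back the same 6-character value that parse_arguments() returns.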

Related

@github-actions github-actions bot changed the title Normalize OpenAI function-call arguments at parse time to prevent uni… Python: Normalize OpenAI function-call arguments at parse time to prevent uni… Mar 22, 2026

0x7c13 commented Mar 22, 2026

@microsoft-github-policy-service agree
