Fix incorrect Content-Length for StringIO with multi-byte characters by veeceey · Pull Request #7201 · psf/requests

veeceey · 2026-02-10T08:35:01Z

Summary

super_len() uses seek/tell to measure the length of file-like objects such as StringIO and BytesIO. However, StringIO.tell() returns the character position, not the byte offset. For strings containing multi-byte UTF-8 characters (e.g. emoji), this produces an incorrect Content-Length header that violates RFC 9110 section 8.6.

For example, given io.StringIO("\U0001F4A9") (a single emoji), super_len() previously returned 1 (the character count) instead of 4 (the UTF-8 byte count), so the server received a Content-Length: 1 header while 4 bytes were actually sent.
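The mismatch can be reproduced with the standard library alone. The snippet below is a minimal sketch, not code from this PR: it measures the same text once through StringIO's seek/tell and once as encoded UTF-8, and the variable names are illustrative.

```python
import io

text = "\U0001F4A9"  # one code point, four bytes in UTF-8

# Measure a StringIO the seek/tell way: jump to the end, ask for the position.
stream = io.StringIO(text)
stream.seek(0, io.SEEK_END)
char_position = stream.tell()

# Measure what actually goes on the wire when the text is sent as UTF-8.
byte_length = len(text.encode("utf-8"))

# BytesIO does not have the problem: its positions are byte offsets.
raw = io.BytesIO(text.encode("utf-8"))
raw.seek(0, io.SEEK_END)
bytes_position = raw.tell()

print(char_position, byte_length, bytes_position)  # on CPython: 1 4 4
```

The gap between the first two numbers is exactly the error that ends up in the Content-Length header.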

This is the same class of bug that was fixed for plain str bodies in #6586: a str body is encoded to UTF-8 before it is measured, but the contents of a StringIO were not. This PR makes StringIO handling consistent with str by reading the remaining text, encoding it to UTF-8, and measuring the byte length.

Before

str       → Content-Length: 4  ✓
bytes     → Content-Length: 4  ✓
BytesIO   → Content-Length: 4  ✓
StringIO  → Content-Length: 1  ✗  (character count, not byte count)

After

str       → Content-Length: 4  ✓
bytes     → Content-Length: 4  ✓
BytesIO   → Content-Length: 4  ✓
StringIO  → Content-Length: 4  ✓

Changes

src/requests/utils.py: In super_len(), detect io.StringIO and read+encode the remaining text to compute the UTF-8 byte length instead of relying on tell().
tests/test_utils.py: Added test_super_len_stringio_multibyte covering single emoji, mixed content, partially-read StringIO, and position preservation.
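The read-and-encode approach described above might look roughly like the following. This is a sketch under stated assumptions, not the actual diff: utf8_remaining_len is a hypothetical standalone helper, whereas the PR makes the change inside super_len() itself.

```python
import io


def utf8_remaining_len(stream: io.StringIO) -> int:
    """Hypothetical helper: UTF-8 byte length of the unread text in `stream`."""
    start = stream.tell()          # remember where the caller left the stream
    try:
        remaining = stream.read()  # everything from the current position on
    finally:
        stream.seek(start)         # restore the position so the body can still be sent
    return len(remaining.encode("utf-8"))


body = io.StringIO("h\u00e9llo \U0001F4A9")

full_len = utf8_remaining_len(body)     # whole body: 11 bytes

body.read(2)                            # consume "h\u00e9" (2 characters, 3 bytes)
partial_len = utf8_remaining_len(body)  # rest of the body: 8 bytes
position_after = body.tell()            # still 2: the helper did not move the stream
```

One trade-off of this approach is that the remaining text is read and encoded once just to be measured, so the cost grows with the size of the body; in exchange, the reported length matches the bytes that are sent.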

Test plan

All existing TestSuperLen tests pass (ASCII StringIO, BytesIO, partially-read files, etc.)
New test verifies correct byte count for multi-byte characters
New test verifies correct byte count for partially-read StringIO
New test verifies file position is preserved after super_len() call

StringIO.tell() returns the character position, not the byte offset, so super_len() returned the wrong value for StringIO objects containing multi-byte UTF-8 characters (e.g. emoji). This caused an incorrect Content-Length header that violates RFC 9110 section 8.6.

Read the remaining text and encode it to UTF-8 to measure the true byte length, consistent with how plain str bodies are already handled.

Closes psf#6917

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

veeceey and others added 2 commits February 10, 2026 00:34

Merge branch 'main' into fix/stringio-content-length-warning

20d9eef

veeceey wants to merge 2 commits into psf:main from veeceey:fix/stringio-content-length-warning
