
gh-144759: Fix undefined behavior from NULL pointer arithmetic in lexer #144788

Merged
pablogsal merged 3 commits into python:main from
raminfp:fix-lexer-ub-null-pointer-arithmetic
Feb 15, 2026

@raminfp
Contributor

@raminfp raminfp commented Feb 13, 2026

Fix undefined behavior in _PyLexer_remember_fstring_buffers and _PyLexer_restore_fstring_buffers caused by performing pointer arithmetic on NULL pointers (NULL - tok->buf).

When tok_mode_stack[0] is initialized, the start and multi_line_start fields are not explicitly set and remain NULL (from PyMem_Calloc). Later, when the lexer buffer is reallocated, the remember/restore functions perform NULL - valid_pointer and valid_pointer + negative_offset, both of which are undefined behavior in C.

The fix adds explicit NULL checks: store -1 as a sentinel offset when the pointer is NULL, and restore NULL when the offset is negative.
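The sentinel approach described above can be sketched in isolation. This is a minimal, simplified illustration and not the code in `Parser/lexer/buffer.c`: the types `demo_mode` and `demo_lexer` and the helpers `to_offset`, `from_offset`, `remember`, `restore`, and `grow` are hypothetical names invented for this example, and it assumes a single mode entry rather than the full `tok_mode_stack`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one tokenizer mode entry. */
typedef struct {
    char *start;                      /* may legitimately be NULL */
    char *multi_line_start;           /* may legitimately be NULL */
    ptrdiff_t start_offset;
    ptrdiff_t multi_line_start_offset;
} demo_mode;

/* Hypothetical stand-in for the lexer state that owns the buffer. */
typedef struct {
    char *buf;
    size_t size;
    demo_mode mode;
} demo_lexer;

/* Store -1 for NULL instead of computing NULL - base (undefined behavior). */
static ptrdiff_t to_offset(const char *p, const char *base) {
    return p == NULL ? -1 : p - base;
}

/* A negative offset means "the pointer was NULL"; never compute base + (-1). */
static char *from_offset(char *base, ptrdiff_t off) {
    return off < 0 ? NULL : base + off;
}

/* Convert pointers into the buffer to offsets before the buffer moves. */
static void remember(demo_lexer *lx) {
    assert(lx->buf != NULL);
    lx->mode.start_offset = to_offset(lx->mode.start, lx->buf);
    lx->mode.multi_line_start_offset =
        to_offset(lx->mode.multi_line_start, lx->buf);
}

/* Rebuild the pointers against the (possibly relocated) buffer. */
static void restore(demo_lexer *lx) {
    lx->mode.start = from_offset(lx->buf, lx->mode.start_offset);
    lx->mode.multi_line_start =
        from_offset(lx->buf, lx->mode.multi_line_start_offset);
}

/* Grow the buffer; returns 0 on success, -1 if realloc fails. */
static int grow(demo_lexer *lx, size_t new_size) {
    remember(lx);
    char *new_buf = realloc(lx->buf, new_size);
    if (new_buf == NULL) {
        return -1;                    /* old buffer and pointers stay valid */
    }
    lx->buf = new_buf;
    lx->size = new_size;
    restore(lx);
    return 0;
}
```

In this sketch a NULL pointer survives a reallocation as NULL, while a non-NULL pointer keeps its distance from the start of the buffer. The value -1 works as a sentinel because a valid offset into the buffer is never negative.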

Detected with --with-undefined-behavior-sanitizer:

Parser/lexer/buffer.c:30:32: runtime error: pointer index expression with base 0x50300007f130 overflowed to 0xfffffddfffc38020
Parser/lexer/buffer.c:31:43: runtime error: pointer index expression with base 0x50300007f130 overflowed to 0xfffffddfffc38020

Fixes #144759

@python-cla-bot

python-cla-bot bot commented Feb 13, 2026

All commit authors signed the Contributor License Agreement.

CLA signed

…in lexer

Guard against NULL pointer arithmetic in `_PyLexer_remember_fstring_buffers`
and `_PyLexer_restore_fstring_buffers`. When `start` or `multi_line_start`
are NULL (uninitialized in tok_mode_stack[0]), performing `NULL - tok->buf`
is undefined behavior. Add explicit NULL checks to store -1 as sentinel
and restore NULL accordingly.
@raminfp raminfp force-pushed the fix-lexer-ub-null-pointer-arithmetic branch from 588d391 to 0b18bc0 Compare February 13, 2026 16:10
…tions

Replace :c:func: references with double-backtick markup since these
are internal functions without documentation entries.
@eendebakpt
Contributor

@raminfp Could you add a regression test? I suspect Lib/test/test_repl.py is the right location, but I am not sure.

And please avoid force pushes to the PR so we preserve history.

…exer

Add test_lexer_buffer_realloc_with_null_start to test_repl.py that
exercises the code path where the lexer buffer is reallocated while
tok_mode_stack[0] has NULL start/multi_line_start pointers. This
triggers _PyLexer_remember_fstring_buffers and verifies the NULL
checks prevent undefined behavior.
@pablogsal
Member

Great catch!

@pablogsal pablogsal added needs backport to 3.13 bugs and security fixes needs backport to 3.14 bugs and security fixes labels Feb 15, 2026
@pablogsal pablogsal merged commit e6110ef into python:main Feb 15, 2026
55 checks passed
@miss-islington-app

Thanks @raminfp for the PR, and @pablogsal for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13, 3.14.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Feb 15, 2026
…in lexer (pythonGH-144788)

Guard against NULL pointer arithmetic in `_PyLexer_remember_fstring_buffers`
and `_PyLexer_restore_fstring_buffers`. When `start` or `multi_line_start`
are NULL (uninitialized in tok_mode_stack[0]), performing `NULL - tok->buf`
is undefined behavior. Add explicit NULL checks to store -1 as sentinel
and restore NULL accordingly.

Add test_lexer_buffer_realloc_with_null_start to test_repl.py that
exercises the code path where the lexer buffer is reallocated while
tok_mode_stack[0] has NULL start/multi_line_start pointers. This
triggers _PyLexer_remember_fstring_buffers and verifies the NULL
checks prevent undefined behavior.
(cherry picked from commit e6110ef)

Co-authored-by: Ramin Farajpour Cami <ramin.blackhat@gmail.com>
@miss-islington-app

Sorry, @raminfp and @pablogsal, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker e6110efd03259acd1895cff63fbfa115ac5f16dc 3.13

@bedevere-app

bedevere-app bot commented Feb 15, 2026

GH-144834 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label Feb 15, 2026
@pablogsal
Member

@raminfp can you follow these instructions to create the 3.13 backport? #144788 (comment)

thanks!

pablogsal pushed a commit that referenced this pull request Feb 15, 2026
… in lexer (GH-144788) (#144834)

gh-144759: Fix undefined behavior from NULL pointer arithmetic in lexer (GH-144788)
(cherry picked from commit e6110ef)

Co-authored-by: Ramin Farajpour Cami <ramin.blackhat@gmail.com>
@raminfp raminfp deleted the fix-lexer-ub-null-pointer-arithmetic branch February 15, 2026 15:25
@python python deleted a comment from bedevere-bot Feb 15, 2026

Labels

needs backport to 3.13 bugs and security fixes

Development

Successfully merging this pull request may close these issues.

Uninitialized start and multi_line_start Causing Undefined Behavior - Pointer overflow

3 participants