Skip to content

gh-143050: add helper _PyLong_InitTag()#147956

Merged
vstinner merged 9 commits intopython:mainfrom
skirpichev:add-_PyLong_Init
Apr 1, 2026
Merged

gh-143050: add helper _PyLong_InitTag()#147956
vstinner merged 9 commits intopython:mainfrom
skirpichev:add-_PyLong_Init

Conversation

@skirpichev
Copy link
Copy Markdown
Member

@skirpichev skirpichev commented Apr 1, 2026

With this we can assume, that _PyLong_Set*() operate on non-immortal integers.

With this we can assume, that _PyLong_Set*() operate on
non-immortal integers.
@skirpichev skirpichev changed the title gh-143050: add helper _PyLong_Init() gh-143050: add helper _PyLong_InitTag() Apr 1, 2026
_PyLong_SetSignAndDigitCount(result, size != 0, size);
/* The digit has to be initialized explicitly to avoid
* use-of-uninitialized-value. */
result->long_value.ob_digit[0] = 0;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I'm not convinced that this initialization is useful, the comment says that it is. We should keep it, no?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested to keep this initialization. You marked my comment as resolved, I don't understand. Do you consider that it's not needed?

An alternative would be to fill digits with a pattern to detect usage of non-initialized memory when Python is built in debug mode:

#ifdef Py_DEBUG
    // Fill digits with a known pattern to detect usage of uninitialized memory
    memset(result->long_value.ob_digit, 0xff,
           sizeof(result->long_value.ob_digit[0]) * ndigits);
#endif

We use this strategy for:

  • str: see unicode_fill_invalid() function.
  • PyBytesWriter: memset(byteswriter_data(writer), 0xff, byteswriter_allocated(writer)) in PyBytesWriter_Create()

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you suggest to keep this just here? BTW, tests pass - I rerun one job again.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ob_digit[0] = 0; was added by commit fc130c4 of issue gh-102509 to fix the OSS Fuzz report https://issues.oss-fuzz.com/issues/42516166.

Looks like it's complaining about a use of unit values in _PyLong_FromByteArray
Allocation happens here as part of _PyLong_New

v = _PyLong_New(ndigits);

and the claimed use is at
return (PyObject *)maybe_small_long(long_normalize(v));

inside of maybe_small_long and IS_SMALL_INT

I understand that the maybe_small_long() at the end of _PyLong_FromByteArray() triggered the issue when Python was built with --with-memory-sanitizer (-fsanitize=memory). From what I understood, the sign was initialized (to zero) but not ob_digit[0].


Oh interesting, k_mul() already does what I proposed above:

#ifdef Py_DEBUG
    /* Fill with trash, to catch reference to uninitialized digits. */
    memset(ret->long_value.ob_digit, 0xDF, _PyLong_DigitCount(ret) * sizeof(digit));
#endif

k_lopsided_mul() sets explicitly all digits to zero:

    memset(ret->long_value.ob_digit, 0, _PyLong_DigitCount(ret) * sizeof(digit));

@vstinner
Copy link
Copy Markdown
Member

vstinner commented Apr 1, 2026

"Tests / Android (x86_64)" failed with: Error on ZipFile unknown archive. It's unrelated to this change.

  > Task :app:maxVersionSetup
  "Install Android Emulator v.36.4.10" ready.
  Installing Android Emulator in /usr/local/lib/android/sdk/emulator
  "Install Android Emulator v.36.4.10" complete.
  "Install Android Emulator v.36.4.10" finished.
  Checking the license for package AOSP ATD Intel x86_64 Atom System Image in /usr/local/lib/android/sdk/licenses
  License for package AOSP ATD Intel x86_64 Atom System Image accepted.
  Preparing "Install AOSP ATD Intel x86_64 Atom System Image API 32 (revision 1)".
  Warning: An error occurred while preparing SDK package AOSP ATD Intel x86_64 Atom System Image: Error on ZipFile unknown archive.:
  java.io.IOException: Error on ZipFile unknown archive
  	at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:383)
  	at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:323)
  	at org.apache.commons.compress.archivers.zip.ZipFile.<init>(ZipFile.java:279)
  	at com.android.repository.util.InstallerUtil.unzip(InstallerUtil.java:101)
  	at com.android.repository.impl.installer.BasicInstaller.doPrepare(BasicInstaller.java:91)

@skirpichev
Copy link
Copy Markdown
Member Author

skirpichev commented Apr 1, 2026 via email

@vstinner
Copy link
Copy Markdown
Member

vstinner commented Apr 1, 2026

"Tests / CIFuzz / cpython3 (memory)" CI logs an use-of-uninitialized-value:

==45==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x55af57984f54 in _PyLong_IsSmallInt /src/cpython3/Include/internal/pycore_long.h:243:5
    #1 0x55af57984f54 in _PyLong_SetSignAndDigitCount /src/cpython3/Include/internal/pycore_long.h:305:5
    #2 0x55af579844c6 in long_alloc /src/cpython3/Objects/longobject.c:190:5
    #3 0x55af5798ef9a in PyLong_FromUnsignedLong /src/cpython3/Objects/longobject.c:442:5
    #4 0x55af5798ef9a in PyLong_FromVoidPtr /src/cpython3/Objects/longobject.c:1505:12

pycore_long.h:243 is the following assertion in _PyLong_IsSmallInt():

    assert((_PyLong_IsCompact(op)
            && _PY_IS_SMALL_INT(_PyLong_CompactValue(op)))
           || (!is_small_int));

Oh right, op->long_value.ob_digit[0] = 0; is needed before calling _PyLong_IsSmallInt() to have initialized memory (_PyLong_CompactValue()).

Copy link
Copy Markdown
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of changing _PyLong_IsSmallInt() implementation like that?

static inline bool
_PyLong_IsSmallInt(const PyLongObject *op)
{
    assert(PyLong_Check(op));
    bool is_small_int = (op->long_value.lv_tag & IMMORTALITY_BIT_MASK) != 0;
    if (is_small_int) {
        assert(PyLong_CheckExact(op));
        assert(_Py_IsImmortal(op));
        assert((_PyLong_IsCompact(op)
                && _PY_IS_SMALL_INT(_PyLong_CompactValue(op))));
    }
    return is_small_int;
}

The current code runs many checks when is_small_int is false which are not needed.

@skirpichev
Copy link
Copy Markdown
Member Author

What do you think of changing _PyLong_IsSmallInt() implementation like that?

Lets try, but I suspect it might fall in other place.

_PyLong_SetSignAndDigitCount(result, size != 0, size);
/* The digit has to be initialized explicitly to avoid
* use-of-uninitialized-value. */
result->long_value.ob_digit[0] = 0;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested to keep this initialization. You marked my comment as resolved, I don't understand. Do you consider that it's not needed?

An alternative would be to fill digits with a pattern to detect usage of non-initialized memory when Python is built in debug mode:

#ifdef Py_DEBUG
    // Fill digits with a known pattern to detect usage of uninitialized memory
    memset(result->long_value.ob_digit, 0xff,
           sizeof(result->long_value.ob_digit[0]) * ndigits);
#endif

We use this strategy for:

  • str: see unicode_fill_invalid() function.
  • PyBytesWriter: memset(byteswriter_data(writer), 0xff, byteswriter_allocated(writer)) in PyBytesWriter_Create()

@vstinner
Copy link
Copy Markdown
Member

vstinner commented Apr 1, 2026

Ok, I managed to reproduce the MSAN (-fsanitize=memory) error on this PR.

Apply the patch:

diff --git a/Objects/longobject.c b/Objects/longobject.c
index b74a435b541..d46dbf0bd82 100644
--- a/Objects/longobject.c
+++ b/Objects/longobject.c
@@ -170,7 +170,7 @@ long_alloc(Py_ssize_t size)
     Py_ssize_t ndigits = size ? size : 1;
 
     if (ndigits == 1) {
-        result = (PyLongObject *)_Py_FREELIST_POP(PyLongObject, ints);
+        //result = (PyLongObject *)_Py_FREELIST_POP(PyLongObject, ints);
     }
     if (result == NULL) {
         /* Number of bytes needed is: offsetof(PyLongObject, ob_digit) +

Build Python with:

./configure --with-memory-sanitizer --with-pydebug --without-pymalloc CC=clang LD=clang CFLAGS="-O0"
make clean
make

Then run:

$ PYTHONMALLOC=malloc ./python -c "import _testcapi; _testcapi.pylong_fromnativebytes(b'\x00', 1, 0, 0)"
==1285845==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x00000117d5be in _PyLong_CompactValue /home/vstinner/python/main/./Include/cpython/longintrepr.h:136:5
    #1 0x00000118fb5f in maybe_small_long /home/vstinner/python/main/Objects/longobject.c:71:27
    #2 0x00000118f850 in _PyLong_FromByteArray /home/vstinner/python/main/Objects/longobject.c:1099:24
    (...)

  Uninitialized value was created by a heap allocation
    #0 0x000000440526 in malloc (/home/vstinner/python/main/python+0x440526) (BuildId: 8ebc9d3744df4c174e3833d7dd40285db4f23ce2)
    #1 0x0000014d53fb in _PyMem_RawMalloc /home/vstinner/python/main/Objects/obmalloc.c:65:12
    #2 0x0000014e50d9 in PyObject_Malloc /home/vstinner/python/main/Objects/obmalloc.c:1649:12
    #3 0x00000117b368 in long_alloc /home/vstinner/python/main/Objects/longobject.c:181:18
    #4 0x00000118dfdc in _PyLong_FromByteArray /home/vstinner/python/main/Objects/longobject.c:1049:9
    (...)

SUMMARY: MemorySanitizer: use-of-uninitialized-value /home/vstinner/python/main/./Include/cpython/longintrepr.h:136:5 in _PyLong_CompactValue
Exiting

@vstinner
Copy link
Copy Markdown
Member

vstinner commented Apr 1, 2026

I created issue gh-147988 and PR gh-147989 to fill digits with a pattern in long_alloc() when Python is built in debug mode.

@skirpichev skirpichev requested a review from vstinner April 1, 2026 20:57
Copy link
Copy Markdown
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vstinner vstinner enabled auto-merge (squash) April 1, 2026 21:11
@vstinner vstinner merged commit b456cb2 into python:main Apr 1, 2026
55 checks passed
@skirpichev skirpichev deleted the add-_PyLong_Init branch April 1, 2026 22:19
@vstinner
Copy link
Copy Markdown
Member

vstinner commented Apr 1, 2026

Add helper _PyLong_InitTag()
With this we can assume, that _PyLong_SetSignAndDigitCount() and _PyLong_SetDigitCount() operate on non-immortal integers.

It's a good thing that these two functions now fail in debug mode if they are called on a small integer. It will catch bugs earlier. So I merged the change. Thanks for this enhancement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants