Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions Include/cpython/longintrepr.h
Original file line number Diff line number Diff line change
Expand Up @@ -133,6 +133,11 @@ _PyLong_CompactValue(const PyLongObject *op)
assert(PyType_HasFeature(op->ob_base.ob_type, Py_TPFLAGS_LONG_SUBCLASS));
assert(PyUnstable_Long_IsCompact(op));
sign = 1 - (op->long_value.lv_tag & _PyLong_SIGN_MASK);
if (sign == 0) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not it emit a warning or generate inefficient code in non-debug build?

You can write assert(size != 0 || ...).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that sane compiler should recognize here no-op. I've tested assembly code for a toy program with gcc -O2.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In release mode, assertions are removed and so the code becomes if (size == 0) {}. C compilers are good to remove such dead code. I looked at the assembly code of Python built in release mode, and I cannot see any instruction related to if (size == 0).

gcc -O3 doesn't emit a warning on such code.

// gh-147988: Make sure that the digit is zero.
// It helps detecting the usage of uninitialized digits.
assert(op->long_value.ob_digit[0] == 0);
}
return sign * (Py_ssize_t)op->long_value.ob_digit[0];
}

Expand Down
10 changes: 10 additions & 0 deletions Lib/test/test_capi/test_long.py
Original file line number Diff line number Diff line change
Expand Up @@ -803,6 +803,16 @@ def to_digits(num):
self.assertEqual(pylongwriter_create(negative, digits), num,
(negative, digits))

@unittest.skipUnless(support.Py_DEBUG, "need a debug build (Py_DEBUG)")
def test_longwriter_finish(self):
# Test PyLongWriter_Create(0, 3, &digits) with PyLongWriter_Finish()
# where the last digit is left uninitialized
pylongwriter_finish_bug = _testcapi.pylongwriter_finish_bug
with self.assertRaises(SystemError) as cm:
pylongwriter_finish_bug()
self.assertEqual(str(cm.exception),
'PyLongWriter_Finish: digit 2 is uninitialized')

def test_bug_143050(self):
with support.adjust_int_max_str_digits(0):
# Bug coming from using _pylong.int_from_string(), that
Expand Down
20 changes: 20 additions & 0 deletions Modules/_testcapi/long.c
Original file line number Diff line number Diff line change
Expand Up @@ -254,6 +254,25 @@ pylongwriter_create(PyObject *module, PyObject *args)
}


static PyObject *
pylongwriter_finish_bug(PyObject *module, PyObject *Py_UNUSED(args))
{
void *writer_digits;
PyLongWriter *writer = PyLongWriter_Create(0, 3, &writer_digits);
if (writer == NULL) {
return NULL;
}

assert(PyLong_GetNativeLayout()->digit_size == sizeof(digit));
digit *digits = writer_digits;
digits[0] = 1;
digits[1] = 1;
// Oops, digits[2] is left uninitialized on purpose
// to test PyLongWriter_Finish()
return PyLongWriter_Finish(writer);
}


static PyObject *
get_pylong_layout(PyObject *module, PyObject *Py_UNUSED(args))
{
Expand All @@ -271,6 +290,7 @@ static PyMethodDef test_methods[] = {
{"pylong_aspid", pylong_aspid, METH_O},
{"pylong_export", pylong_export, METH_O},
{"pylongwriter_create", pylongwriter_create, METH_VARARGS},
{"pylongwriter_finish_bug", pylongwriter_finish_bug, METH_NOARGS},
{"get_pylong_layout", get_pylong_layout, METH_NOARGS},
{"pylong_ispositive", pylong_ispositive, METH_O},
{"pylong_isnegative", pylong_isnegative, METH_O},
Expand Down
43 changes: 36 additions & 7 deletions Objects/longobject.c
Original file line number Diff line number Diff line change
Expand Up @@ -187,9 +187,11 @@ long_alloc(Py_ssize_t size)
_PyObject_Init((PyObject*)result, &PyLong_Type);
}
_PyLong_SetSignAndDigitCount(result, size != 0, size);
/* The digit has to be initialized explicitly to avoid
* use-of-uninitialized-value. */
result->long_value.ob_digit[0] = 0;
#ifdef Py_DEBUG
// gh-147988: Fill digits with an invalid pattern to catch usage
// of uninitialized digits.
memset(result->long_value.ob_digit, 0xFF, ndigits * sizeof(digit));
#endif
return result;
}

Expand Down Expand Up @@ -1094,6 +1096,7 @@ _PyLong_FromByteArray(const unsigned char* bytes, size_t n,
int sign = is_signed ? -1: 1;
if (idigit == 0) {
sign = 0;
v->long_value.ob_digit[0] = 0;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it needed in non-debug build?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As described in the issue, MSAN generates a memory error without this line, since _PyLong_CompactValue() uses uninitialized memory (ob_digit[0]).

If you ignore MSAN (and other sanitizers), maybe_small_long() returns the zero singleton, and _PyLong_CompactValue() returns 0 because it computes sign * digit where sign is 0 and digit is a random number.

This PR moves v->long_value.ob_digit[0] = 0; from long_alloc() to _PyLong_FromByteArray(), and make it conditional: only write ob_digit[0] if (idigit == 0) is true. So in the common case, the assignment is no longer done.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, we could live without all this. But I think it's a good idea to set digit[0] for zero.

Historically, we set number of digits to zero for zero. I think it should be 1 to reflect actual memory allocation. Then zero representation will make sense in both "big int" format (sign + magnitude as 1-element array) and as a "small int" (sign*digit[0])

}
_PyLong_SetSignAndDigitCount(v, sign, idigit);
return (PyObject *)maybe_small_long(long_normalize(v));
Expand Down Expand Up @@ -2852,6 +2855,7 @@ long_from_non_binary_base(const char *start, const char *end, Py_ssize_t digits,
*res = NULL;
return 0;
}
z->long_value.ob_digit[0] = 0;
_PyLong_SetSignAndDigitCount(z, 0, 0);

/* `convwidth` consecutive input digits are treated as a single
Expand Down Expand Up @@ -3365,6 +3369,7 @@ x_divrem(PyLongObject *v1, PyLongObject *w1, PyLongObject **prem)
*prem = NULL;
return NULL;
}
a->long_value.ob_digit[0] = 0;
v0 = v->long_value.ob_digit;
w0 = w->long_value.ob_digit;
wm1 = w0[size_w-1];
Expand Down Expand Up @@ -4141,10 +4146,6 @@ k_mul(PyLongObject *a, PyLongObject *b)
/* 1. Allocate result space. */
ret = long_alloc(asize + bsize);
if (ret == NULL) goto fail;
#ifdef Py_DEBUG
/* Fill with trash, to catch reference to uninitialized digits. */
memset(ret->long_value.ob_digit, 0xDF, _PyLong_DigitCount(ret) * sizeof(digit));
#endif

/* 2. t1 <- ah*bh, and copy into high digits of result. */
if ((t1 = k_mul(ah, bh)) == NULL) goto fail;
Expand Down Expand Up @@ -5633,6 +5634,12 @@ long_bitwise(PyLongObject *a,
Py_UNREACHABLE();
}

if ((size_z + negz) == 0) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is equivalent to size_z == 0 & !negz, right? Why write the condition in such form? Why add this if? Is it needed in non-debug build?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is equivalent to size_z == 0 & !negz, right? Why write the condition in such form?

My intent is to add a special case for long_alloc(0) on long_alloc(size_z + negz) below.

Why add this if?

The alternative is to add z->long_value.ob_digit[0] = 0; after long_alloc() below.

Do you prefer adding z->long_value.ob_digit[0] = 0; ?

Is it needed in non-debug build?

See my reply on _PyLong_FromByteArray() change.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it does not matter.

Py_XDECREF(new_a);
Py_XDECREF(new_b);
return get_small_int(0);
}

/* We allow an extra digit if z is negative, to make sure that
the final two's complement of z doesn't overflow. */
z = long_alloc(size_z + negz);
Expand Down Expand Up @@ -6951,6 +6958,28 @@ PyLongWriter_Finish(PyLongWriter *writer)
PyLongObject *obj = (PyLongObject *)writer;
assert(Py_REFCNT(obj) == 1);

#ifdef Py_DEBUG
// gh-147988: Detect uninitialized digits: long_alloc() fills digits with
// 0xFF byte pattern. It's posssible because PyLong_BASE is smaller than
// the maximum value of the C digit type (uint32_t or unsigned short):
// most significan bits are unused by the API.
Py_ssize_t ndigits = _PyLong_DigitCount(obj);
if (ndigits == 0) {
// Check ob_digit[0] digit for the number zero
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not we simply assert obj->long_value.ob_digit[0] == 0?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the code to raise SystemError, so there is now more complex code below.

ndigits = 1;
}
for (Py_ssize_t i = 0; i < ndigits; i++) {
digit d = obj->long_value.ob_digit[i];
if (d & ~(digit)PyLong_MASK) {
Py_DECREF(obj);
PyErr_Format(PyExc_SystemError,
"PyLongWriter_Finish: digit %zd is uninitialized",
i);
return NULL;
}
}
#endif

// Normalize and get singleton if possible
obj = maybe_small_long(long_normalize(obj));

Expand Down
Loading