Skip to content

gh-130567: fix strxfrm memory allocation #130619

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

nijel
Copy link
Contributor

@nijel nijel commented Feb 27, 2025

The posix specification does not define that wcsxfrm should return needed buffer size, it just says:

If the value returned is n or more, the contents of the array pointed to by ws1 are unspecified.

Therefore double the allocation when the original call has failed. It might also make sense to increase the allocation repeatedly, but that would make the code more complex.

@vstinner
Copy link
Member

cc @sobolevn

/* more space needed, some implementations return needed size while
others just buffer length and it's up to the caller to figure
out needed buffer size */
wchar_t * new_buf = PyMem_Realloc(buf, n2 * 2 * sizeof(wchar_t));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of allocating more memory if you still pass n2+1 to wcsxfrm() below? I don't understand how this fix works :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I am still on 14.5, so I cannot check :(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix comes from: #130567 (comment)

They figured out that wcsxfrm does not actually return desired string size in case of the failure, so they ended up allocating double space in that case, see SWI-Prolog/swipl-devel@f0bbfb0.

But, it looks like that "double the memory is good enough".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of allocating more memory if you still pass n2+1 to wcsxfrm() below? I don't understand how this fix works :-)

It apparently doesn't. Fixing that now. I was hoping for macos-15 being in CI because I don't have access to it either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, it looks like that "double the memory is good enough".

Well, it seems to cover most of the cases. They are doing the doubling the size in the loop until it succeeds. That seemed too aggressive to me.

/* more space needed, some implementations return needed size while
others just buffer length and it's up to the caller to figure
out needed buffer size */
size_t new_buf_len = n2 * 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is working well for many years on many platforms. If only macOS 15 requires a change, would it be possible to make this code conditional on macOS? Something like:

#ifdef __APPLE__
        // More space needed. macOS 15 just returns the buffer length and it's
        // up to the caller to figure out needed buffer size (gh-130567).
        size_t new_buf_len = n2 * 2;
#else
        size_t new_buf_len = n2 + 1;
#endif

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might work, the thing is that this behavior is not defined in the specification, so it could behave this way elsewhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If more platforms are impacted by this issue tomorrow, we can revisit the implementation later. For now, I would prefer to make the change specific to macOS (15).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I've changed the code to use the return value in case it larger than the buffer and fall back to doubling if it is not.

@nijel nijel force-pushed the strxfrm-macos branch 2 times, most recently from a6c5843 to e3c5009 Compare February 28, 2025 13:04
The posix specification does not define that wcsxfrm should return
needed buffer size, it just says:

If the value returned is n or more, the contents of the array pointed to
by ws1 are unspecified.

Therefore double the allocation when the original call has failed and
repeat that until it works.
} else {
// Some platforms, such as macOS 15 doesn't return desired buffer
// size so it is up to the caller to figure out needed buffer size
// (gh-130567).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should check for integer overflow, something like:

            if (buf_len > PY_SSIZE_T_MAX / 2) {
                PyErr_NoMemory();
                break;
            }

if (n2 >= (size_t)n1) {
/* more space needed */
wchar_t * new_buf = PyMem_Realloc(buf, (n2+1)*sizeof(wchar_t));
for (;;) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not comfortable with unbound loops :-( I prefer the current code which calls wcsxfrm() once or twice, but no more.

@nijel
Copy link
Contributor Author

nijel commented Feb 28, 2025

Closing this as it is probably not going to address the issues I tried to address with it, see #130567 (comment) and later comments.

@nijel nijel closed this Feb 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants