-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
gh-130567: fix strxfrm memory allocation #130619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
75f83f4
to
94ba382
Compare
cc @sobolevn |
Modules/_localemodule.c
Outdated
/* more space needed, some implementations return needed size while | ||
others just buffer length and it's up to the caller to figure | ||
out needed buffer size */ | ||
wchar_t * new_buf = PyMem_Realloc(buf, n2 * 2 * sizeof(wchar_t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of allocating more memory if you still pass n2+1
to wcsxfrm() below? I don't understand how this fix works :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I am still on 14.5, so I cannot check :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix comes from: #130567 (comment)
They figured out that wcsxfrm does not actually return desired string size in case of the failure, so they ended up allocating double space in that case, see SWI-Prolog/swipl-devel@f0bbfb0.
But, it looks like that "double the memory is good enough".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of allocating more memory if you still pass
n2+1
to wcsxfrm() below? I don't understand how this fix works :-)
It apparently doesn't. Fixing that now. I was hoping for macos-15 being in CI because I don't have access to it either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, it looks like that "double the memory is good enough".
Well, it seems to cover most of the cases. They are doing the doubling the size in the loop until it succeeds. That seemed too aggressive to me.
Modules/_localemodule.c
Outdated
/* more space needed, some implementations return needed size while | ||
others just buffer length and it's up to the caller to figure | ||
out needed buffer size */ | ||
size_t new_buf_len = n2 * 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is working well for many years on many platforms. If only macOS 15 requires a change, would it be possible to make this code conditional on macOS? Something like:
#ifdef __APPLE__
// More space needed. macOS 15 just returns the buffer length and it's
// up to the caller to figure out needed buffer size (gh-130567).
size_t new_buf_len = n2 * 2;
#else
size_t new_buf_len = n2 + 1;
#endif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might work, the thing is that this behavior is not defined in the specification, so it could behave this way elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If more platforms are impacted by this issue tomorrow, we can revisit the implementation later. For now, I would prefer to make the change specific to macOS (15).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I've changed the code to use the return value in case it larger than the buffer and fall back to doubling if it is not.
a6c5843
to
e3c5009
Compare
The posix specification does not define that wcsxfrm should return needed buffer size, it just says: If the value returned is n or more, the contents of the array pointed to by ws1 are unspecified. Therefore double the allocation when the original call has failed and repeat that until it works.
} else { | ||
// Some platforms, such as macOS 15 doesn't return desired buffer | ||
// size so it is up to the caller to figure out needed buffer size | ||
// (gh-130567). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should check for integer overflow, something like:
if (buf_len > PY_SSIZE_T_MAX / 2) {
PyErr_NoMemory();
break;
}
if (n2 >= (size_t)n1) { | ||
/* more space needed */ | ||
wchar_t * new_buf = PyMem_Realloc(buf, (n2+1)*sizeof(wchar_t)); | ||
for (;;) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not comfortable with unbound loops :-( I prefer the current code which calls wcsxfrm() once or twice, but no more.
Closing this as it is probably not going to address the issues I tried to address with it, see #130567 (comment) and later comments. |
The posix specification does not define that wcsxfrm should return needed buffer size, it just says:
If the value returned is n or more, the contents of the array pointed to by ws1 are unspecified.
Therefore double the allocation when the original call has failed. It might also make sense to increase the allocation repeatedly, but that would make the code more complex.