Skip to content

gh-135552: Reset type cache after finalization to prevent possible segfault while garbage collecting #135728

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

sergey-miryanov
Copy link
Contributor

@sergey-miryanov sergey-miryanov commented Jun 19, 2025

While trying to find the root cause of the linked issue, I observed that the type cache was holding an obsolete reference. Because the type cache holds borrowed references, this leads to a use-after-free error.

I'm resetting the cache in this PR. But I suspect that this may affect overall performance.

@sergey-miryanov
Copy link
Contributor Author

@ZeroIntensity @sobolevn @Eclips4 @efimov-mikhail @fxeqxmulfx Please take a look.

Copy link
Member

@sobolevn sobolevn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, add a test case and a NEWS entry.
I think that there's a minimal repro available in the issue.

The solution seems reasonable!

@Zheaoli
Copy link
Contributor

Zheaoli commented Jun 20, 2025

Would you mind adding a benchmark for this commit by comment?

@sergey-miryanov
Copy link
Contributor Author

@sobolevn Thanks for the review! I will do soon. However, as with all simple solutions, this one doesn't seem to be the right one. I'm digging further.

@Zheaoli Yes, I'm planning to do perf benchmarks.

@ZeroIntensity
Copy link
Member

However, as with all simple solutions, this one doesn't seem to be the right one. I'm digging further.

Yeah, as I said, this seems to be a problem with the GIL-ful garbage collector, not the object model. Something like this would also cause problems on the free-threaded build, but FT is working perfectly.

@sergey-miryanov
Copy link
Contributor Author

@ZeroIntensity The root cause of this error is the absence of subclasses for BaseNode class. Therefore, the STORE_ATTR (->PyObject_SetAttr->tp_setattro->type_update_dict->type_modified_unlocked) can't reset the cache for subclasses. As a result, the Node type cache contains obsolete entries.

I have the following code, and I believe the problem is common to both:

class XXX:
    def __init__(self):
        self.x = weakref.ref(self)
        self.z = self

    def __del__(self):
        assert self.x() == self, (self.x(), self.z, self)

XXX()
XXX()

➜ .\python.bat -X faulthandler x.py 
Exception ignored while calling deallocator <function XXX.__del__ at 0x000001E0085F5010>:
Traceback (most recent call last):
  File "D:\Sources\_pythonish\cpython\main\x.py", line 29, in __del__
    assert self.x() == self, (self.x(), self.z, self)
AssertionError: (None, <__main__.XXX object at 0x000001E00838B760>, <__main__.XXX object at 0x000001E00838B760>)
Exception ignored while calling deallocator <function XXX.__del__ at 0x000001E0085F5010>:
Traceback (most recent call last):
  File "D:\Sources\_pythonish\cpython\main\x.py", line 29, in __del__
    assert self.x() == self, (self.x(), self.z, self)
AssertionError: (None, <__main__.XXX object at 0x000001E00838B8C0>, <__main__.XXX object at 0x000001E00838B8C0>)

As you can see, the weakref is None, but self has not been destroyed yet. I believe the garbage collector has a mechanism to handle weakrefs, but it runs too early, before _Py_Finalize (where tp_finalize is called).

@ZeroIntensity
Copy link
Member

The root cause of this error is the absence of subclasses for BaseNode class. Therefore, the STORE_ATTR (->PyObject_SetAttr->tp_setattro->type_update_dict->type_modified_unlocked) can't reset the cache for subclasses. As a result, the Node type cache contains obsolete entries.

Ah, and there's no type cache under free-threading. Interesting!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants