Description
Bug report
The faulthandler module can dump Python tracebacks when a crash occurs. Unfortunately, the current implementation itself crashes in the free-threaded build. This is mostly undetected because our tests expect a crash, but faulthandler itself crashing is not desirable.
Faulthandler may be called without a valid thread state (i.e., without holding the GIL)
Faulthandler may be triggered when the thread doesn't have a valid thread state (i.e., doesn't hold the GIL in the default build and is not "attached" in the free-threaded build). Additionally, it's called from a signal handler, so we only want to call async-signal-safe functions (generally no locking).
Faulthandler calls PyDict_Next (via _Py_DumpExtensionModules) on the modules dictionary. This is not entirely safe in the default build (because we don't hold the GIL), but works well enough in practice.
However, it will consistently crash in the free-threaded build because PyDict_Next starts a critical section, which assumes there is a valid thread state.
Suggestion:
- we should use _PyDict_Next(), which doesn't internally lock the dict.
- we should try to lock the dict around the _PyDict_Next() loop, using _PyMutex_LockTimed with timeout=0. If we can't immediately lock the dict, we should not dump modules. This is async-signal-safe because it's just a simple compare-exchange and doesn't block.
- we can't call PyMutex_Unlock() because it's not async-signal-safe (it internally acquires locks in order to wake up threads), so we should either use a simple atomic exchange to unlock the dict (without waking up waiters) or not bother unlocking the lock at all. We exit shortly after _Py_DumpExtensionModules, so it doesn't matter if we don't wake up other threads.
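
The try-lock / unlock-without-wake pattern suggested above could look roughly like the following. This is only a standalone sketch using C11 atomics, not CPython code: sketch_mutex, sketch_trylock, sketch_unlock_nowake and sketch_dump_modules are hypothetical names, the one-byte lock layout is an assumption, and the loop over an array of names stands in for the real _PyDict_Next() loop. It also assumes the atomic byte is lock-free on the target platform.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the dict's mutex: 0 = unlocked, 1 = locked. */
typedef struct {
    atomic_uchar locked;
} sketch_mutex;

/* Non-blocking acquire: one compare-exchange, never waits.
 * Returns 1 if the lock was taken, 0 if it was already held. */
static int
sketch_trylock(sketch_mutex *m)
{
    unsigned char expected = 0;
    return atomic_compare_exchange_strong(&m->locked, &expected, 1);
}

/* Release with a plain atomic exchange. Unlike a full unlock, this
 * does not try to wake any waiting threads. */
static void
sketch_unlock_nowake(sketch_mutex *m)
{
    atomic_exchange(&m->locked, 0);
}

/* Write each name to fd only if the lock can be taken immediately.
 * Returns 1 if the names were written, 0 if the dump was skipped. */
static int
sketch_dump_modules(sketch_mutex *m, int fd,
                    const char *const *names, size_t n)
{
    if (!sketch_trylock(m)) {
        return 0;  /* someone else holds the lock: skip the dump */
    }
    for (size_t i = 0; i < n; i++) {
        ssize_t r = write(fd, names[i], strlen(names[i]));
        r = write(fd, "\n", 1);
        (void)r;   /* nothing useful to do on error while crashing */
    }
    sketch_unlock_nowake(m);
    return 1;
}
```

If the process exits right after the dump, the call to sketch_unlock_nowake could be dropped entirely, which matches the "not bother unlocking" option.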