don't swallow panics from spawned threads #1763
Merged: workingjubilee merged 2 commits into pgcentralfoundation:develop from jyn514:spawned-thread-panics on Jul 8, 2024
pgrx has somewhat complex panic handling. it looks something like this:

1. when a thread panics, the panic hook captures a backtrace and saves it in a thread-local for later.
2. the thread unwinds until it hits an FFI boundary (usually `run_guarded`). that downcasts the panic, takes the backtrace out of the thread-local, and hooks into postgres' `longjmp` mechanism.
3. i forget what happens after this, i think it resumes unwinding once it's past the FFI barrier.

there is a slight problem here: we are using a thread-local to store the backtrace. if the panic does not happen on the main thread (for example, because a spawned thread tries to call into postgres and hits the check in `check_active_thread`), the backtrace will be lost. worse, if the main thread then unwinds in response to the panic, pgrx will use *its* backtrace instead of that of the worker thread.

there are two main approaches we considered to fixing this:

1. fix the backtrace not to use a thread-local, so we can attach panics in spawned threads to a pgrx connection the way we would for the main thread.
2. stop handling panics in spawned threads altogether (and use the default hook).

the downside of approach 1 is that there may not *be* a pgrx connection to attach to. the connection may have already closed, or the active connection may not be related to the thread that panicked, or we may be shutting down and will never check for the panic. in those cases the panic information will be missing or wrong.

the downside of approach 2 is that it does not integrate with postgres' error handling mechanism, and in particular is not reported to psql. however, it does allow for developers using pgrx to handle the panic themselves, for example by handling the result from `JoinHandle::join`, in which case it *will* be reported to psql.

this takes approach 2. we may want to reconsider this in the future, or perhaps add a helper library so that it's easy for applications to pass the panic into the main thread.
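
to make the `JoinHandle::join` route under approach 2 concrete, here is a minimal sketch using only the standard library. it is not pgrx code: `run_on_worker` and `panic_message` are hypothetical names made up for this example, and reducing the payload to a `String` is an assumption about what the caller wants to report.

```rust
use std::any::Any;
use std::thread;

/// turn a panic payload into a readable message (hypothetical helper).
/// `panic!` payloads are usually `&'static str` or `String`.
fn panic_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&'static str>() {
        (*s).to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "unknown panic payload".to_string()
    }
}

/// run `work` on a spawned thread and hand a panic back to the caller
/// as an `Err` instead of letting it disappear (hypothetical helper).
pub fn run_on_worker<T, F>(work: F) -> Result<T, String>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::spawn(work)
        .join()
        .map_err(|payload| panic_message(&*payload))
}
```

the thread that calls `run_on_worker` gets the worker's failure as an ordinary value. if that caller is the main thread, it can decide what to do with the `Err`, for example panic with the message so that it goes through the usual handling at the FFI boundary.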

---

note that the default panic handler in the standard library behaves quite poorly when multiple threads panic at once (it's sound, but the output is very hard to read). this is being fixed in a separate PR upstream; see rust-lang/rust#127397.
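
for reference, the shape of approach 2 can be sketched with the standard library alone: a hook that captures panic information only on one designated thread and hands everything else to whatever hook was installed before (normally the default one discussed in the note above). this is an illustration under assumptions, not the actual pgrx implementation: `SAVED_PANIC`, `install_hook`, and `take_saved_panic` are hypothetical names, and a `String` stands in for the captured backtrace.

```rust
use std::cell::RefCell;
use std::panic;
use std::thread::{self, ThreadId};

thread_local! {
    // hypothetical stand-in for the thread-local that holds the backtrace
    static SAVED_PANIC: RefCell<Option<String>> = RefCell::new(None);
}

/// install a hook that saves panic info only for `owner`;
/// panics on any other thread go to the previously installed hook.
pub fn install_hook(owner: ThreadId) {
    let previous = panic::take_hook();
    panic::set_hook(Box::new(move |info| {
        if thread::current().id() == owner {
            // owner thread: stash the info for later, as in step 1
            SAVED_PANIC.with(|slot| *slot.borrow_mut() = Some(info.to_string()));
        } else {
            // spawned thread: don't capture, let the previous hook report it
            previous(info);
        }
    }));
}

/// take whatever the hook saved on the current thread, if anything.
pub fn take_saved_panic() -> Option<String> {
    SAVED_PANIC.with(|slot| slot.borrow_mut().take())
}
```

with this shape, a panic on a spawned thread never touches the owner's thread-local, so the owner cannot later pick up a stale or unrelated entry; the worker's panic is printed by the previous hook instead.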
Thanks!
usamoi pushed a commit to tensorchord/pgrx that referenced this pull request on Mar 6, 2025
i tested locally that this now shows the backtrace from the right thread, but i am not sure how to write an automated test. i don't mind adding one if there's already a similar test, but i would prefer not to write a whole test suite from scratch.