-
Notifications
You must be signed in to change notification settings - Fork 215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(abci): ignore comet ctx and use app ctx #2568
Conversation
* Add special case for startup sync newPayload call * Aggregate accepted and syncing in newPayload, since they were treated the same anyways * Add TODO to modify special case for all FinalizeBlock calls
Opted to entirely remove the This original reason was not a very good reason - IF they implement the Removing this functionality entirely streamlines the whole thing, and it was unnecessary in the first place. Now, if comet ever implements |
@@ -54,6 +54,9 @@ func (s *Service) finalizeBlockInternal( | |||
// CometBFT. | |||
if s.finalizeBlockState == nil { | |||
s.finalizeBlockState = s.resetState(ctx) | |||
} else { | |||
// Preserve the CosmosSDK context while using the correct base ctx. | |||
s.finalizeBlockState.SetContext(s.finalizeBlockState.Context().WithContext(ctx)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bit unsure on this. Why was introducing this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the cosmos SDK context, which I will refer to as cosmosCtx
, contains a reference to the baseCtx
, which is the regular golang context.Context
. This cosmosCtx
also stores a bunch of data in it for our state and commit multi store and whatnot, but we don't want to affect any of that.
When we resetState(ctx)
, we are creating a new cosmosCtx
, where baseCtx = ctx
.
If we do not resetState(ctx)
, then the already existing finalizeBlockState
is used. This state has the old cosmosCtx
, which has the old ctx
from whichever other function is was set in, likely the previous ProcessProposal
. We don't mind using the old cosmosCtx
, but we want to make sure we are using the new ctx
that is passed in for this function. So we can simply set baseCtx = ctx
here.
Indeed, this is not as big of a problem now, since all of the contexts are the same and the context isn't getting canceled inbetween each ABCI++ call like it was in a previous iteration of this PR. However, purely for correctness, we want to make sure our s.Blockchain.FinalizeBlock(s.finalizeBlockState.Context())
is using the ctx
passed in via finalizeBlockInteral(ctx, ...)
and NOT the ctx
from some other function call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, it is still not accurately using the correct ctx unless we make this change.
I had assumed it was always going to be using this context.
For my understanding, is it fair to say that we don't need to change the context here given we have this universal base context which currently never changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rezbera if you check the call hierarchy you will see that FinalizeBlock
does use s.ctx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually @rezbera we can solve the issue by just not passing ctx as parameter, but reusing s.ctx everywhere, which this PR basically does. So no need to look up the call hierarchy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 non-blocking query , but rest lgtm
@@ -54,6 +54,9 @@ func (s *Service) finalizeBlockInternal( | |||
// CometBFT. | |||
if s.finalizeBlockState == nil { | |||
s.finalizeBlockState = s.resetState(ctx) | |||
} else { | |||
// Preserve the CosmosSDK context while using the correct base ctx. | |||
s.finalizeBlockState.SetContext(s.finalizeBlockState.Context().WithContext(ctx)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make a helper function on state
to reset/update the base ctx
@@ -95,7 +95,7 @@ func (s *SimulatedSuite) SetupTest() { | |||
}() | |||
|
|||
// Allow a short period for services to fully initialize. | |||
time.Sleep(2 * time.Second) | |||
time.Sleep(5 * time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
noting here that this is a hacky way to make sure that goroutine running s.TestNode.Start(s.Ctx)
above completes before we move on (otherwise consensus service may crash due to ctx being nil).
We will have to rework this to use a different way to check stuff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed to use a reliable method in #2572
It's not ideal, because ideally we'd have a readiness probe API but this is better than sleeping arbitrary time.
Currently, beacond hangs on SIGTERM if its inside a retry timeout. This is because the context used in the retry is not cancelled in our signal hander.
To address this, this PR makes the context available to the
cometbft
service which it gets from the context passed in toStart()
. Now, when signal handler catches a signal, the context cancellation will be checked by the ABCI++ endpoints properly. This causes us to properly handle signal cancellation in the retry timeout.Unfortunately, CometBFT does not yet support context cancellation, or context usage at all. It currently passes in
context.TODO()
into all of the ABCI methods, so the ctx we see in the ABCI function implementations is not connected to the ctx from our node. I also can't seem to find anything in cometBFT RFC or docs that indicate supporting this.Since CometBFT does not support context cancellation, this means that there is no clean way out -- we MUST return error in
ProcessProposal
andFinalizeBlock
(which causes theCONSENSUS FAILURE!!!
panic). If we don't then our node will be attempting to participate in consensus with potentially corrupt, invalid, or missing data. I have tested this behavior by not returning an error fromFinalizeBlock
during context cancellation, and indeed my node still proceeded to callCommit
.