Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(engine api): [1/3] error on engine API errors #2485

Merged
merged 17 commits into from
Feb 10, 2025

Conversation

shotes
Copy link
Contributor

@shotes shotes commented Feb 6, 2025

  1. Enforces the synchronicity of the FinalizeBlock's engine_forkchoiceUpdated call.
  2. Always returns error on engine_newPayload and engine_forkchoiceUpdated failures. Therefore
  3. Remove optimistic engine from transaction.Context and the engine.

With both of these, we ensure that the CL will fail when it goes out of sync with the EL.

In future PRs, we should attempt retries on these engine API calls to allow for easier recovery.

@shotes shotes requested a review from a team as a code owner February 6, 2025 18:39
Copy link

codecov bot commented Feb 6, 2025

Codecov Report

Attention: Patch coverage is 2.17391% with 45 lines in your changes missing coverage. Please review.

Project coverage is 32.35%. Comparing base (412f08e) to head (fc3fb65).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
beacon/blockchain/execution_engine.go 0.00% 17 Missing ⚠️
beacon/blockchain/finalize_block.go 0.00% 7 Missing ⚠️
execution/engine/metrics.go 0.00% 6 Missing ⚠️
execution/engine/engine.go 0.00% 5 Missing ⚠️
payload/builder/payload.go 0.00% 5 Missing ⚠️
beacon/validator/block_builder.go 0.00% 2 Missing ⚠️
state-transition/core/state_processor_payload.go 0.00% 2 Missing ⚠️
beacon/blockchain/process_proposal.go 0.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2485      +/-   ##
==========================================
+ Coverage   32.24%   32.35%   +0.10%     
==========================================
  Files         350      350              
  Lines       15653    15595      -58     
  Branches       20       20              
==========================================
- Hits         5047     5045       -2     
+ Misses      10243    10187      -56     
  Partials      363      363              
Files with missing lines Coverage Δ
consensus-types/types/payload_requests.go 76.53% <ø> (-0.24%) ⬇️
primitives/transition/context.go 0.00% <ø> (ø)
testing/state-transition/state-transition.go 84.37% <100.00%> (-0.25%) ⬇️
beacon/blockchain/process_proposal.go 0.00% <0.00%> (ø)
beacon/validator/block_builder.go 0.00% <0.00%> (ø)
state-transition/core/state_processor_payload.go 26.92% <0.00%> (+0.25%) ⬆️
execution/engine/engine.go 0.00% <0.00%> (ø)
payload/builder/payload.go 20.12% <0.00%> (-0.40%) ⬇️
execution/engine/metrics.go 0.00% <0.00%> (ø)
beacon/blockchain/finalize_block.go 0.00% <0.00%> (ø)
... and 1 more

Comment on lines 67 to 70
s.logger.Error(
"failed to send forkchoice update without attributes",
"error", err,
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaced logs with more expressive errors

@shotes shotes changed the title fix(engine api): error on engine API errors fix(engine api): [1/2] error on engine API errors Feb 6, 2025
@shotes shotes changed the title fix(engine api): [1/2] error on engine API errors fix(engine api): [1/3] error on engine API errors Feb 6, 2025
@@ -96,13 +96,13 @@ func (ee *Engine) NotifyForkchoiceUpdate(
// We do not bubble the error up, since we want to handle it
// in the same way as the other cases.
ee.metrics.markForkchoiceUpdateSyncing(req.State, err)
return payloadID, nil, nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why we should return anything other than nil payload and latestValidaHash when it errs

@@ -181,8 +183,7 @@ func (s *Service) executeStateTransition(
WithVerifyPayload(true).
WithVerifyRandao(false).
WithVerifyResult(false).
WithMeterGas(true).
WithOptimisticEngine(true)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped WithOptimisticEngine since we don't silence any error anymore

),
)
if err != nil {
return fmt.Errorf("failed sending forkchoice update without attributes: %w", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we decorate the error with information about the hashes as well, could be useful

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -79,7 +80,7 @@ func (pb *PayloadBuilder) RequestPayloadAsync(
},
)
if err != nil {
return nil, err
return nil, fmt.Errorf("RequestPayloadAsync failed sending forkchoice update: %w", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not for this pr but we should consider renaming RequestPayloadAsync

Copy link
Contributor

@rezbera rezbera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to see the concept of Optimistic reconciled to just payload builds.

nits, but otherwise LGTM. We should let these sequence of PRs soak for a few days before release

@abi87 abi87 assigned shotes and abi87 Feb 7, 2025
@calbera calbera force-pushed the deposit-sync-enforce-error branch from 30e1cb9 to 12cf8d9 Compare February 7, 2025 17:15
Copy link
Contributor

@calbera calbera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK, confirming with testing of EL un-aliveness, bad peering on full node

@calbera
Copy link
Contributor

calbera commented Feb 7, 2025

Noting behavior of full node:

  • In bootstrapping mode, if the EL is completely shutdown, the CL will error in finalize block and does immediately shutdown.
2025-02-07T12:45:41-05:00 ERRR Received undefined error during new payload call service=execution-engine payload_block_hash=0xa7b83707491d71e414489018ccc994fba72d1ec6275f5cb3a85955227ba1505a parent_hash=0xa7b83707491d71e414489018ccc994fba72d1ec6275f5cb3a85955227ba1505a error=got an unexpected server error in JSON-RPC response failed to convert from jsonrpc.Error: Post "http://localhost:8551": EOF
2025-02-07T12:45:41-05:00 ERRR Failed to process verified beacon block service=blockchain error=got an unexpected server error in JSON-RPC response failed to convert from jsonrpc.Error: Post "http://localhost:8551": EOF
2025-02-07T12:45:41-05:00 ERRR Error in proxyAppConn.FinalizeBlock module=state err=got an unexpected server error in JSON-RPC response failed to convert from jsonrpc.Error: Post "http://localhost:8551": EOF
panic: Failed to process committed block (832362:8A9B54FF30D4F1C4C16D017BD262ABFF6D1D3158B0B3B1DB2C10894D415772AA): got an unexpected server error in JSON-RPC response failed to convert from jsonrpc.Error: Post "http://localhost:8551": EOF

goroutine 4550 [running]:
github.com/cometbft/cometbft/internal/blocksync.(*Reactor).poolRoutine(0x14000cec908, 0x0)
	github.com/cometbft/[email protected]/internal/blocksync/reactor.go:551 +0x13a4
github.com/cometbft/cometbft/internal/blocksync.(*Reactor).OnStart.func1()
	github.com/cometbft/[email protected]/internal/blocksync/reactor.go:139 +0x54
created by github.com/cometbft/cometbft/internal/blocksync.(*Reactor).OnStart in goroutine 4542
	github.com/cometbft/[email protected]/internal/blocksync/reactor.go:137 +0x88
  • In normal operations, if the EL is completely shutdown, the CL will error in finalize block, but the CL does NOT immediately shutdown! No more blocks are being processed, but keeps looking for peers.
    • Where to improve in future PR: The comet node will never retry into sync mode, so we should handle ourselves, app-side
2025-02-07T12:41:01-05:00 ERRR Received undefined error during new payload call service=execution-engine payload_block_hash=0xbe1b3564295fc3a7d2065f1eb8753abaa73953c77c48060534ad2e0d12eb8885 parent_hash=0xbe1b3564295fc3a7d2065f1eb8753abaa73953c77c48060534ad2e0d12eb8885 error=got an unexpected server error in JSON-RPC response failed to convert from jsonrpc.Error: Post "http://localhost:8551": dial tcp [::1]:8551: connect: connection refused
2025-02-07T12:41:01-05:00 ERRR Failed to process verified beacon block service=blockchain error=got an unexpected server error in JSON-RPC response failed to convert from jsonrpc.Error: Post "http://localhost:8551": dial tcp [::1]:8551: connect: connection refused
2025-02-07T12:41:01-05:00 ERRR Error in proxyAppConn.FinalizeBlock module=state err=got an unexpected server error in JSON-RPC response failed to convert from jsonrpc.Error: Post "http://localhost:8551": dial tcp [::1]:8551: connect: connection refused
2025-02-07T12:41:01-05:00 ERRR CONSENSUS FAILURE!!! module=consensus err=failed to apply block; error got an unexpected server error in JSON-RPC response failed to convert from jsonrpc.Error: Post "http://localhost:8551": dial tcp [::1]:8551: connect: connection refused

@abi87 abi87 enabled auto-merge (squash) February 10, 2025 18:12
@abi87 abi87 merged commit d708c71 into main Feb 10, 2025
19 checks passed
@abi87 abi87 deleted the deposit-sync-enforce-error branch February 10, 2025 18:18
rezbera added a commit that referenced this pull request Feb 12, 2025
shotes added a commit that referenced this pull request Feb 12, 2025
nidhi-singh02 pushed a commit that referenced this pull request Feb 17, 2025
abi87 added a commit that referenced this pull request Feb 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants