chore(testing): modified start-reth to enforce persistence #2536
Conversation
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@           Coverage Diff           @@
##             main    #2536   +/-   ##
=======================================
  Coverage   32.77%   32.77%
=======================================
  Files         351      351
  Lines       15857    15857
  Branches       20       20
=======================================
  Hits         5197     5197
  Misses      10276    10276
  Partials      384      384
@@ -86,7 +86,10 @@ start-reth: ## start an ephemeral `reth` node
 	--authrpc.addr "0.0.0.0" \
 	--authrpc.jwtsecret $(JWT_PATH) \
 	--datadir ${ETH_DATA_DIR} \
-	--ipcpath ${IPC_PATH}
+	--ipcpath ${IPC_PATH} \
+	-vvvvv \
I think we should not make this the default verbosity; -vvvvv should be specified only when debugging. Logs will get crowded and very large very quickly.
Yep, definitely not for real nodes.
I don't think people use make start-reth to run production nodes. If they do, they should not. I like increasing output in our tests, and I think we can sustain it as is.
+	--ipcpath ${IPC_PATH} \
+	-vvvvv \
+	--engine.persistence-threshold 0 \
+	--engine.memory-block-buffer-target 0
/// Maximum number of blocks to be kept only in memory without triggering
/// persistence.
persistence_threshold: u64,
/// How close to the canonical head we persist blocks. Represents the ideal
/// number of most recent blocks to keep in memory for quick access and reorgs.
///
/// Note: this should be less than or equal to `persistence_threshold`.
memory_block_buffer_target: u64,
Noting the above code in reth (slightly more verbose than the CLI info)
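To make the constraint concrete, here is a minimal sketch (the reth node subcommand and the example values are assumptions for illustration, not recommendations); any pair with memory_block_buffer_target <= persistence_threshold satisfies the documented rule:

# Illustrative values only: allow up to 4 blocks to live purely in memory
# before persistence triggers, while keeping the 2 most recent blocks in
# memory near the tip. Valid since 2 <= 4.
reth node \
	--engine.persistence-threshold 4 \
	--engine.memory-block-buffer-target 2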
@rezbera do you want me to include this somehow in the Makefile?
I think noting it here as you did is enough (and appreciated!)
Here is enough - as long as we're aware @abi87
Awesome, there had to be some configuration that we could tune to change this behaviour! Great great
Did a quick sanity check in the reth code for usage of these vars and this makes sense to me.
Noting that persistence_threshold must be >= memory_block_buffer_target.
This likely has performance implications for Reth, so perf testing this before release would be appropriate.
@rezbera I agree about the performance hit, but I don't think we need to require people to use this in production yet?
So the problem this solves is Reth nodes crashing and then losing blocks. Such a problem could also occur once we are at the tip of the chain, so we would want Reth users to run with these flags even then, right? Agreed that it's not an urgent rollout, but my understanding is that, longer term, users should always run with these flags.
That would be ideal, and maybe even recommended given our relatively small validator set. However, as long as we can count on a sufficient majority having stored the finalized blocks, all should be fine. Nodes can roll back via a script and download the missing payloads/reverted blocks from peers.
tACK. Running on my local mainnet full node that had the EL behind the CL by ~1k blocks; the EL catches up via the new startup procedure that we trigger in the CL.
However, I have started seeing NewPayload calls time out far more often (even with an increased timeout of 1s or 2s):
2025-02-24T15:57:59-08:00 ERRR Received undefined error during new payload call service=execution-engine payload_block_hash=0x47b3bc54ef46cdc00d07e21e0a4bd5e481c4a72f32144529a82971d51e280d49 parent_hash=0x47b3bc54ef46cdc00d07e21e0a4bd5e481c4a72f32144529a82971d51e280d49 error=Post "http://localhost:8551": engine API call timed out
2025-02-24T15:57:59-08:00 ERRR Failed to process verified beacon block service=blockchain error=Post "http://localhost:8551": engine API call timed out
2025-02-24T15:57:59-08:00 ERRR Error in proxyAppConn.FinalizeBlock module=state err=Post "http://localhost:8551": engine API call timed out
panic: Failed to process committed block (1068277:6634401D9928F5B23094EC258A6E64D1C9260D4932D46EFE74673CE71FE50D07): Post "http://localhost:8551": engine API call timed out
Thanks @calbera, we'll have to monitor this.
By default, reth does not persist every finalized block. Instead it stores blocks only up to a certain threshold (by default 2 heights below the tip, which in our case means a finalized block may not be persisted). This may cause an issue while syncing Beacond, since we have SSF and we assume finalized blocks are duly persisted.
This PR sets up the configs to duly persist all the blocks we need.
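For reference, a sketch of what the patched start-reth recipe ends up invoking (reconstructed from the diff above; the reth node subcommand is an assumption, since the head of the recipe is not shown in the hunk, and JWT_PATH, ETH_DATA_DIR, and IPC_PATH are the Makefile variables from the quoted lines):

# Both engine knobs set to 0: every processed block is persisted
# immediately and none are held only in memory, so a crashed EL
# cannot lose finalized blocks.
reth node \
	--authrpc.addr "0.0.0.0" \
	--authrpc.jwtsecret $(JWT_PATH) \
	--datadir ${ETH_DATA_DIR} \
	--ipcpath ${IPC_PATH} \
	-vvvvv \
	--engine.persistence-threshold 0 \
	--engine.memory-block-buffer-target 0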