
chore(testing): modified start-reth to enforce persistence #2536

Merged 1 commit into main from start-reth-unlock-mainnet-sync on Feb 25, 2025

Conversation

abi87 (Collaborator) commented Feb 24, 2025

By default, reth does not persist every finalized block. Instead, it keeps blocks in memory up to a certain threshold (by default, two heights below the tip), which in our case means a finalized block may not be persisted. This may cause an issue while syncing Beacond, since we have SSF and assume finalized blocks are duly persisted.

This PR sets up the configs so that all the blocks we need are duly persisted.
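
For context, the full `start-reth` invocation after this change roughly assembles into the following command (a sketch based on the flags in this PR's diff; `JWT_PATH`, `ETH_DATA_DIR`, and `IPC_PATH` are Makefile variables defined elsewhere in the repo):

```shell
# Sketch of the reth invocation after this PR; variable values come from
# the repository Makefile, not from this snippet.
reth node \
    --authrpc.addr "0.0.0.0" \
    --authrpc.jwtsecret "$JWT_PATH" \
    --datadir "$ETH_DATA_DIR" \
    --ipcpath "$IPC_PATH" \
    -vvvvv \
    --engine.persistence-threshold 0 \
    --engine.memory-block-buffer-target 0
```

With both engine flags set to 0, reth persists every canonical block immediately instead of buffering the most recent blocks in memory.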

@abi87 abi87 requested a review from a team as a code owner February 24, 2025 17:29
@abi87 abi87 self-assigned this Feb 24, 2025
codecov bot commented Feb 24, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 32.77%. Comparing base (a23a74a) to head (9cc3658).
Report is 1 commit behind head on main.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #2536   +/-   ##
=======================================
  Coverage   32.77%   32.77%           
=======================================
  Files         351      351           
  Lines       15857    15857           
  Branches       20       20           
=======================================
  Hits         5197     5197           
  Misses      10276    10276           
  Partials      384      384           

@@ -86,7 +86,10 @@ start-reth: ## start an ephemeral `reth` node
 	--authrpc.addr "0.0.0.0" \
 	--authrpc.jwtsecret $(JWT_PATH) \
 	--datadir ${ETH_DATA_DIR} \
-	--ipcpath ${IPC_PATH}
+	--ipcpath ${IPC_PATH} \
+	-vvvvv \
Contributor commented:

I think we should not make this the default verbosity; -vvvvv should be specified only when debugging. Logs will get crowded and very large very quickly.

Contributor commented:

Yep definitely not for real nodes

abi87 (Collaborator, Author) commented:

I don't think people use make start-reth to run their production nodes. If they do, they should not. I like increasing output in our tests, and I think we can sustain it as is.

@rezbera rezbera self-requested a review February 24, 2025 18:31
+	--ipcpath ${IPC_PATH} \
+	-vvvvv \
+	--engine.persistence-threshold 0 \
+	--engine.memory-block-buffer-target 0
rezbera (Contributor) commented Feb 24, 2025:

    /// Maximum number of blocks to be kept only in memory without triggering
    /// persistence.
    persistence_threshold: u64,
    /// How close to the canonical head we persist blocks. Represents the ideal
    /// number of most recent blocks to keep in memory for quick access and reorgs.
    ///
    /// Note: this should be less than or equal to `persistence_threshold`.
    memory_block_buffer_target: u64,

Noting the above code in reth (slightly more verbose than the CLI info)
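
The invariant in those doc comments (memory_block_buffer_target <= persistence_threshold) can also be sanity-checked before launching the node. A minimal sketch, assuming the two flag values are available as plain integers; check_persistence_flags is a hypothetical helper, not part of reth:

```shell
# Hypothetical helper (not part of reth): enforce the invariant documented
# in reth's config comments, i.e. memory_block_buffer_target must be
# less than or equal to persistence_threshold.
check_persistence_flags() {
  local persistence_threshold=$1
  local memory_block_buffer_target=$2
  if [ "$memory_block_buffer_target" -gt "$persistence_threshold" ]; then
    echo "invalid: --engine.memory-block-buffer-target must be <= --engine.persistence-threshold" >&2
    return 1
  fi
  echo "ok"
}

# The values used by this PR (0 and 0) satisfy the invariant trivially.
check_persistence_flags 0 0
```
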

abi87 (Collaborator, Author) commented:

@rezbera do you want me to include this somehow in the Makefile?
I think noting it here as you did is enough (and appreciated!)

Contributor commented:

Here is enough, as long as we're aware @abi87

fridrik01 (Contributor) commented:

Awesome, there had to be some configuration that we could tune to change this behaviour! Great great

rezbera (Contributor) left a review:

Did a quick sanity check in the reth code for usage of these vars and this makes sense to me.

Noting that persistence_threshold must be >= memory_block_buffer_target.

This likely has performance implications on reth, so perf testing this before release would be appropriate.

abi87 (Collaborator, Author) commented Feb 24, 2025:

> Did a quick sanity check in the reth code for usage of these vars and this makes sense to me.
>
> Noting that persistence_threshold must be >= memory_block_buffer_target.
>
> This likely has performance implications on Reth so perf testing this before release would be appropriate

@rezbera I agree about the performance hit, but I don't think we need to require people to use this in production yet.
At the very least, I would suggest using these configs while syncing and removing them once the node is close to tip.
If anything happens and blocks are lost, scripts/rollback.sh will help.
Thoughts?

@abi87 abi87 requested a review from shotes February 24, 2025 22:22
rezbera (Contributor) commented Feb 24, 2025:

> Did a quick sanity check in the reth code for usage of these vars and this makes sense to me.
> Noting that persistence_threshold must be >= memory_block_buffer_target.
> This likely has performance implications on Reth so perf testing this before release would be appropriate
>
> @rezbera I agree with the performance hit, but I don't think we need to require people using this in production yet? At the very least, I would suggest to use the configs while syncing and remove them once node is ready to close to tip. If anything happens and blocks are lost, scripts/rollback.sh will help. Thoughts?

So the problem this solves is reth nodes crashing and then losing blocks. Such a problem could also occur once at the tip of the chain, so we would want reth users to run it even then, right?

Agree that it's not an urgent rollout, but my understanding is that longer term, users should always run with these flags

abi87 (Collaborator, Author) commented Feb 24, 2025:

> So the problem this solves is Reth nodes crashing and then losing blocks. Such a problem could also occur once at the tip of the chain, so we would want Reth users to run it even then right?

That would be ideal, and maybe even recommended given our relatively small validator set. However, as long as we can count on a sufficient majority having stored the finalized blocks, all should be fine. Nodes can roll back via the script and download the missing payloads/reverted blocks from peers.

calbera (Contributor) left a review:

tACK. Running on my local mainnet fullnode that had the EL behind the CL by ~1k blocks; the EL catches up via the new startup procedure that we trigger in the CL.

However, I have started seeing NewPayload calls time out far more often (even with an increased timeout of 1s or 2s):

2025-02-24T15:57:59-08:00 ERRR Received undefined error during new payload call service=execution-engine payload_block_hash=0x47b3bc54ef46cdc00d07e21e0a4bd5e481c4a72f32144529a82971d51e280d49 parent_hash=0x47b3bc54ef46cdc00d07e21e0a4bd5e481c4a72f32144529a82971d51e280d49 error=Post "http://localhost:8551": engine API call timed out
2025-02-24T15:57:59-08:00 ERRR Failed to process verified beacon block service=blockchain error=Post "http://localhost:8551": engine API call timed out
2025-02-24T15:57:59-08:00 ERRR Error in proxyAppConn.FinalizeBlock module=state err=Post "http://localhost:8551": engine API call timed out
panic: Failed to process committed block (1068277:6634401D9928F5B23094EC258A6E64D1C9260D4932D46EFE74673CE71FE50D07): Post "http://localhost:8551": engine API call timed out

abi87 (Collaborator, Author) commented Feb 25, 2025:

> tACK. running on my local mainnet fullnode that had EL behind CL by ~1k blocks. EL catches up from the new startup procedure that we trigger in the CL.

Thanks @calbera, we'll have to monitor this.

@abi87 abi87 merged commit 902f432 into main Feb 25, 2025
19 checks passed
@abi87 abi87 deleted the start-reth-unlock-mainnet-sync branch February 25, 2025 08:25