
Archive node with latest snapshot cannot reach live #11306

Open
MrFrogoz opened this issue Sep 28, 2024 · 20 comments
Labels
A-op-reth: Related to Optimism and op-reth
C-bug: An unexpected or incorrect behavior
C-perf: A change motivated by improving speed, memory usage or disk footprint
M-prevent-stale: Prevents old inactive issues/PRs from being closed due to inactivity
S-needs-investigation: This issue requires detective work to figure out what's going wrong

Comments

MrFrogoz commented Sep 28, 2024

Describe the bug

When the node is started with "--l2.enginekind=reth" and begins syncing from a checkpoint to a target, it is very slow at this point:

op-reth[38832]: 2024-09-28T00:14:27.546943Z  INFO Status connected_peers=22 stage=MerkleExecute checkpoint=20337319 target=20343860 stage_progress=33.54% stage_eta=2h 26m 19s
op-reth[38832]: 2024-09-28T00:14:34.634691Z  INFO Committed stage progress pipeline_stages=8/12 stage=MerkleExecute checkpoint=20337319 target=20343860 stage_progress=33.77% stage_eta=1h 36m 42s
op-reth[38832]: 2024-09-28T00:14:35.068968Z  INFO Preparing stage pipeline_stages=8/12 stage=MerkleExecute checkpoint=20337319 target=20343860
op-reth[38832]: 2024-09-28T00:14:35.069000Z  INFO Executing stage pipeline_stages=8/12 stage=MerkleExecute checkpoint=20337319 target=20343860 stage_eta=1h 36m 42s
op-reth[38832]: 2024-09-28T00:14:37.419851Z  INFO Committed stage progress pipeline_stages=8/12 stage=MerkleExecute checkpoint=20337319 target=20343860 stage_progress=33.79% stage_eta=2h 41m 48s
op-reth[38832]: 2024-09-28T00:14:37.459676Z  INFO Preparing stage pipeline_stages=8/12 stage=MerkleExecute checkpoint=20337319 target=20343860
op-reth[38832]: 2024-09-28T00:14:37.459701Z  INFO Executing stage pipeline_stages=8/12 stage=MerkleExecute checkpoint=20337319 target=20343860 stage_eta=2h 41m 48s
op-reth[38832]: 2024-09-28T00:14:40.913377Z  INFO Committed stage progress pipeline_stages=8/12 stage=MerkleExecute checkpoint=20337319 target=20343860 stage_progress=33.82% stage_eta=2h 3m 13s
op-reth[38832]: 2024-09-28T00:14:40.973253Z  INFO Preparing stage pipeline_stages=8/12 stage=MerkleExecute checkpoint=20337319 target=20343860

and when it finishes a set of blocks, it is still a couple of hours behind and starts again with a new set of blocks. It does this for days without ever catching up to live.

Steps to reproduce

wget https://mainnet-reth-archive-snapshots.base.org/$(curl https://mainnet-reth-archive-snapshots.base.org/latest)
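
A minimal sketch of the full restore flow, under the assumption that the downloaded snapshot is a compressed tar archive (the real file extension determines the decompressor) and that it is extracted into the op-reth datadir used in the config below:

SNAPSHOT=$(curl https://mainnet-reth-archive-snapshots.base.org/latest)
wget "https://mainnet-reth-archive-snapshots.base.org/${SNAPSHOT}"
# Assumption: the archive is a zstd-compressed tar; swap the decompressor to
# match the actual extension of ${SNAPSHOT}.
mkdir -p /db/reth/base
tar -I zstd -xf "${SNAPSHOT}" -C /db/reth/base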

ExecStart=/root/build/op-node \
    --network=base-mainnet \
    --l1=........ \
    --l2=http://localhost:8551 \
    --rpc.addr=0.0.0.0 \
    --rpc.port=60004 \
    --l2.jwt-secret=/root/build/jwt.hex \
    --l1.trustrpc \
    --l1.rpckind=basic \
    --l1.beacon=........ \
    --syncmode=execution-layer \
    --l2.enginekind=reth

ExecStart=/root/build/op-reth node \
    --chain=base \
    --datadir=/db/reth/base \
    --authrpc.addr=0.0.0.0 \
    --authrpc.port=8551 \
    --authrpc.jwtsecret=/root/build/jwt.hex \
    --http.port=50004 \
    --http.addr=0.0.0.0 \
    --http.corsdomain=* \
    --http \
    --ws \
    --http.api=admin,debug,eth,net,trace,txpool,web3,rpc,reth,ots \
    --rollup.sequencer-http=https://mainnet-sequencer.base.org \
    --rpc-max-connections=1000000 \
    --rpc-max-tracing-requests=1000000

Node logs

No response

Platform(s)

Linux (x86)

What version/commit are you on?

latest

What database version are you on?

latest

Which chain / network are you on?

base

What type of node are you running?

Archive (default)

What prune config do you use, if any?

No response

If you've built Reth from source, provide the full command you used

No response

Code of Conduct

  • I agree to follow the Code of Conduct
MrFrogoz added the C-bug and S-needs-triage labels on Sep 28, 2024
MrFrogoz (Author) commented Sep 30, 2024

I'm also trying without the "--l2.enginekind=reth" option, and the node can only recover one block every 2-3 seconds. That is extremely slow considering how fast the Base chain is moving at the moment: for a node 1 day behind, catching back up to live takes about a week.

op-reth[56231]: 2024-09-30T02:17:49.552305Z  INFO Block added to canonical chain number=20361203 hash=0x797e1489659cb6e3deee73e87efad223f7ace461d1ab2ea899942710de6b9a71 peers=8 txs=98 gas=29.95 Mgas gas_throughput=327.24 Mgas/second full=22.7% base_fee=0.01gwei blobs=0 excess_blobs=0 elapsed=91.517882ms
op-reth[56231]: 2024-09-30T02:17:52.019591Z  INFO Block added to canonical chain number=20361204 hash=0x61aa864cf69080675ffdb279fea17061d8e3a60a908498b44d02f2c4af82f0ac peers=8 txs=137 gas=31.56 Mgas gas_throughput=366.72 Mgas/second full=23.9% base_fee=0.01gwei blobs=0 excess_blobs=0 elapsed=86.067298ms
op-reth[56231]: 2024-09-30T02:17:54.171398Z  INFO Block added to canonical chain number=20361205 hash=0xd379d74ae1b0858700cecbd4b829af58065a0b902bbced6cdf6d3b0df7c30b36 peers=8 txs=149 gas=26.66 Mgas gas_throughput=341.26 Mgas/second full=20.2% base_fee=0.01gwei blobs=0 excess_blobs=0 elapsed=78.134131ms
op-reth[56231]: 2024-09-30T02:17:56.456186Z  INFO Block added to canonical chain number=20361206 hash=0x77b76bdd0c02ceb4b2b0d8ad4d5091e3d90b72e64f7d5509db776b0f9402fa09 peers=8 txs=119 gas=25.17 Mgas gas_throughput=304.98 Mgas/second full=19.1% base_fee=0.01gwei blobs=0 excess_blobs=0 elapsed=82.518137ms

Rjected (Member) commented Oct 3, 2024

Hi @MrFrogoz, what hardware are you running this on? Specifically, what kind of disk, CPU, and how much total RAM?

MrFrogoz (Author) commented Oct 3, 2024

AMD EPYC 7J13, 16 cores, 64 GB RAM, NVMe at 50K IOPS / 680 MB/s. Usage when running: CPU 3%, RAM 30%.

I run many nodes for different blockchains on the same hardware; only Base has slowness issues with block sync.

MrFrogoz (Author) commented Oct 7, 2024

Is there any news? In the meantime I noticed that when the node is handling many debug trace calls, synchronization gets even slower and it almost always stays behind, while node resource usage remains low. Also, after reading the other thread, I'm trying the "--engine.experimental" option to see if the node can stay live.

emhane added the A-op-reth label and removed the S-needs-triage label on Oct 8, 2024
MrFrogoz (Author) commented Oct 8, 2024

I managed to get the node live with that option, then removed it because of the following problem: #11570. However, after restarting the node without that option, recovering 10 minutes of downtime took 2 hours before it was live again. I hope you will be able to improve the performance of the binary; I don't know if op-node has something to do with this continuous slowdown. I'll wait for feedback.

MrFrogoz (Author) commented Oct 11, 2024

Still, given that sync is quite slow as written before: if I make trace block calls only against live blocks, the node is again unable to stay live and starts to accumulate a continuously growing delay.

mattsse (Collaborator) commented Oct 12, 2024

--rpc-max-tracing-requests=1000000

this is likely the reason you're experiencing the tracing issues: tracing is CPU bound, so concurrent tracing requests are limited by default to the number of cores minus a small offset, depending on how many cores are available
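
Purely as an illustrative sketch (not a maintainer recommendation), a launch that keeps tracing concurrency near the core-based default instead of overriding it; the numbers below are hypothetical values for a 16-core machine like the one described above:

# Hypothetical flags: cap concurrent traces well below the core count instead
# of 1000000, and keep max connections at a realistic level.
op-reth node \
    --chain=base \
    --datadir=/db/reth/base \
    --rpc-max-tracing-requests=12 \
    --rpc-max-connections=500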

Usage when running: CPU 3% - RAM 30%

this sounds more like a disk issue @Rjected ?

MrFrogoz (Author) commented Oct 12, 2024

I understand, but even if I keep a node with the RPC API closed to everyone, it is still very slow to resync to live. For disk I am using an M.2 SSD with these specs:
IOPS: 50,000
Throughput: 680 MB/s

Current utilization per second:
3,151 IOPS
15 MB/s

I have a cluster of 5 nodes and they all have the same problem; each node is a few hours behind.
I have Optimism and Ethereum nodes running reth and they have no problems, both mainnet and testnet. This is probably because Base mainnet has very high traffic that the reth node may not handle well.

MrFrogoz (Author) commented Oct 15, 2024

Is it possible to have some feedback? Are you working on it? Are you checking whether the problem is real? From what I've seen online, other operators have the same slowdown even on different infrastructures.

Rjected (Member) commented Oct 15, 2024

@MrFrogoz sorry for the lack of an update - we still don't know what the issue could be. We've been running Base nodes on our own infrastructure without issue, although Base may just require lower disk latency (given that the IOPS / bandwidth don't seem saturated). An exact disk model would help, since some disks perform better than others.

MrFrogoz (Author) commented Oct 15, 2024

Unfortunately the Oracle data center does not explicitly state which disk model they use, but they definitely use NVMe SSD units; theoretically it is equivalent to the AWS gp3 disk type. The fact that the node doesn't stay live even with RPC calls closed, and that the disk metrics show so little utilization, suggests there is some lack of optimization once a certain TPS level is reached.

Rjected (Member) commented Oct 15, 2024

@MrFrogoz is the storage analogous to, e.g., GCP "local SSD"s or AWS "instance storage" (for example on their r5d machines)? Or is the storage a network-attached block storage system (AWS EBS, GCP hyperdisk, etc.)? Is there an instance type we can take a look at? reth performs poorly on network-attached block storage because those systems have much higher latency, making it much more difficult to utilize the metered IOPS. This is because our I/O access patterns during block validation are synchronous and not parallel.

MrFrogoz (Author) commented Oct 16, 2024

As written above, you can take AWS EBS gp3 as a reference; it is identical to what I'm using.
I attach the disk benchmarks, which are equivalent to those of a gp3.
Obviously an io2 type has half the latency.
Commands:

Random Reads
  Throughput (MB/s): sudo fio --filename=/dev/device --direct=1 --rw=randread --bs=256k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4 --time_based --group_reporting --name=throughput-test-job --eta-newline=1 --readonly
  IOPS (avg): sudo fio --filename=/dev/device --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based --group_reporting --name=iops-test-job --eta-newline=1 --readonly
  Latency (ms): sudo fio --filename=/dev/device --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=1 --numjobs=1 --time_based --group_reporting --name=readlatency-test-job --runtime=120 --eta-newline=1 --readonly

Random Read/Writes
  Throughput (MB/s): sudo fio --filename=/dev/device --direct=1 --rw=randrw --bs=256k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4 --time_based --group_reporting --name=throughput-test-job --eta-newline=1
  IOPS (avg): sudo fio --filename=/dev/device --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based --group_reporting --name=iops-test-job --eta-newline=1
  Latency (ms): sudo fio --filename=/dev/device --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=1 --numjobs=1 --time_based --group_reporting --name=rwlatency-test-job --runtime=120 --eta-newline=1

Sequential Reads
  Throughput (MB/s): sudo fio --filename=/dev/device --direct=1 --rw=read --bs=256k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4 --time_based --group_reporting --name=throughput-test-job --eta-newline=1 --readonly
  IOPS: sudo fio --filename=/dev/device --direct=1 --rw=read --bs=4k --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based --group_reporting --name=iops-test-job --eta-newline=1 --readonly

Block Volume: (screenshot attached: oci-blockvolume)

MrFrogoz (Author) commented Oct 20, 2024

I guess this task will not be handled. You will probably handle it when some chain that uses reth behaves like Base but with 2x the traffic, where not even AWS io2 will be enough.

In the meantime I'll try these values to improve the situation:

      --engine.persistence-threshold <PERSISTENCE_THRESHOLD>
          Configure persistence threshold for engine experimental

          [default: 2]

      --engine.memory-block-buffer-target <MEMORY_BLOCK_BUFFER_TARGET>
          Configure the target number of blocks to keep in memory

          [default: 2]
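
A sketch of how these flags are applied on the command line; the value 10 below is only the one tried in the follow-up comment (and in the docker args further down), not a recommended default:

op-reth node \
    --chain=base \
    --datadir=/db/reth/base \
    --engine.persistence-threshold 10 \
    --engine.memory-block-buffer-target 10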

MrFrogoz (Author) commented Oct 21, 2024

I tried setting those values to 10; it seems the node is syncing about 2 times faster and the disk usage pattern also looks completely different.

As soon as the node goes live again, I'll try a stress test on the heaviest calls to see if the node can stay live.

(screenshot attached)

MrFrogoz (Author) commented

Solved using bare metal servers with a SAMSUNG MZVL2512HCJQ-00B00; I hope in the future it can also work on common cloud providers.

Rjected (Member) commented Oct 30, 2024

Just as an update: we're aware of this and will get to it eventually, but there aren't any easy fixes that we have time to work on right now. Going to leave this open, and we'll let you know when we start a more thorough effort to improve performance on cloud disks.

Rjected added the S-needs-investigation, C-perf, and M-prevent-stale labels on Oct 30, 2024
MrFrogoz changed the title from "URGENT - Archive node with latest snapshot cannot reach live" to "Archive node with latest snapshot cannot reach live" on Oct 31, 2024
tmeinlschmidt commented Jan 1, 2025

I'm in the same situation, syncing a Base mainnet archive node.

op-reth 1.1.4

It runs through stages 1-12, then starts over, again and again.

The log looks like:

2025-01-01T21:29:27.104102Z  INFO Status connected_peers=12 stage=IndexAccountHistory checkpoint=24463053 target=24476488
2025-01-01T21:29:28.551486Z  INFO Writing indices progress=99.99%
2025-01-01T21:29:28.605366Z  INFO Finished stage pipeline_stages=11/12 stage=IndexAccountHistory checkpoint=24476488 target=24476488
2025-01-01T21:29:29.851790Z  INFO Preparing stage pipeline_stages=12/12 stage=Finish checkpoint=24463053 target=24476488
2025-01-01T21:29:29.851812Z  INFO Executing stage pipeline_stages=12/12 stage=Finish checkpoint=24463053 target=24476488
2025-01-01T21:29:29.852622Z  INFO Finished stage pipeline_stages=12/12 stage=Finish checkpoint=24476488 target=24476488
2025-01-01T21:29:30.134922Z  INFO Preparing stage pipeline_stages=1/12 stage=Headers checkpoint=24476488 target=None
2025-01-01T21:29:33.772468Z  INFO Received headers total=10000 from_block=24488811 to_block=24478812
2025-01-01T21:29:33.793856Z  INFO Received headers total=2323 from_block=24478811 to_block=24476489
2025-01-01T21:29:33.798284Z  INFO Writing headers total=12323
2025-01-01T21:29:33.798343Z  INFO Executing stage pipeline_stages=1/12 stage=Headers checkpoint=24476488 target=None
2025-01-01T21:29:33.851622Z  INFO Writing headers progress=10.00%
2025-01-01T21:29:33.859647Z  INFO Writing headers progress=20.00%
2025-01-01T21:29:33.867581Z  INFO Writing headers progress=29.99%
2025-01-01T21:29:33.875622Z  INFO Writing headers progress=39.99%
2025-01-01T21:29:33.883598Z  INFO Writing headers progress=49.99%
2025-01-01T21:29:33.889688Z  INFO Writing headers progress=59.99%
2025-01-01T21:29:33.895422Z  INFO Writing headers progress=69.98%
2025-01-01T21:29:33.901204Z  INFO Writing headers progress=79.98%
2025-01-01T21:29:33.906923Z  INFO Writing headers progress=89.98%
2025-01-01T21:29:33.912652Z  INFO Writing headers progress=99.98%
2025-01-01T21:29:33.912671Z  INFO Writing headers hash index total=12323
2025-01-01T21:29:36.023198Z  INFO Writing headers hash index progress=10.00%
2025-01-01T21:29:37.892682Z  INFO Writing headers hash index progress=20.00%
2025-01-01T21:29:39.728430Z  INFO Writing headers hash index progress=29.99%
2025-01-01T21:29:41.561926Z  INFO Writing headers hash index progress=39.99%
2025-01-01T21:29:43.357475Z  INFO Writing headers hash index progress=49.99%
2025-01-01T21:29:45.179475Z  INFO Writing headers hash index progress=59.99%
2025-01-01T21:29:46.956945Z  INFO Writing headers hash index progress=69.98%
2025-01-01T21:29:48.958195Z  INFO Writing headers hash index progress=79.98%
2025-01-01T21:29:50.782668Z  INFO Writing headers hash index progress=89.98%
2025-01-01T21:29:52.103767Z  INFO Status connected_peers=11 stage=Headers checkpoint=24476488 target=None
2025-01-01T21:29:52.598568Z  INFO Writing headers hash index progress=99.98%
2025-01-01T21:29:52.602716Z  INFO Finished stage pipeline_stages=1/12 stage=Headers checkpoint=24488811 target=None stage_progress=100.00%
2025-01-01T21:29:52.900883Z  INFO Preparing stage pipeline_stages=2/12 stage=Bodies checkpoint=24476488 target=24488811
...
2025-01-01T21:43:37.104142Z  INFO Status connected_peers=11 stage=Execution checkpoint=24476488 target=24488811
2025-01-01T21:44:02.103746Z  INFO Status connected_peers=11 stage=Execution checkpoint=24476488 target=24488811
2025-01-01T21:44:27.104157Z  INFO Status connected_peers=11 stage=Execution checkpoint=24476488 target=24488811
2025-01-01T21:44:52.104087Z  INFO Status connected_peers=11 stage=Execution checkpoint=24476488 target=24488811
2025-01-01T21:45:17.104342Z  INFO Status connected_peers=11 stage=Execution checkpoint=24476488 target=24488811
2025-01-01T21:45:42.103701Z  INFO Status connected_peers=11 stage=Execution checkpoint=24476488 target=24488811
2025-01-01T21:46:01.495667Z  INFO Committed stage progress pipeline_stages=4/12 stage=Execution checkpoint=24480683 target=24488811 stage_progress=99.88%
2025-01-01T21:46:05.715353Z  INFO Preparing stage pipeline_stages=4/12 stage=Execution checkpoint=24480683 target=24488811
...
2025-01-01T21:55:42.104139Z  INFO Status connected_peers=11 stage=Execution checkpoint=24480683 target=24488811 stage_progress=99.88%
2025-01-01T21:56:07.103763Z  INFO Status connected_peers=11 stage=Execution checkpoint=24480683 target=24488811 stage_progress=99.88%
2025-01-01T21:56:32.103799Z  INFO Status connected_peers=11 stage=Execution checkpoint=24480683 target=24488811 stage_progress=99.88%
2025-01-01T21:56:57.103665Z  INFO Status connected_peers=11 stage=Execution checkpoint=24480683 target=24488811 stage_progress=99.88%
2025-01-01T21:57:12.256607Z  INFO Committed stage progress pipeline_stages=4/12 stage=Execution checkpoint=24485109 target=24488811 stage_progress=99.94%
...
2025-01-01T22:01:24.391191Z  INFO Executed block range start=24488140 end=24488278 throughput="687.77 Mgas/second"
2025-01-01T22:01:32.104357Z  INFO Status connected_peers=11 stage=Execution checkpoint=24485109 target=24488811 stage_progress=99.94%
2025-01-01T22:01:34.423043Z  INFO Executed block range start=24488279 end=24488472 throughput="903.12 Mgas/second"
2025-01-01T22:01:44.424965Z  INFO Executed block range start=24488473 end=24488621 throughput="676.07 Mgas/second"
2025-01-01T22:01:54.435328Z  INFO Executed block range start=24488622 end=24488761 throughput="597.33 Mgas/second"
2025-01-01T22:01:57.104223Z  INFO Status connected_peers=11 stage=Execution checkpoint=24485109 target=24488811 stage_progress=99.94%

and it never catches the tip.

I'm on quite fast hardware:

# free
               total        used        free      shared  buff/cache   available
Mem:       263776208    88334400     4293088       13780   171148720   173126212
Swap:      268435448        4608   268430840

and cpu

processor       : 47
vendor_id       : AuthenticAMD
cpu family      : 23
model           : 49
model name      : AMD EPYC 7402 24-Core Processor
stepping        : 0
microcode       : 0x830107c
cpu MHz         : 2400.000

Storage is ZFS on 6 NVMe disks (SAMSUNG MZQLB3T8HALS).

We're running three more nodes and they're fine, with the same configuration, from the same snapshot.

cmdline (docker, ghcr.io/paradigmxyz/op-reth:v1.1.4):

        "Path": "/usr/local/bin/op-reth",
        "Args": [
            "node",
            "--chain=base",
            "--rollup.sequencer-http=https://mainnet-sequencer.base.org",
            "--rollup.disable-tx-pool-gossip",
            "--config=/data/reth.toml",
            "--datadir=/data",
            "--http",
            "--http.addr=0.0.0.0",
            "--http.port=18565",
            "--http.corsdomain=*",
            "--http.api=debug,eth,net,trace,txpool,web3,rpc,reth,ots,flashbots",
            "--authrpc.addr=0.0.0.0",
            "--authrpc.port=18561",
            "--authrpc.jwtsecret=/data/jwt.txt",
            "--rollup.discovery.v4",
            "--engine.persistence-threshold",
            "10",
            "--engine.memory-block-buffer-target",
            "10",
            "--port",
            "30302"

emhane moved this issue between Backlog, TODO, and In Progress in Project Tracking (Feb 12-13, 2025)
MrFrogoz (Author) commented Feb 19, 2025

@tmeinlschmidt it's very strange that it doesn't work with that type of disk; are you running on a bare metal server? Anyway, to work around the problem temporarily, even if you run reth, add this to the op-node config: --l2.enginekind=geth. op-reth will run through its stages again, and once done you will see that it stops and starts receiving blocks as if it were an op-geth.
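
A sketch of that workaround applied to the op-node invocation from the original report; only the final flag changes from reth to geth:

ExecStart=/root/build/op-node \
    --network=base-mainnet \
    --l1=........ \
    --l2=http://localhost:8551 \
    --rpc.addr=0.0.0.0 \
    --rpc.port=60004 \
    --l2.jwt-secret=/root/build/jwt.hex \
    --l1.trustrpc \
    --l1.rpckind=basic \
    --l1.beacon=........ \
    --syncmode=execution-layer \
    --l2.enginekind=geth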

janzhanal commented

Same issue as @tmeinlschmidt with Base and reth 1.2.0.
Will try --l2.enginekind=geth as suggested.

Projects
Status: Todo
Development

No branches or pull requests

6 participants