Store pubkey cache decompressed on disk #5897

Merged: 9 commits into sigp:unstable from the uncompressed-pubkeys branch on Jul 4, 2024

michaelsproul
Member

@michaelsproul michaelsproul commented Jun 6, 2024

Issue Addressed

Closes:

Proposed Changes

Resolve a long-standing performance pitfall involving the decompression of pubkeys on startup. This PR improves Lighthouse's startup time dramatically.
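
As a rough illustration of why this helps (a sketch under stated assumptions, not the schema this PR implements): a compressed BLS12-381 public key is 48 bytes and carries only the x-coordinate plus flag bits, so loading it means recomputing y with a field square root. Writing the 96-byte uncompressed form turns loading into a plain byte copy. The `DecompressedKey` type and the `encode_record` / `decode_record` helpers below are hypothetical names, and curve-point validation is omitted.

```rust
/// Byte length of one BLS12-381 base-field element (also the length of a
/// compressed public key).
const FIELD_LEN: usize = 48;
/// Byte length of an uncompressed public key: x followed by y.
const UNCOMPRESSED_LEN: usize = 2 * FIELD_LEN;

/// Hypothetical in-memory key holding both affine coordinates.
#[derive(Debug, Clone, PartialEq, Eq)]
struct DecompressedKey {
    x: [u8; FIELD_LEN],
    y: [u8; FIELD_LEN],
}

/// Hypothetical on-disk record: the two coordinates written back-to-back.
fn encode_record(key: &DecompressedKey) -> [u8; UNCOMPRESSED_LEN] {
    let mut out = [0u8; UNCOMPRESSED_LEN];
    out[..FIELD_LEN].copy_from_slice(&key.x);
    out[FIELD_LEN..].copy_from_slice(&key.y);
    out
}

/// Loading is a length check plus two copies; no square root is needed.
fn decode_record(bytes: &[u8]) -> Option<DecompressedKey> {
    if bytes.len() != UNCOMPRESSED_LEN {
        return None;
    }
    let mut x = [0u8; FIELD_LEN];
    let mut y = [0u8; FIELD_LEN];
    x.copy_from_slice(&bytes[..FIELD_LEN]);
    y.copy_from_slice(&bytes[FIELD_LEN..]);
    Some(DecompressedKey { x, y })
}

fn main() {
    // Placeholder coordinates, not a real curve point.
    let key = DecompressedKey { x: [1u8; FIELD_LEN], y: [2u8; FIELD_LEN] };
    let record = encode_record(&key);
    println!("record is {} bytes", record.len());
    assert_eq!(decode_record(&record), Some(key));
}
```

The trade-off in this sketch is disk space for CPU time: each record doubles in size, while the per-key work at startup drops to a copy.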

Additional Info

I propose we merge this PR as the first DB schema change adopted from tree-states, after the Electra PR which implements v20 is merged:

Blocked on a fix to the v20 schema:

@michaelsproul michaelsproul added work-in-progress PR is a work-in-progress optimization Something to make Lighthouse run more efficiently. database backwards-incompat Backwards-incompatible API change blocked and removed work-in-progress PR is a work-in-progress labels Jun 6, 2024
@michaelsproul michaelsproul force-pushed the uncompressed-pubkeys branch from 420a715 to 2579248 Compare June 27, 2024 05:59
@michaelsproul michaelsproul added ready-for-review The code is ready for review v5.3.0 Q3 2024 release with database changes! labels Jun 27, 2024
@michaelsproul
Member Author

Ready for review for 5.3.0. Let's goooo 🚀

Collaborator

@dapplion dapplion left a comment

Looks good to me as is! Love how straightforward it is.

I feel we could drop the need for the cache to compute and store compressed keys. Those are available on any state and seem to be used exclusively for sync committees.

@michaelsproul
Member Author

michaelsproul commented Jun 27, 2024

Good idea. We can look into that in a future PR, maybe alongside consolidating the pubkey cache on the beacon state with the global one.

@michaelsproul
Member Author

There's something wrong with this PR and I haven't worked out what. I'm getting an error about the op pool being corrupt?

Jun 28 06:50:51.097 INFO Logging to file path: "/home/michael/.lighthouse/mainnet/beacon/logs/beacon.log"
Jun 28 06:50:51.102 INFO Lighthouse started version: Lighthouse/v5.2.1-cc5789b
Jun 28 06:50:51.102 INFO Configured for network name: mainnet
Jun 28 06:50:51.102 INFO Data directory initialised datadir: /home/michael/.lighthouse/mainnet
Jun 28 06:50:51.105 INFO Deposit contract address: 0x00000000219ab540356cbb839cbe05303d7705fa, deploy_block: 11184524
Jun 28 06:50:51.249 INFO Hot-Cold DB initialized split_state: 0x50d483a6162e195975ed3582b517aeb6a08c76a099e55e683c7471b1c284abf2, split_slot: 9394368, service: freezer_db
Jun 28 06:50:51.249 INFO Blob DB initialized oldest_blob_slot: Some(Slot(9392448)), path: "/home/michael/.lighthouse/mainnet/beacon/blobs_db", service: freezer_db
Jun 28 06:50:51.249 INFO Upgrading from v19 to v20
Jun 28 06:50:51.394 INFO Upgrading from v20 to v21
Jun 28 06:51:01.205 INFO Public key decompression in progress keys_decompressed: 200000
Jun 28 06:51:11.024 INFO Public key decompression in progress keys_decompressed: 400000
Jun 28 06:51:20.861 INFO Public key decompression in progress keys_decompressed: 600000
Jun 28 06:51:30.683 INFO Public key decompression in progress keys_decompressed: 800000
Jun 28 06:51:40.366 INFO Public key decompression in progress keys_decompressed: 1000000
Jun 28 06:51:49.616 INFO Public key decompression in progress keys_decompressed: 1200000
Jun 28 06:51:59.236 INFO Public key decompression in progress keys_decompressed: 1400000
Jun 28 06:52:02.636 INFO Public key decompression complete
Jun 28 06:52:03.501 INFO Refusing to checkpoint sync msg: database already exists, use --purge-db to force checkpoint sync, service: beacon
Jun 28 06:52:03.501 INFO Starting beacon chain method: resume, service: beacon
Jun 28 06:52:04.350 CRIT Failed to start beacon node reason: DB error whilst reading persisted op pool: SszDecodeError(OffsetIntoFixedPortion(4))
Jun 28 06:52:04.350 INFO Internal shutdown received reason: Failed to start beacon node
Jun 28 06:52:04.350 INFO Shutting down.. reason: Failure("Failed to start beacon node")

This would suggest the v20 migration is screwed, but in isolation the v20 migration works fine.
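
For readers unfamiliar with the `OffsetIntoFixedPortion(4)` error in the log above: in SSZ, a container's fixed-size portion holds a 4-byte little-endian offset for each variable-length field, and the first offset must point just past that fixed portion. The sketch below is a simplified stand-in for that check, not the SSZ crate's actual code; `check_first_offset` is a hypothetical helper, and the variant names other than `OffsetIntoFixedPortion` are illustrative. It shows one way such an error can arise when bytes written under one layout are decoded under a layout with a longer fixed portion. Whether that is what happened here is an assumption.

```rust
/// SSZ offsets are 4-byte little-endian integers.
const BYTES_PER_LENGTH_OFFSET: usize = 4;

#[derive(Debug, PartialEq, Eq)]
enum DecodeError {
    /// Too few bytes to read an offset.
    InvalidByteLength { len: usize, expected: usize },
    /// The first offset points inside the fixed-size portion.
    OffsetIntoFixedPortion(usize),
    /// The first offset leaves a gap after the fixed-size portion.
    OffsetSkipsVariableBytes(usize),
    /// The offset points past the end of the input.
    OffsetOutOfBounds(usize),
}

/// Read the 4-byte little-endian offset stored at position `at`.
fn read_offset(bytes: &[u8], at: usize) -> Result<usize, DecodeError> {
    let end = at + BYTES_PER_LENGTH_OFFSET;
    let slice = bytes.get(at..end).ok_or(DecodeError::InvalidByteLength {
        len: bytes.len(),
        expected: end,
    })?;
    let mut buf = [0u8; BYTES_PER_LENGTH_OFFSET];
    buf.copy_from_slice(slice);
    Ok(u32::from_le_bytes(buf) as usize)
}

/// Validate the first offset of a container whose fixed portion is
/// `fixed_len` bytes long and whose first offset is stored at `offset_pos`.
fn check_first_offset(
    bytes: &[u8],
    offset_pos: usize,
    fixed_len: usize,
) -> Result<usize, DecodeError> {
    let offset = read_offset(bytes, offset_pos)?;
    if offset < fixed_len {
        Err(DecodeError::OffsetIntoFixedPortion(offset))
    } else if offset > fixed_len {
        Err(DecodeError::OffsetSkipsVariableBytes(offset))
    } else if offset > bytes.len() {
        Err(DecodeError::OffsetOutOfBounds(offset))
    } else {
        Ok(offset)
    }
}

fn main() {
    // Hypothetical old layout: one variable-length field, so the fixed
    // portion is a single offset (4 bytes) followed by the payload.
    let old_encoding = [4u8, 0, 0, 0, 0xaa, 0xbb];

    // Decoding with the matching layout succeeds.
    println!("{:?}", check_first_offset(&old_encoding, 0, 4));

    // Decoding with a layout that expects two variable-length fields
    // (fixed portion of 8 bytes) rejects the first offset.
    println!("{:?}", check_first_offset(&old_encoding, 0, 8));
}
```

In this toy case the stored offset is 4 while the decoder expects at least 8, which is the same shape as the error in the log.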

@michaelsproul michaelsproul force-pushed the uncompressed-pubkeys branch from 2579248 to cc5789b Compare June 28, 2024 06:57
@michaelsproul michaelsproul added waiting-on-author The reviewer has suggested changes and awaits their implementation. and removed ready-for-review The code is ready for review labels Jun 28, 2024
@michaelsproul
Member Author

The bug was in the v20 migration as suspected:

@michaelsproul michaelsproul added ready-for-review The code is ready for review blocked and removed waiting-on-author The reviewer has suggested changes and awaits their implementation. ready-for-review The code is ready for review labels Jul 2, 2024
@michaelsproul michaelsproul added the ready-for-merge This PR is ready to merge. label Jul 2, 2024
@michaelsproul michaelsproul added ready-for-review The code is ready for review and removed blocked ready-for-merge This PR is ready to merge. labels Jul 3, 2024
@chong-he
Member

chong-he commented Jul 3, 2024

It's working, but I don't see "Upgrading from v19 to v20" in my log (I was on v5.2.1 with a v19 database). I guess that's because I am not running this PR: #5712?

Jul 03 02:57:05.200 INFO Logging to file                         path: "/home/hi/.lighthouse/mainnet/beacon/logs/beacon.log"
Jul 03 02:57:05.226 INFO Lighthouse started                      version: Lighthouse/v5.2.1-2da5e5e
Jul 03 02:57:05.226 INFO Configured for network                  name: mainnet
Jul 03 02:57:05.233 INFO Data directory initialised              datadir: /home/hi/.lighthouse/mainnet
Jul 03 02:57:05.249 INFO Deposit contract                        address: 0x00000000219ab540356cbb839cbe05303d7705fa, deploy_block: 11184524
Jul 03 02:57:05.516 INFO Hot-Cold DB initialized                 split_state: 0xe64c0d60eac08824895ce932236bb33c58b61c51f80fb5a27929a0ccb2055c40, split_slot: 9429152, service: freezer_db
Jul 03 02:57:05.518 INFO Blob DB initialized                     oldest_blob_slot: Some(Slot(9429152)), path: "/home/hi/.lighthouse/mainnet/beacon/blobs_db", service: freezer_db
Jul 03 02:57:05.522 INFO Upgrading from v20 to v21
Jul 03 02:57:18.316 INFO Public key decompression in progress    keys_decompressed: 200000
Jul 03 02:57:31.084 INFO Public key decompression in progress    keys_decompressed: 400000
Jul 03 02:57:43.867 INFO Public key decompression in progress    keys_decompressed: 600000
Jul 03 02:57:56.631 INFO Public key decompression in progress    keys_decompressed: 800000
Jul 03 02:58:09.394 INFO Public key decompression in progress    keys_decompressed: 1000000
Jul 03 02:58:22.177 INFO Public key decompression in progress    keys_decompressed: 1200000
Jul 03 02:58:34.936 INFO Public key decompression in progress    keys_decompressed: 1400000
Jul 03 02:58:39.691 INFO Public key decompression complete
Jul 03 02:58:41.082 INFO Refusing to checkpoint sync             msg: database already exists, use --purge-db to force checkpoint sync, service: beacon
Jul 03 02:58:41.082 INFO Starting beacon chain                   method: resume, service: beacon
Jul 03 02:58:45.051 INFO Block production enabled                method: json rpc via http, endpoint: Auth { endpoint: "http://localhost:8551/", jwt_path: "/home/hi/.ethereum/geth/jwtsecret", jwt_id: None, jwt_version: None }
Jul 03 02:58:45.076 WARN Error connecting to eth1 node endpoint  endpoint: http://localhost:8551/, auth=true, service: deposit_contract_rpc
Jul 03 02:58:45.076 ERRO Error updating deposit contract cache   error: Invalid endpoint state: RequestFailed("eth_chainId call failed HttpClient(url: http://localhost:8551/, kind: request, detail: error trying to connect: tcp connect error: Connection refused (os error 111))"), retry_millis: 60000, service: deposit_contract_rpc
Jul 03 02:58:50.062 INFO Beacon chain initialized                head_slot: 9429152, head_block: 0xc9a7…189a, head_state: 0xe64c…5c40, service: beacon
Jul 03 02:58:50.063 INFO Timer service started                   service: node_timer
Jul 03 02:58:50.064 INFO UPnP Attempting to initialise routes
Jul 03 02:58:50.064 INFO Execution payloads are pruned           service: freezer_db
Jul 03 02:58:50.076 INFO ENR Initialised                         quic6: None, quic4: Some(9001), udp6: None, tcp6: None, tcp4: Some(9000), udp4: None, ip4: None, id: 0x31d6..d831, seq: 85, enr: enr:-LW4QGU5CCe81bQUD3IleeHYQyb87Lb5r0HGKuqx4QH77BUkRABySr_vou7mX33Ea7rsGwfmOGsZY5TiPR_ntJDbnvxVh2F0dG5ldHOIAAAAAAAAAACEZXRoMpBqlaGpBAAAAP__________gmlkgnY0hHF1aWOCIymJc2VjcDI1NmsxoQLHRreJ_fLUw0whRbgMLqrYY3av40ZGUyqKXWqQYD2wP4hzeW5jbmV0cwCDdGNwgiMo, service: libp2p
Jul 03 02:58:50.093 INFO Libp2p Starting                         bandwidth_config: 3-Average, peer_id: 16Uiu2HAm8qZeuGM1Y6Z5DCvHAPiBDXrPnp5dyTCCgoJtYCB7pr62, service: libp2p
Jul 03 02:58:50.096 INFO Listening established                   address: /ip4/0.0.0.0/tcp/9000/p2p/16Uiu2HAm8qZeuGM1Y6Z5DCvHAPiBDXrPnp5dyTCCgoJtYCB7pr62, service: libp2p
Jul 03 02:58:50.097 INFO Listening established                   address: /ip4/0.0.0.0/udp/9001/quic-v1/p2p/16Uiu2HAm8qZeuGM1Y6Z5DCvHAPiBDXrPnp5dyTCCgoJtYCB7pr62, service: libp2p
Jul 03 02:58:50.112 INFO Deterministic long lived subnets enabled, subscription_duration_in_epochs: 256, subnets_per_node: 2, service: attestation_service
Jul 03 02:58:50.112 INFO Subscribing to long-lived subnets       subnets: [SubnetId(52), SubnetId(53)], service: attestation_service
Jul 03 02:58:50.115 INFO HTTP API started                        listen_address: 127.0.0.1:5052

It is running normally afterwards
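
Both logs in this thread report progress every 200,000 keys, so the rate of the one-off decompression during the v21 migration can be read off the timestamps. The helper below is a small sketch for that arithmetic; `parse_millis` and `keys_per_second` are hypothetical names, and the timestamps are copied from the first two progress lines of each log.

```rust
/// Parse a log timestamp of the form "HH:MM:SS.mmm" into milliseconds
/// since midnight. Returns None for any other shape.
fn parse_millis(ts: &str) -> Option<u64> {
    let (hms, ms) = ts.split_once('.')?;
    let mut parts = hms.split(':');
    let h: u64 = parts.next()?.parse().ok()?;
    let m: u64 = parts.next()?.parse().ok()?;
    let s: u64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() || ms.len() != 3 {
        return None;
    }
    let ms: u64 = ms.parse().ok()?;
    Some(((h * 60 + m) * 60 + s) * 1000 + ms)
}

/// Keys processed per second between two timestamps on the same day.
fn keys_per_second(start: &str, end: &str, keys: u64) -> Option<f64> {
    let elapsed_ms = parse_millis(end)?.checked_sub(parse_millis(start)?)?;
    if elapsed_ms == 0 {
        return None;
    }
    Some(keys as f64 * 1000.0 / elapsed_ms as f64)
}

fn main() {
    // Jun 28 log: 200,000 -> 400,000 keys.
    let first = keys_per_second("06:51:01.205", "06:51:11.024", 200_000);
    // Jul 03 log: 200,000 -> 400,000 keys.
    let second = keys_per_second("02:57:18.316", "02:57:31.084", 200_000);
    println!("first log:  {:?} keys/s", first);
    println!("second log: {:?} keys/s", second);
}
```

On these intervals the two machines come out at roughly 20,000 and 15,700 keys per second respectively, which matches the total migration times of a bit over one minute and about a minute and a half.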

@michaelsproul
Member Author

Oh that was a log I added when debugging this PR, and then reverted (because the same commit added some asserts). I'll re-add it

Thanks for testing!
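
To make the logging point concrete, here is a minimal sketch of a stepwise schema migration that emits one "Upgrading from vN to vN+1" line per hop, so a v19 database moving to v21 shows both steps. This is an assumption-laden illustration rather than Lighthouse's migration code: `SchemaVersion`, `migrate_step`, and `migrate` are hypothetical names, the log is collected into a `Vec<String>` instead of a real logger, and the per-step work is left out.

```rust
/// Hypothetical schema version wrapper.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SchemaVersion(u64);

/// Apply a single migration step and record a log line for it.
/// Only the v19 -> v20 and v20 -> v21 steps exist in this sketch.
fn migrate_step(from: SchemaVersion, log: &mut Vec<String>) -> Result<SchemaVersion, String> {
    let to = SchemaVersion(from.0 + 1);
    match from.0 {
        19 | 20 => {
            log.push(format!("Upgrading from v{} to v{}", from.0, to.0));
            // The actual database changes for this step would run here.
            Ok(to)
        }
        _ => Err(format!("no migration defined from v{}", from.0)),
    }
}

/// Walk from `current` to `target` one step at a time.
fn migrate(
    mut current: SchemaVersion,
    target: SchemaVersion,
    log: &mut Vec<String>,
) -> Result<SchemaVersion, String> {
    while current != target {
        if current.0 > target.0 {
            return Err("downgrades are not handled in this sketch".to_string());
        }
        current = migrate_step(current, log)?;
    }
    Ok(current)
}

fn main() {
    let mut log = Vec::new();
    let result = migrate(SchemaVersion(19), SchemaVersion(21), &mut log);
    for line in &log {
        println!("{}", line);
    }
    println!("{:?}", result);
}
```

Logging inside the single-step function, rather than once around the whole upgrade, is what makes every intermediate version visible to someone reading the output.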

@michaelsproul
Member Author

@mergify queue

mergify bot commented Jul 4, 2024

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at d84e3e3

mergify bot added a commit that referenced this pull request Jul 4, 2024
@mergify mergify bot merged commit d84e3e3 into sigp:unstable Jul 4, 2024
26 of 28 checks passed
@michaelsproul michaelsproul deleted the uncompressed-pubkeys branch July 4, 2024 05:57
Labels
backwards-incompat Backwards-incompatible API change database optimization Something to make Lighthouse run more efficiently. ready-for-review The code is ready for review v5.3.0 Q3 2024 release with database changes!
3 participants