Enable/Disable builder proposals via HTTP reloads whole validator definitions #3795
Comments
CC: @paulhauner |
I think this is an issue affecting web3signer validators specifically. The PATCH method triggers a call to lighthouse/validator_client/src/initialized_validators.rs Lines 995 to 997 in bf533c8
There's no similar check in the web3signer case, so we end up re-initializing every validator for every request here: lighthouse/validator_client/src/initialized_validators.rs Lines 1053 to 1059 in bf533c8
I think a simple fix might be just to add a similar duplicate check to the web3signer branch. |
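A minimal sketch of that idea, not Lighthouse's actual code: every type and function name below (`Definition`, `Initialized`, `SigningMethod`, `update_validators`) is a hypothetical stand-in. It assumes the initialized validators live in a map keyed by pubkey and shows the same "already initialized?" check applied to every signing method, so a second pass over unchanged definitions does no expensive work.

```rust
use std::collections::HashMap;

/// Hypothetical signing method; not Lighthouse's real type.
#[derive(Clone, Debug, PartialEq)]
enum SigningMethod {
    LocalKeystore,
    Web3Signer { url: String },
}

/// Hypothetical validator definition, as it might be read from the definitions file.
#[derive(Clone, Debug)]
struct Definition {
    pubkey: String,
    enabled: bool,
    builder_proposals: Option<bool>,
    method: SigningMethod,
}

/// Hypothetical initialized validator held in memory by the VC.
#[derive(Debug)]
struct Initialized {
    builder_proposals: Option<bool>,
    method: SigningMethod,
}

/// Walks all definitions and initializes only those not already present.
/// Returns how many validators were initialized in this pass.
fn update_validators(
    definitions: &[Definition],
    validators: &mut HashMap<String, Initialized>,
) -> usize {
    let mut initialized = 0;
    for def in definitions.iter().filter(|d| d.enabled) {
        // Duplicate check for every signing method, web3signer included:
        // an existing entry only has its cheap fields refreshed.
        if let Some(existing) = validators.get_mut(&def.pubkey) {
            existing.builder_proposals = def.builder_proposals;
            continue;
        }
        // Expensive path in a real client (keystore decryption or signer setup).
        validators.insert(
            def.pubkey.clone(),
            Initialized {
                builder_proposals: def.builder_proposals,
                method: def.method.clone(),
            },
        );
        initialized += 1;
    }
    initialized
}
```

With this shape, toggling `builder_proposals` on one definition and calling `update_validators` again initializes nothing new; only the flag on the existing entry changes.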
@michaelsproul thanks. Regarding lighthouse/validator_client/src/initialized_validators.rs Lines 995 to 997 in bf533c8
which test validates that `if` check? I wanted to see whether something similar could be added as a test for the web3signer flow. |
I tried that change and tested it manually, but it didn't work: all the validators were still reloaded. |
@ricardolyn I tried Michael's suggestion in PR #3801, and it does look like it's working for me. |
Let me run a test on my side, as I didn't change the code in the same way you did.
## Issue Addressed #3795 Co-authored-by: realbigsean <[email protected]>
Closed by #3801 (v3.4.0) 🎉 Please comment if you find it isn't fixed |
@michaelsproul The team has been testing the v3.4.0 version of Lighthouse and encountered an issue with the PATCH lighthouse/validators/{pubkey} request, which is being performed every 2 minutes. Despite the fact that Lighthouse is not being killed every 2 minutes, it is still being restarted around every 2 hours by Kubernetes due to CPU throttling when updating 6 validators concurrently. This could be a major problem in production, where there could be more than a thousand public keys in the same service, potentially leading to even more frequent restarts. Any ideas on what may be causing this excessive CPU usage during the PATCH request? |
What's the PATCH body that's being sent? Is it toggling
How high does the CPU spike and for how long? As an aside, I'm always confused by the Kubernetes policy to kill services that use too much CPU, as that just seems to slow everything down, causing more work to restart everything and then just re-creating the bug as soon as the process has restarted. Nonetheless, it would be good to fix the underlying issue. When you say 6 validators, do you mean there are 6 validators in the VC total, or just that there are 6 requests made at the same time and the number of validators is another number? If so, how many validators are registered to the VC total? |
The body of the PATCH request is
Let me correct my sentence in the previous message: the service is being killed by Kubernetes not due to CPU throttling but because the liveness probe fails. This is the probe config we use: Example of the liveness probe fail: We could also use the FYI: This chart shows the relation between
There are 500 keys loaded through the definitions file, but only 6 are active. Every 2 minutes, 6 parallel PATCH requests are made to update builder_proposals. Example log: |
## Issue Addressed sigp#3795 Co-authored-by: realbigsean <[email protected]>
Description
We run LH with the BN and VC separated. The VC loads thousands of validator keys from the validator_definitions file. We use an external web3signer for signing.
We are working on activating builder proposals for some of the validators (not all, so we are removing the `--builder-proposals` argument). We are using the API as documented on [this link](https://lighthouse-book.sigmaprime.io/builders.html#enabledisable-builder-proposals-via-http).
When experimenting with this API by sending the PATCH for only 1 validator, we noticed that it loads ALL validator keys again from the definitions file, creating high CPU usage.
The request is something like `PATCH lighthouse/validators/{pubkey}` with a body of `{ "builder_proposals": true }`. When making the request, the logs of LH VC show the following:
So it loads all pubkeys again. In the example we ran, it loads all 500 keys even though we only patched 1. If each of the 500 requests needed to update every key reloads all 500 keys, the total work grows quadratically with the number of keys.
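To put rough numbers on the cost described above, here is a small worked calculation. The function names are illustrative only, and it assumes that one PATCH request reloads every key in the definitions file, as observed in the logs. Under that assumption the total is the number of keys times the number of requests, which is quadratic when every key is patched once.

```rust
/// Key loads caused by `requests` PATCH calls when each call reloads all `keys`.
fn loads_with_full_reload(keys: u64, requests: u64) -> u64 {
    keys * requests
}

/// Key loads if each PATCH call touched only the validator it targets.
fn loads_with_targeted_update(requests: u64) -> u64 {
    requests
}
```

For 500 keys patched once each, `loads_with_full_reload(500, 500)` gives 250,000 key loads, against 500 if only the targeted validator were updated.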
In the tests we ran, CPU usage and storage I/O were noticeably higher than when those requests were not made.
Version
version: Lighthouse/v3.3.0-bf533c8
Present Behaviour
When patching the `builder_proposals` value for a validator key, the LH VC service reloads all validators again.
Expected Behaviour
It should update that validator and not reload the data for all validators configured.