-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Overall improvement of the Debug Controller #2437
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…full of pendingRequests
Aschen
approved these changes
Feb 28, 2023
Kudos, SonarCloud Quality Gate passed!
|
Merged
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What does this PR do ?
This PR makes a lot of improvement to the Kuzzle Cluster and Debugger stability while debugging:
Node eviction prevention
The node eviction prevention mode has been improved, before when a node was used to debug the cluster was notified to not evict this node even if its heartbeat was not received in time by the other nodes that were subscribed.
This is necessary when doing a snapshot of the Node process which causes the node process to not be active during the snapshot.
However when the Node that was prevented from being evicted while debugging had recovered after taking a heap snapshot, there was a case when this Node would evict other nodes because it had not received any heartbeat during the time it was taking the heap snapshot since the node threads were frozen.
A change has been made to now revoke the right of a node to evict other nodes when this node is currently prevented from being evicted from the cluster.
Another changed has been made to the ID Card Renewer, before this PR the ID Card Renewer was a worker thread that had for unique goal to maintain the ID Card data alive inside Redis.
But when taking heap snapshot the debugger is taking a snapshot of each threads owned by the process which was freezing the ID Card Renewer and preventing it from renewing the ID Card in time, which was causing the node to evict himself, and not being seen by new nodes.
Now the ID Card Renewer is created inside another process via a fork and maintains a communication channel with the main process.
Segfault when taking a Heap Snapshot
We discovered an issue when taking a Heap Snapshot via the
Inspector
that was causing NodeJS to segfault when thereportProgress
flag was set totrue
because theinspector
was trying to make a call into the JS Heap while the heap is being inspected for the snapshot.See this issue: nodejs/node#44634
Now, when
reportProgress
is set totrue
we override it and change it tofalse
, but this causes theChrome Inspector
to not parse the snapshot even though it has received every chunk of it.The
Chrome Inspector
relies on the event emitted byreportProgress
to show a progress bar that represent the state of the ongoing snapshot, and to know when the snapshot is finished this way it can wait for all the snapshot chunks to be received to start the parsing.In order to force the Chrome Inspector to parse the snapshot we fake the
HeapProfiler.reportHeapSnapshotProgress
and send the following params directly after receivingreportProgress:true
:The
done
andtotal
parameters are only used to show a progress bar and nothing else, so they're not important, only thefinished
property is used by the Chrome Inspector to spawn a worker that will wait for all the chunks to be received and start parsing the snapshot.Sending the event
HeapProfiler.reportHeapSnapshotProgress
before any of theHeapProfiler.addHeapSnapshotChunk
events are sent has no impact, infact this what happens all the time when taking a snapshot, first multipleHeapProfiler.reportHeapSnapshotProgress
are emitted until one is emitted withfinished:true
, then all theHeapProfiler.addHeapSnapshotChunk
are emitted with the snapshot chunks.Allow socket connections used for debugging to bypass some of Kuzzle's limitations
When taking a Heap Snapshot the debugger emits a lot of
HeapProfiler.addHeapSnapshotChunk
event which contains the snapshot chunks, those chunks are often bigger than 1mb and they're a lot of them, so much than it can cause them to accumulate in the BackPressure buffer dedicated to the socket connection and even exceed the BackPressure buffer limit from the KuzzleRC causing the socket to be closed by Kuzzle.This is a huge problem since it disconnects the debugger, and prevents us from receiving all the chunks of a snapshot.
To avoid this, when a websocket connection is using the debugger and listen to any event from the debugger by calling
debug:addListener
the websocket connection is marked as used for debugging and is allowed to exceed such limitations.Another issue that could occur is when Kuzzle request are added to the pending buffer if there are too many request to process, what could happens is request issued to the debug controller meant to debug Kuzzle might be put at the end of the buffer and never be processed or be processed so much later that it renders their result useless.
To counter that, requests from websocket connections previously marked as used for debugging are put at the top of the queue allowing them to be processed first by Kuzzle when Kuzzle is under pressure.