Treat any statsd metric driver related errors as non-fatal (just log the errors) #4597
This pull request updates our statsd driver to treat any exception thrown by the driver as non-fatal.
Metrics driver errors should never break the application or degrade its performance, and this pull request ensures that is the case.
Background / Context
One of our users reported an issue with the statsd driver breaking / hanging the application when a metric backend server goes down. Initially I wasn't able to replicate it (statsd uses UDP, so it should be fire and forget and there should be no exceptions).
Today the user provided log files. It turns out the problem lies in DNS resolution errors which can happen when establishing a connection to the backend server.
If a hostname rather than an IP address is used for the metric backend and DNS resolution fails, statsd will propagate the exception all the way up, which breaks the application since we don't catch and ignore / log those exceptions.
I wasn't able to reproduce the error because we use an IP address and not a hostname everywhere, and even if the backend server goes down, a UDP send won't result in any exception due to the connectionless nature of the protocol.
Proposed Solution
In this solution I simply use a decorator which logs any exception thrown by the driver and doesn't propagate it.
Eventually we could perhaps move that logic to the statsd client library, but it's still safer to have such a safeguard in our code. As mentioned above, we don't want any such error / exception to break or degrade application performance.
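For illustration, here is a minimal sketch of the kind of decorator described above. It is not the exact code from this pull request: the names `ignore_and_log_exception` and `FakeStatsdDriver` are hypothetical, and the fake driver simply raises the DNS resolution error (`socket.gaierror`) to simulate the failure the user hit.

```python
import functools
import logging
import socket

LOG = logging.getLogger(__name__)


def ignore_and_log_exception(exc_classes=(Exception,), logger=LOG,
                             level=logging.WARNING):
    """Log exceptions raised by the wrapped callable instead of propagating them.

    Hypothetical helper used for illustration only.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except exc_classes as e:
                # Metrics are best-effort: record the failure and carry on.
                logger.log(level, 'Metrics driver call "%s" failed: %s',
                           func.__name__, e)
                return None
        return wrapper
    return decorator


class FakeStatsdDriver(object):
    """Stand-in for a metrics driver (assumption, not the real driver class)."""

    @ignore_and_log_exception()
    def inc_counter(self, key, amount=1):
        # Simulate a DNS resolution failure while talking to the backend.
        raise socket.gaierror(-2, 'Name or service not known')


driver = FakeStatsdDriver()
# The error is logged and swallowed, so the caller keeps running.
result = driver.inc_counter('requests')
```

Catching the broad `Exception` class by default is deliberate in this sketch, since the goal is that no driver error reaches the application; `exc_classes` can be narrowed if only specific failures should be ignored.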
TODO