Bug in iostat-await calculation in Metricbeat #30480
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
@thekofimensah, do you see the same problem on 7.17.0 or 8.0.0?
Hey @belimawr, I will eventually get Metricbeat updated on that machine, but our deployment makes on-the-fly upgrades tricky. Was there a fix for this in the newer versions? If not, I doubt it is fixed, because it has been a problem for a long time: according to my records, it has been around since 7.9.x.
@thekofimensah, can we see:
Also, the …
@thekofimensah This is pretty baffling. The best explanation would be that some of the diskstats fields, particularly the millisecond write count, are rolling over, but that seems almost inconceivable on the time frames we're seeing. Is there any way I can see a "bad" … Also, are there any errors from the …
Here are the events surrounding a "bad" event, one before and one after:
Let me see if I can find a log.
Based on the sample I looked at, there's nothing in the logs, but debug logging isn't active.
Alright, I was at least somewhat right: the counter is rolling over (decrementing between samples). In the first event, … EDIT: never mind. Annoyingly, I found different kernel docs that explicitly mark those values as 32-bit: https://docs.kernel.org/admin-guide/iostats.html
This should be an easy fix, since it makes sense that some of those counters could roll over fairly frequently.
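A minimal Go sketch of the kind of fix discussed above, assuming the kernel exposes these diskstats fields as unsigned 32-bit counters that are parsed into `uint64`. The helper name `delta32` is hypothetical, and this is not necessarily how #30679 implements the fix. Truncating both samples to `uint32` and subtracting lets Go's modular unsigned arithmetic absorb a single rollover.

```go
package main

import "fmt"

// delta32 returns how much a counter increased between two samples,
// assuming (hypothetically) that the kernel stores it as an unsigned
// 32-bit value. The samples arrive as uint64, so both are truncated to
// uint32; uint32 subtraction wraps modulo 2^32, absorbing one rollover.
func delta32(prev, curr uint64) uint64 {
	return uint64(uint32(curr) - uint32(prev))
}

func main() {
	fmt.Println(delta32(100, 250))       // 150: ordinary increase
	fmt.Println(delta32(4294967290, 10)) // 16: counter wrapped past 2^32-1
}
```

This approach cannot tell apart one wrap from several wraps between two samples; it assumes that at normal collection periods a counter wraps at most once per interval.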
Perfect, do you need anything else from me?
@thekofimensah I don't think so; I'm hoping to have a PR up today with a fix.
Closing thanks to #30679 |
This error has persisted in Metricbeat 7.9 through 7.16, and most likely in earlier versions as well.
I'm pulling in iostat-await values with Metricbeat and noticing that certain machines repeatedly return massive numbers (like 8081489971116.08ms). My guess is that this is a rollover issue in the kernel, probably connected to an integer overflow. I couldn't find anyone else reporting this, so I thought it would be worth pointing out that somewhere in the code, subtracting two uint64s produces an unreasonable value.
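To illustrate how a single rolled-over counter can produce an await value like the one reported, here is a minimal Go sketch. It computes await the way iostat defines it (time spent on I/O divided by I/Os completed) using plain `uint64` subtraction. The `sample` struct, its field names, and the sample numbers are hypothetical, and this is not Metricbeat's actual code.

```go
package main

import "fmt"

// sample holds hypothetical cumulative /proc/diskstats fields used for await.
type sample struct {
	readIOs, writeIOs     uint64 // completed reads / writes
	readTicks, writeTicks uint64 // milliseconds spent reading / writing
}

// naiveDelta subtracts directly; if curr < prev (the counter rolled over),
// the uint64 result wraps to a value close to 2^64 instead of going negative.
func naiveDelta(prev, curr uint64) uint64 { return curr - prev }

// awaitMs returns average I/O wait in ms: total I/O time divided by the
// number of I/Os completed between the two samples.
func awaitMs(prev, curr sample, delta func(uint64, uint64) uint64) float64 {
	ios := delta(prev.readIOs, curr.readIOs) + delta(prev.writeIOs, curr.writeIOs)
	if ios == 0 {
		return 0 // no I/O completed, so await is undefined; report 0
	}
	ticks := delta(prev.readTicks, curr.readTicks) + delta(prev.writeTicks, curr.writeTicks)
	return float64(ticks) / float64(ios)
}

func main() {
	prev := sample{readIOs: 1000, writeIOs: 2000, readTicks: 5000, writeTicks: 4294967000}
	// writeTicks rolled over to a small value between the two samples.
	curr := sample{readIOs: 1010, writeIOs: 2010, readTicks: 5050, writeTicks: 200}
	fmt.Printf("await: %.2f ms\n", awaitMs(prev, curr, naiveDelta))
}
```

Because the write-time delta wraps to nearly 2^64, the printed await is many orders of magnitude too large, matching the shape of the symptom described in this issue.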