Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/metrics: add contention metrics #49881

Closed
znkr opened this issue Nov 30, 2021 · 12 comments
Closed

runtime/metrics: add contention metrics #49881

znkr opened this issue Nov 30, 2021 · 12 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FeatureRequest Issues asking for a new feature that does not need a proposal. FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@znkr
Copy link
Contributor

znkr commented Nov 30, 2021

Right now lock contention is only possible to debug with pprof using the block profile. This is very useful once contention has been identified as the issue, but since it has to be turned on manullay it doesn't help in identifying that contention is an issue. Exporting the cumulative wait time via runtime/metrics would allow continouous monitoring of contention and help in debugging Go programs.

@prattmic
Copy link
Member

cc @mknyszek @aclements

@prattmic prattmic added this to the Go1.19 milestone Nov 30, 2021
@prattmic prattmic added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Nov 30, 2021
@mknyszek
Copy link
Contributor

Interesting... Looks like we already measure time spent blocked for starvation purposes anyway. I suppose this wouldn't be too hard to expose. Is cumulative wait time the right metric? It's useful for comparison, but it doesn't really tell you much on its own.

An approximate distribution of latencies might be more useful for this, because you can correlate that with e.g. request latency, but it's slightly more expensive and it requires a bit more post-processing. Its other downside is that it's approximate -- I'm not sure if there's some special use case enabled by having a precise cumulative wait time.

@mknyszek
Copy link
Contributor

To be clear I'm not opposed to cumulative wait time. If that's the convention we can move forward with that. Just want to make sure we considered our options.

@mknyszek mknyszek changed the title runtime/metrics: Contention metrics runtime/metrics: add contention metrics Nov 30, 2021
@mknyszek mknyszek added the FeatureRequest Issues asking for a new feature that does not need a proposal. label Nov 30, 2021
@znkr
Copy link
Contributor Author

znkr commented Nov 30, 2021

Latency distribution would be even better. Approximate times would be good enough for all use cases I can think of.

@znkr
Copy link
Contributor Author

znkr commented Jan 20, 2022

cc @jeremyfaller

@ianlancetaylor
Copy link
Member

Moving to Backlog. Please recategorize as appropriate. Thanks.

@ianlancetaylor ianlancetaylor modified the milestones: Go1.19, Backlog Jun 24, 2022
@znkr
Copy link
Contributor Author

znkr commented Jul 4, 2022

Would it be a lot of work to get this into 1.20? It would help a lot in understanding performance issues in production jobs and a lot of Google internal users have expressed interest in this.

@aclements
Copy link
Member

Given that there's known demand for this and that I think it wouldn't be hard to add (though I might be wrong), I'll move this to 1.20, but given our focus on core project health and PGO for 1.20, I'm not sure we'll get to this.

@aclements aclements modified the milestones: Backlog, Go1.20 Jul 6, 2022
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022
@mknyszek
Copy link
Contributor

I'll probably do some other runtime/metrics stuff this release, so I'll put it on my plate.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/427618 mentions this issue: runtime/metrics: add /sync/mutex-wait:seconds metric

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/427617 mentions this issue: runtime: set G wait reason more consistently

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/427616 mentions this issue: runtime: make the wait reason for a g blocked on a mutex more specific

gopherbot pushed a commit that referenced this issue Sep 16, 2022
This change adds 3 new waitReasons that correspond to sync.Mutex.Lock,
sync.RWMutex.RLock, and sync.RWMutex.Lock that are plumbed down into
semacquire1 by exporting new functions to the sync package from the
runtime.

Currently these three functions show up as "semacquire" in backtraces
which isn't very clear, though the stack trace itself should reveal
what's really going on. This represents a minor improvement to backtrace
readability, though blocking on an RWMutex.w.Lock will still show up as
blocking on a regular mutex (I suppose technically it is).

This is a step toward helping the runtime identify when a goroutine is
blocked on a mutex of some kind.

For #49881.

Change-Id: Ia409b4d27e117fe4bfdc25fa541e9c58d6d587b9
Reviewed-on: https://go-review.googlesource.com/c/go/+/427616
TryBot-Result: Gopher Robot <[email protected]>
Auto-Submit: Michael Knyszek <[email protected]>
Reviewed-by: Michael Pratt <[email protected]>
Run-TryBot: Michael Knyszek <[email protected]>
gopherbot pushed a commit that referenced this issue Sep 16, 2022
Currently, wait reasons are set somewhat inconsistently. In a follow-up
CL, we're going to want to rely on the wait reason being there for
casgstatus, so the status quo isn't really going to work for that. Plus
this inconsistency means there are a whole bunch of cases where we could
be more specific about the G's status but aren't.

So, this change adds a new function, casGToWaiting which is like
casgstatus but also sets the wait reason. The goal is that by using this
API it'll be harder to forget to set a wait reason (or the lack thereof
will at least be explicit). This change then updates all casgstatus(gp,
..., _Gwaiting) calls to casGToWaiting(gp, ..., waitReasonX) instead.
For a number of these cases, we're missing a wait reason, and it
wouldn't hurt to add a wait reason for them, so this change also adds
those wait reasons.

For #49881.

Change-Id: Ia95e06ecb74ed17bb7bb94f1a362ebfe6bec1518
Reviewed-on: https://go-review.googlesource.com/c/go/+/427617
Reviewed-by: Michael Pratt <[email protected]>
Run-TryBot: Michael Knyszek <[email protected]>
Auto-Submit: Michael Knyszek <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Repository owner moved this from Todo to Done in Go Compiler / Runtime Sep 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FeatureRequest Issues asking for a new feature that does not need a proposal. FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
Status: Done
Development

No branches or pull requests

6 participants