runtime: contention in runtime.newMarkBits on gcBitsArenas.lock #61428
Labels: compiler/runtime, NeedsInvestigation, Performance
This is the third in what I think will be a series of three issue reports about contention I've seen on various runtime-internal singleton locks in an app that typically runs on dual-socket linux/amd64 machines with 96 (or sometimes 64) hardware threads.
This issue is about contention in `runtime.newMarkBits` on `gcBitsArenas.lock`. I hope it'll be of interest to @mknyszek. Also CC @golang/runtime
Sorry about the somewhat outdated version. I haven't seen related issues go by, so I hope the info in this one is still of some use.
What version of Go are you using (`go version`)?

Does this issue reproduce with the latest release?
I don't know yet if the issue is present in Go 1.20 series or in the release candidates of the Go 1.21 series.
What operating system and processor architecture are you using (`go env`)?

The app runs on Linux / x86_64, typically on machines with two processor sockets (NUMA). I don't have the output of `go env` from the app's runtime environment.

What did you do?
A data-processing app received a higher volume of data than usual. Here are some more of its dimensions:
The average allocation size is about 128 bytes. Across the whole app, it averages around 4 GiB per second of allocations, or 16 kiB of allocations every 4 µs. Split across the 96 threads the runtime uses for the app (GOMAXPROCS == num hardware threads), that's 40 MiB per second per thread, an average of 3 µs between allocations, or 16 kiB of allocations every 400 µs.
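The per-thread figures above follow from the app-wide ones by simple division. A minimal sketch of that arithmetic, using only the numbers quoted in this report (the constant and function names are illustrative, not from the app); the exact results come out near 42.7 MiB/s, 2.9 µs, and 366 µs, which the text rounds to 40 MiB/s, 3 µs, and 400 µs:

```go
package main

import "fmt"

// Figures quoted in the report. Names are illustrative only.
const (
	avgAllocBytes  = 128.0             // average allocation size, in bytes
	appBytesPerSec = 4.0 * (1 << 30)   // about 4 GiB/s across the whole app
	threads        = 96.0              // GOMAXPROCS == number of hardware threads
	chunkBytes     = 16.0 * (1 << 10)  // 16 KiB of allocations
)

// perThreadRate returns one thread's share of the allocation rate, in bytes/s.
func perThreadRate() float64 { return appBytesPerSec / threads }

// microsPer returns how many microseconds it takes to allocate n bytes
// at the given rate (bytes per second).
func microsPer(n, rate float64) float64 { return n / rate * 1e6 }

func main() {
	rate := perThreadRate()
	fmt.Printf("per-thread rate:          %.1f MiB/s\n", rate/(1<<20))
	fmt.Printf("time between allocations: %.1f µs\n", microsPer(avgAllocBytes, rate))
	fmt.Printf("16 KiB per thread every:  %.0f µs\n", microsPer(chunkBytes, rate))
	fmt.Printf("16 KiB app-wide every:    %.1f µs\n", microsPer(chunkBytes, appBytesPerSec))
}
```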
What did you expect to see?
I expected to see very little contention on runtime-internal singleton locks.
What did you see instead?
Contention on a lock in `runtime.newMarkBits`, and on the lock-free data structure in `runtime.(*spanSet).pop`.

Although the runtime uses more than one `spanSet`, it appears to have only one of those for each size class. That makes them singletons in effect, especially if the app's memory usage is concentrated on a small number of size classes.

Looking at the application-level callers of `runtime.mallocgc`, the on-CPU time is mostly from `runtime.newobject` as called from seven allocation sites. From the allocs profile there are two that allocate 64-byte objects, one that allocates 80-byte objects, two of 112, one of 128, and one that allocates slices up to about 1 MiB in size (though most of the memory here is in 10 kiB allocations).

Looking at the size of memory allocations over the lifetime of the process (command below, though I suppose it's also available via `runtime.ReadMemStats`) confirms that the bulk of the bytes allocated are in the 64, 80, 112, and 128-byte classes.

Data set 6d039fa93e6d0b7bb2446028f322b7ac9a0d3b3b47a2ea0ed201779a6d81ed78 is from an instance of the app running on a 96-thread machine. The CPU profile ran for 5.18 seconds, during which it collected samples corresponding to a total of 444 seconds of on-CPU time (an average of 86 on-CPU threads) and 36.90+15.37 = 52.27 seconds of time (an average of 10 on-CPU threads) in calls from `runtime.newMarkBits` to lock and unlock `gcBitsArenas.lock`.

https://github.com/golang/go/blob/go1.19.6/src/runtime/mheap.go#L2059
https://github.com/golang/go/blob/go1.19.6/src/runtime/mheap.go#L2064
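For the per-size-class view mentioned above, one way to get it from inside a process is the `BySize` field of `runtime.MemStats`. This is a rough sketch, not the command used for this report: the `classBytes` type, the `topClasses` helper, and the toy allocation loop are made up for illustration. It estimates cumulative bytes as class size times malloc count, so it rounds each object up to its class size, and `BySize` does not cover large allocations.

```go
package main

import (
	"fmt"
	"runtime"
	"sort"
)

// classBytes is the cumulative allocation volume of one size class.
// (Hypothetical helper type for this sketch.)
type classBytes struct {
	Size  uint32 // maximum object size in this class, in bytes
	Bytes uint64 // Size * Mallocs: cumulative bytes, rounded up to the class size
}

// topClasses returns the n size classes with the most cumulative bytes
// allocated over the lifetime of the process.
func topClasses(n int) []classBytes {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms) // note: briefly stops the world
	out := make([]classBytes, 0, len(ms.BySize))
	for _, c := range ms.BySize {
		out = append(out, classBytes{Size: c.Size, Bytes: uint64(c.Size) * c.Mallocs})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Bytes > out[j].Bytes })
	if n > len(out) {
		n = len(out)
	}
	return out[:n]
}

// sink keeps the toy allocations reachable so they stay on the heap.
var sink []*[128]byte

func main() {
	// Toy workload: many 128-byte objects, loosely mimicking the app's mix.
	for i := 0; i < 10000; i++ {
		sink = append(sink, new([128]byte))
	}
	for _, c := range topClasses(5) {
		fmt.Printf("size class %5d B: %d bytes allocated\n", c.Size, c.Bytes)
	}
}
```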
It shows 100.46 seconds of time (an average of 19 on-CPU threads) spent manipulating the lock-free data structure in `runtime.(*spanSet).pop`. The breakdown of where that time goes includes the following:

54% of samples in CAS https://github.com/golang/go/blob/go1.19.6/src/runtime/mspanset.go#L168
16% of samples in Xadd https://github.com/golang/go/blob/go1.19.6/src/runtime/mspanset.go#L214
11% of samples here, with no atomics but within claimloop https://github.com/golang/go/blob/go1.19.6/src/runtime/mspanset.go#L146
...
0.17% of samples in this atomic load following claimloop, indicating that the easy instructions within claimloop run many more times than easy instructions after it https://github.com/golang/go/blob/go1.19.6/src/runtime/mspanset.go#L190
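To illustrate why a claim loop can soak up so many samples, here is a deliberately simplified sketch of a CAS retry loop on a single shared cursor. It is not the runtime's `spanSet` code; `claimAll` and its counters are hypothetical. Each goroutine retries until its compare-and-swap wins, so under contention the loop body can run many more times than the code after it. The retry count varies by machine and run, and may be close to zero on a single core.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// claimAll has `workers` goroutines claim `items` slots from one shared
// cursor using a CAS retry loop. It returns the number of successful
// claims and the number of failed CAS attempts. (Illustrative only.)
func claimAll(workers int, items uint64) (claimed, retries uint64) {
	var next uint64 // shared cursor: index of the next unclaimed slot
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			var myClaims, myRetries uint64
			for {
				cur := atomic.LoadUint64(&next)
				if cur >= items {
					break // nothing left to claim
				}
				if atomic.CompareAndSwapUint64(&next, cur, cur+1) {
					myClaims++ // this goroutine now owns slot cur
				} else {
					myRetries++ // another goroutine won the race; go around again
				}
			}
			atomic.AddUint64(&claimed, myClaims)
			atomic.AddUint64(&retries, myRetries)
		}()
	}
	wg.Wait()
	return claimed, retries
}

func main() {
	claimed, retries := claimAll(8, 1000000)
	fmt.Printf("claimed %d slots with %d failed CAS attempts\n", claimed, retries)
}
```

Every slot is claimed exactly once regardless of contention; only the number of wasted attempts changes as more threads share the same cursor.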