Description
So it looks like mixed-size accesses made the rounds on Twitter again recently, which got me thinking about them. rust-lang/rust#97516 carefully chose not to talk about them (and anyway it's not clear that is the right place to go into more detail about them). So what do we want to do with them in Rust?
Some facts:
- In C++, you cannot convert a `&uint16_t` into a reference to an array "because no such array exists at that location in memory"; they insist that memory is strongly typed. This means they don't even have to talk about mixed-size accesses. It also means they are ignoring a large fraction of the programs out there but I guess they are fine with that. We are not. ;)
- Apparently the x86 manual says you "should" not do this: "Software should access semaphores (shared memory used for signalling between multiple processors) using identical addresses and operand lengths." It is unclear what "should" means (or what anything else here really means, operationally speaking...)
- In Rust, it is pretty much established that you can safely turn a `&mut u16` into a `&mut [u8; 2]`. So you can do something where you start with a `&mut AtomicU16`, do some atomic things to it, then use the above conversion and get a `&mut AtomicU8` and do atomic things with that -- a bona fide mixed-size atomic access.

  However, this means that there is a happens-before edge between all the 16-bit accesses and the 8-bit accesses. Having the `&mut` means that there is synchronization. I hope this means not even Intel disagrees with this, but since literally none of the words in that sentence is defined, who knows.
So... it seems like the most restrictive thing we can say, without disallowing code that you can already write entirely safely using bytemuck, is that
- it is allowed to do differently-sized atomic accesses to the same location over time,
- but only if any two not-perfectly-overlapping accesses are completely synchronized through other means (i.e., it is not these accesses themselves that add a happens-before edge, there already exists a happens-before edge through other accesses).
- Any other kind of mixed-size access is UB.
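A minimal sketch of the allowed pattern, assuming stable Rust 1.75 or later for `AtomicU8::from_ptr`. The function name `mixed_size_with_sync` is hypothetical, and the unsafe pointer cast stands in for the safe `bytemuck` conversion and the unstable `Atomic*::from_mut` mentioned above. The 16-bit and 8-bit accesses never race with each other, because joining the first thread scope already orders all 16-bit accesses before all 8-bit ones.

```rust
use std::sync::atomic::{AtomicU16, AtomicU8, Ordering};
use std::thread;

/// Hypothetical demo: 16-bit atomic accesses first, then 8-bit atomic
/// accesses to the same memory, separated by a full synchronization point.
fn mixed_size_with_sync() -> u16 {
    let mut word = AtomicU16::new(0);

    // Phase 1: two threads perform 16-bit atomic RMWs on the same location.
    thread::scope(|s| {
        for _ in 0..2 {
            s.spawn(|| {
                word.fetch_add(0x0101, Ordering::Relaxed);
            });
        }
    });
    // Leaving the scope joins both threads, so every 16-bit access
    // happens-before everything below. The happens-before edge comes from
    // the join, not from the mixed-size accesses themselves.

    // Phase 2: view the same two bytes as two separate 8-bit atomics.
    // `get_mut` requires `&mut`, which proves exclusive access here.
    // Assumption: this cast replaces a safe `bytemuck`-style conversion.
    let bytes: &mut [u8; 2] =
        unsafe { &mut *(word.get_mut() as *mut u16 as *mut [u8; 2]) };
    let [b0, b1] = bytes;
    // SAFETY: both pointers are valid and aligned for `u8`, and no
    // non-atomic access to these bytes happens while `a0`/`a1` are in use.
    let a0: &AtomicU8 = unsafe { AtomicU8::from_ptr(b0 as *mut u8) };
    let a1: &AtomicU8 = unsafe { AtomicU8::from_ptr(b1 as *mut u8) };

    thread::scope(|s| {
        s.spawn(|| {
            a0.fetch_add(1, Ordering::Relaxed);
        });
        s.spawn(|| {
            a1.fetch_add(1, Ordering::Relaxed);
        });
    });

    // Both scopes are joined, so reading the full 16-bit value is again
    // ordered after all 8-bit accesses.
    word.load(Ordering::Relaxed)
}
```

Calling `mixed_size_with_sync()` yields a value in which both bytes were incremented independently of endianness. What the proposed rule would make UB is dropping the join between the two phases, so that a 16-bit access and an 8-bit access to overlapping memory could race.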
Activity
m-ou-se commented on Jul 2, 2022
That all matches my understanding as well.
(It's part of the reason why I added `Atomic*::from_mut`.)
If they would disagree with that, that'd basically imply that after using some memory for an atomic operation, you can never re-use that memory again. E.g. deallocating a Box would be unsafe, and so would be a stack-allocated AtomicU16 that goes out of scope.
They don't say it very clearly, but I don't see how their no-mixed-sizes rule can apply to anything other than atomic operations on the same memory that race with each other.
Yeah, it could very well turn out that "should" just means "for performance", and that it has nothing to do with correctness. They're not very clear.
That seems like exactly the right thing to say, and matches what you can do in safe Rust (if we include the unstable `Atomic*::from_mut`).
I don't think it's impossible that this might be less restrictive in the future, if we find more reasons to believe that racing mixed-size atomic operations will work on all platforms.
Converting `uint16_t*` to a `char*` however is fine, e.g. to memset or memcpy into a uint16_t or struct, etc.
In C++20, you can also have a `struct X { int a; int b; }` and create an `std::atomic_ref<X>` first, and an `std::atomic_ref<int>` to one of the fields later.
In atomics.ref.generic#general-3, they clearly specify mixed-size accesses in the same way as us:
(Emphasis mine.)
Amanieu commented on Jul 2, 2022
ARM's memory model (in the section: The AArch64 Application Level Memory Model) seems to fully support mixed-sized atomic accesses.
RalfJung commented on Jul 2, 2022
Ah, good point. I had forgotten that hardware memory models do not have provenance. 😂
.. and include bytemuck
Isn't that a C thing? Though C++ might have something similar with `std::byte`.
But anyway that's a non-atomic type.
Oh, good point. So in some sense this is actually already all covered by rust-lang/rust#97516.
bjorn3 commented on Jul 2, 2022
When reusing memory it is undef, right? Furthermore deallocation requires some kind of synchronization with every thread that has ever accessed it using atomic operations. Together I would assume this is enough to provide consistency by "resetting" the state witnessing that it was accessed using atomic operations of a different size.
m-ou-se commented on Jul 2, 2022
C++ allows aliasing through `char`, `unsigned char` and `std::byte`: https://wg21.link/basic.lval#11.3
To add to my previous comment: one of the reasons why C++'s atomic_ref doesn't allow mixed size / overlapping operations, is that it supports objects of any size. If it gets too big for native atomic instructions, it uses a mutex instead, which is probably stored in some kind of global table indexed by the address of the object. It's not completely clear whether it's necessary to be as restrictive when limited to only natively supported atomic operations, like in Rust.
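To make the mutex-table concern concrete, here is a rough sketch in Rust of what such an address-indexed lock table could look like. Everything in it is an assumption for illustration: `LOCKS`, `lock_index`, `with_lock`, the table size, and the hash are hypothetical and not taken from any real C++ standard library. The point is only that an object and one of its fields start at different addresses, so they can map to different locks and would not exclude each other.

```rust
use std::sync::Mutex;

const TABLE_SIZE: usize = 16;
// Helper constant so the array can be initialized on older stable compilers.
const UNLOCKED: Mutex<()> = Mutex::new(());
// Hypothetical global table of locks, shared by all "oversized" atomics.
static LOCKS: [Mutex<()>; TABLE_SIZE] = [UNLOCKED; TABLE_SIZE];

/// Hypothetical hash: pick a lock from the start address of the object.
fn lock_index(addr: usize) -> usize {
    (addr >> 2) % TABLE_SIZE
}

/// Runs `f` while holding the lock associated with `addr`.
fn with_lock<R>(addr: usize, f: impl FnOnce() -> R) -> R {
    let _guard = LOCKS[lock_index(addr)].lock().unwrap();
    f()
}

#[repr(C)]
struct X {
    a: i32,
    b: i32,
}

/// Returns the lock indices used for the whole struct and for its field `b`.
fn whole_and_field_indices() -> (usize, usize) {
    let x = X { a: 1, b: 2 };
    let whole = &x as *const X as usize;
    let field = &x.b as *const i32 as usize; // 4 bytes past the start of `x`
    (lock_index(whole), lock_index(field))
}
```

With this particular hash, an operation on the whole `X` and an operation on `x.b` take different locks, so the two would not be atomic with respect to each other. A real implementation may hash differently, but any scheme keyed on the start address has to deal with this overlap problem somehow.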
chorman0773 commented on Apr 3, 2023
I think we can interpret "should" as "It's undefined, by spec, though it works in practice b/c some important people rely on it, but please don't, we want to do fast things".
Therefore, the mixed size access should be considered undefined by Rust, as we expect to be able to compile to x86 where it is undefined.
cbeuw commented on Apr 6, 2023
Regarding x86, I got the following from @thiagomacieira which is very helpful
RalfJung commented on Apr 10, 2023
So this means there are atomic 256-bit accesses but doing size mixing with those is a problem? (Not sure what "1-uop" operations are.)
chorman0773 commented on Apr 10, 2023
thiagomacieira commented on Apr 10, 2023
SSE loads and stores are atomic. AVX 256- and 512-bit loads and stores are atomic on P-core processors, but not E-core (the 256-bit operation is cracked into two 128-bit operations and therefore not atomic).
There's no RMW SIMD. The best you get is a merge-store, and I'm confident that's atomic on P-core, but I doubt it is on E-core. Therefore, SIMD atomics are very limited, since you can only use them for loads and stores. The most useful thing this could be used for is to load 16 bytes atomically and use CMPXCHG16B to store, but it's still limited and somewhat slow due to the transfer between register files. Seqlocks are more flexible.