Description
Update (2025-05-20): see #321 (comment)
core::arch::{load, store}
This proposes new `load` and `store` functions (in `core::arch`) for raw hardware loads and stores, and a concept of an always-valid type that can safely be cast to a byte array. It also defines volatile accesses in terms of these functions.
The functions proposed here have the same semantics as raw machine load and store instructions. The compiler is not permitted to assume that the values loaded or stored are initialized, or even that they point to valid memory. However, it is permitted to assume that `load` and `store` do not violate Rust’s mutability rules.
In particular, it is valid to use these functions to manipulate memory that is being concurrently accessed or modified by any means whatsoever. Therefore, they can be used to access memory that is shared with untrusted code. For example, a kernel could use them to access userspace memory, and a user-mode server could use them to access memory shared with a less-privileged user-mode process. It is also safe to use these functions to manipulate memory that is being concurrently accessed via DMA, or that corresponds to a memory-mapped hardware register.
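Today’s stable `ptr::read_volatile`/`ptr::write_volatile`, which this proposal would define in terms of `load` and `store`, already illustrate the intended access pattern; a minimal sketch:

```rust
use std::ptr;

/// Volatile round-trip through a buffer, as one would do for memory shared
/// with another process or device. Sketch using today's stable APIs, which
/// this proposal would redefine in terms of `core::arch::{load, store}`.
fn volatile_roundtrip(buf: &mut [u8; 4]) -> u8 {
    let p = buf.as_mut_ptr();
    unsafe {
        // The compiler must emit exactly this store; it cannot elide it.
        ptr::write_volatile(p, 0xAB);
        // The compiler must re-read memory here rather than reuse 0xAB.
        ptr::read_volatile(p as *const u8)
    }
}

fn main() {
    let mut buf = [0u8; 4];
    println!("{:#x}", volatile_roundtrip(&mut buf)); // prints 0xab
}
```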
The core guarantee that makes `load` and `store` useful is this: a call to `load` or `store` is guaranteed to result in exactly one non-tearing, non-interlocked load from or store to the exact address passed to the function, no matter what that address happens to be. To ensure this, `load` and `store` are considered partially opaque to the optimizer. The optimizer must consider them to be calls to functions that may or may not dereference their arguments. It is even possible that the operation triggers a hardware fault that some other code catches and recovers from. Hence, the compiler can never prove that a given call to `core::arch::load` or `core::arch::store` will have undefined behavior. At the same time, a call to `load` or `store` does not disable any optimizations that a call to an unknown function with the same argument would not also disable. In short: garbage in, garbage out.
The actual functions are as follows:

```rust
unsafe fn load<T>(ptr: *const T) -> T;
unsafe fn store<T>(ptr: *mut T, arg: T);
```
These perform a single memory access (of size `size_of::<T>()`) on `ptr`. The compiler must compile each of these calls into exactly one machine instruction. If this is not possible, it is a compile-time error. The types `T` for which a compiler can successfully generate code for these calls depend on the target architecture. Using a `T` that cannot safely be transmuted to or from a byte array is not forbidden, but is often erroneous, and thus triggers a lint (see below). Provided that `ptr` is properly aligned, these functions are guaranteed not to cause tearing. If `ptr` is not properly aligned, the results are architecture-dependent.
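Since the no-tearing guarantee hinges on alignment, callers may want to check it before an access; a minimal sketch (the helper name is hypothetical, not part of the proposal):

```rust
use std::mem::align_of;

/// Returns true if `ptr` is naturally aligned for `T`. Hypothetical helper;
/// the proposal leaves misaligned access behavior architecture-dependent.
fn is_naturally_aligned<T>(ptr: *const T) -> bool {
    (ptr as usize) % align_of::<T>() == 0
}

fn main() {
    println!("{}", is_naturally_aligned(8 as *const u32));  // true
    println!("{}", is_naturally_aligned(10 as *const u32)); // false
}
```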
The optimizer is not permitted to assume that `ptr` is dereferenceable or that it is properly aligned. This allows these functions to be used in in-process debuggers, crash dumpers, and other applications that may need to access memory at addresses obtained from some external source, such as a debug console or `/proc/self/maps`. If `load` is used to violate the aliasing rules (by accessing memory the compiler thinks cannot be accessed), the value returned may be non-deterministic and may contain sensitive data. If `store` is used to overwrite memory the compiler can assume will not be modified, subsequent execution (after the call to `store` returns) has undefined behavior.
The semantics of `volatile`
A call to `ptr::read_volatile` desugars to one or more calls to `load`, and a call to `ptr::write_volatile` desugars to one or more calls to `store`. The compiler is required to minimize tearing to the extent possible, provided that doing so does not require the use of interlocked or otherwise synchronized instructions. `const fn core::arch::volatile_non_tearing::<T>() -> bool` returns `true` if `T` is such that tearing cannot occur for naturally-aligned accesses. It may still occur for non-aligned accesses (see below).
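The proposed predicate could plausibly be approximated on mainstream targets as follows; this is a sketch under the assumption that naturally-aligned, power-of-two-sized accesses up to pointer width compile to single instructions, and `likely_non_tearing` is a hypothetical stand-in, not the proposed intrinsic:

```rust
use std::mem::{align_of, size_of};

/// Hypothetical approximation of the proposed
/// `core::arch::volatile_non_tearing::<T>()`: true if a naturally-aligned
/// access to `T` is plausibly a single machine access on this target.
const fn likely_non_tearing<T>() -> bool {
    size_of::<T>() > 0
        && size_of::<T>() <= size_of::<usize>()
        && size_of::<T>().is_power_of_two()
        && align_of::<T>() >= size_of::<T>()
}

fn main() {
    println!("{}", likely_non_tearing::<u32>());     // true on 64-bit targets
    println!("{}", likely_non_tearing::<[u8; 3]>()); // false: 3 is not a power of two
}
```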
Unaligned volatile access
The compiler is not allowed to assume that the arguments of `core::{ptr::{read_volatile, write_volatile}, arch::{load, store}}` are aligned. However, it is also not required to generate code to handle unaligned access if doing so would cause a performance penalty for the aligned case. In particular, whether the no-tearing guarantee applies to unaligned access is architecture-dependent. On some architectures, it is even possible for unaligned access to cause a hardware trap.
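One portable way around this architecture-dependence is to fall back to byte-granular volatile accesses when the pointer is misaligned; a sketch (it trades away whole-value atomicity, since single-byte accesses cannot tear):

```rust
use std::ptr;

/// Volatile read of a u32 from a possibly-unaligned address, one byte at a
/// time. Sketch: each byte access is tear-free, but the u32 as a whole is
/// no longer read in a single access.
unsafe fn volatile_read_u32_unaligned(p: *const u8) -> u32 {
    let mut bytes = [0u8; 4];
    for (i, b) in bytes.iter_mut().enumerate() {
        *b = ptr::read_volatile(p.add(i));
    }
    u32::from_ne_bytes(bytes)
}

fn main() {
    let buf = [0u8, 0x78, 0x56, 0x34, 0x12, 0, 0, 0];
    // Offset 1 is misaligned for u32.
    let v = unsafe { volatile_read_u32_unaligned(buf.as_ptr().add(1)) };
    println!("{:#010x}", v); // 0x12345678 on little-endian targets
}
```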
New lints
Use of `core::ptr::{read_volatile, write_volatile}` with a type that cannot be safely transmuted to and from a byte slice will trigger a `dubious_type_in_volatile` lint. Use of `core::arch::{load, store}` with such types will trigger a `dubious_type_in_load_or_store` lint. Both are `Warn` by default. Thanks to @comex for the suggestion!
Lowering
LLVM volatile semantics are still unclear, and may turn out to be weaker than necessary. It is also possible that LLVM volatile requires `dereferenceable` or otherwise interacts poorly with some of the permitted corner cases. Therefore, I recommend lowering `core::{arch::{load, store}, ptr::{read_volatile, write_volatile}}` to LLVM inline assembly instead, which is at least guaranteed to work. This may change in the future.
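The suggested lowering can be sketched with today’s `asm!` macro; the following assumes an x86_64 target (with a `read_volatile` fallback elsewhere) and is illustrative, not the actual compiler lowering:

```rust
/// Sketch of lowering a raw 64-bit load through inline assembly, which the
/// optimizer must treat as opaque. Assumes x86_64; not the real lowering.
#[cfg(target_arch = "x86_64")]
unsafe fn asm_load_u64(ptr: *const u64) -> u64 {
    let out: u64;
    std::arch::asm!(
        "mov {out}, [{p}]",
        out = out(reg) out,
        p = in(reg) ptr,
        options(nostack, preserves_flags),
    );
    out
}

/// Fallback for other targets, using today's volatile load.
#[cfg(not(target_arch = "x86_64"))]
unsafe fn asm_load_u64(ptr: *const u64) -> u64 {
    std::ptr::read_volatile(ptr)
}

fn main() {
    let x: u64 = 0xDEAD_BEEF;
    println!("{:#x}", unsafe { asm_load_u64(&x) }); // 0xdeadbeef
}
```

Because `asm!` without the `pure` option is treated as volatile, the optimizer can neither elide nor duplicate the access, which matches the "partially opaque" requirement above.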
Activity
comex commented on Mar 16, 2022
I like this, except:
This seems like it should be a trait bound.
This part is a breaking change and doesn't seem well motivated to me.
For writes: Writing padding bits is potentially a security concern due to the potential to leak memory contents, but it doesn't seem inherently unsound; any undefined bits should just be implicitly frozen to an arbitrary value. As for unspecified layout, if by that you mean things like `repr(Rust)`, this layout can still be probed at runtime, or perhaps you don't care about the layout because you only need to read the value back later as the same type in the same program.

For reads: Just because volatile is well-suited for dealing with untrusted or potentially-corrupted memory doesn't mean that's the only possible use case. You may happen to know for whatever reason that the load will return a valid value. Perhaps you're reading it from an MMIO register; perhaps you're abusing volatile to implement atomics (bad idea but, in the right circumstances, not unsound); perhaps the load doesn't have to be volatile but is anyway due to some generics madness.
All of these cases seem dubious enough to be worth a lint, but I'm skeptical that they should be hard errors even with the new functions, let alone the existing already-stable functions.
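The runtime layout probing comex mentions can be sketched as follows (the `Probe` type is a made-up example; with `repr(Rust)` the concrete offsets are unspecified, which is exactly why they must be computed rather than assumed):

```rust
use std::mem::size_of;

// Made-up repr(Rust) type: the compiler may order fields however it likes.
struct Probe {
    a: u8,
    b: u32,
}

/// Computes the actual runtime offsets of `a` and `b` by address arithmetic.
fn probe_offsets() -> (usize, usize) {
    let v = Probe { a: 0, b: 0 };
    let base = &v as *const Probe as usize;
    let off_a = &v.a as *const u8 as usize - base;
    let off_b = &v.b as *const u32 as usize - base;
    (off_a, off_b)
}

fn main() {
    let (a, b) = probe_offsets();
    // The offsets are unspecified, but must lie inside the struct.
    assert!(a < size_of::<Probe>() && b + 4 <= size_of::<Probe>());
    println!("a at offset {a}, b at offset {b}");
}
```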
DemiMarie commented on Mar 16, 2022
Agreed, lint added.
I generally assume that MMIO devices are not automatically trustworthy, but your point stands.
bjorn3 commented on Mar 16, 2022
That can't be done without forcing deoptimization of any program that may call this. To prevent deoptimization it would be better to say that it can access any memory which an opaque function that gets passed the pointer as argument may access. That would for example not include stack variables which don't have their address taken.
DemiMarie commented on Mar 16, 2022
Is this also the semantics of using inline assembly? The goal is that volatile operations will always operate on whatever happens to be at that memory address; the compiler can’t just say “I know this volatile load or store will have undefined behavior if X” and optimize accordingly. The situation you are referring to is supposed to be covered by, “At the same time, a call to load or store does not disable any optimizations that a call to an unknown function with the same argument would not also disable. In short: garbage in, garbage out.”
The reason for these seemingly contradictory requirements is that I want volatile memory access to be usable for in-process debuggers and crash dumpers. These programs need to be able to access whatever happens to be at an arbitrary caller-provided memory location and know the compiler will not try to outsmart them. This is also why using these functions to dereference null or dangling pointers is explicitly permitted. Just because you can use these functions to read from a piece of memory does not mean that Rust makes any guarantees whatsoever about what you will find there, or that your attempt won’t cause a hardware fault of some sort. Similarly, just because you can use these functions to write to a piece of memory does not mean that Rust makes any guarantees as to what impact that will have on other code. If you misuse them and something breaks, you get to keep both pieces.
DemiMarie commented on Mar 16, 2022
@bjorn3: do you have suggestions for better phrasing here? The intent is that you can use volatile memory access to peek and poke at whatever memory you want, but the consequences of doing so are entirely your responsibility. For instance, I might need to test that my program’s crash handler triggers successfully when I dereference a null pointer, or that a hardened memory allocator detects modification of freed memory and properly aborts.
bjorn3 commented on Mar 16, 2022
I believe so.
It has to for any optimization to be possible.
Didn't see that sentence. I agree that covers my situation.
Those things are UB either way. Individual compilers just do a best effort at trying to make them work the way a user expects them to work when optimizations are disabled. When optimizations are enabled it is even more in a best effort basis. For example it may not be possible to change function parameters if the compiler found that a function argument is constant and optimized accordingly.
DemiMarie commented on Mar 16, 2022
Good to know!
To elaborate, what I am not okay with is the compiler optimizing out the entire basic block as unreachable code. Compilers have a nasty habit of doing that, so I wanted to be absolutely clear that is not permitted.
Thank you.
That behavior is perfectly acceptable (though ideally it would be reflected in the debug info). I wonder if our definitions of UB are slightly different. To me, an execution with UB has no meaning at all, and the compiler is allowed to prune any basic block that invokes UB.
`core::arch::{load, store}` never invoke this form of UB themselves, but are `unsafe` because they can cause a crash or even invoke UB in unrelated code.

Diggsey commented on Mar 16, 2022
If the code is unreachable, then of course the compiler is permitted to optimize it away. The compiler is allowed to assume that no UB occurs within the program when determining what is reachable (and for all other optimizations). This is true for all code: there's no special case for atomics.
DemiMarie commented on Mar 16, 2022
According to the model mentioned above (a volatile memory access is an opaque function call or inline assembler), the compiler does not actually know that `std::ptr::read_volatile(x)` actually dereferences `x`, merely that it may dereference `x`. So it cannot prove that `std::ptr::read_volatile(x)` is UB no matter what `x` is.

Diggsey commented on Mar 16, 2022
Right, but whether or not `x` is dereferenceable is only one potential source of UB. The compiler could still decide that the call is unreachable for other reasons (say `x` is the result of an expression that invokes UB).

I'd also question whether it makes sense to be quite so lenient with `x` - it definitely makes sense that `x` might not be dereferenceable, but what if it's `undef` or `poison`? I don't really have the expertise to answer that though...

DemiMarie commented on Mar 16, 2022
That’s fine.
I decided to err on the side of the simplest possible semantics.
RalfJung commented on Mar 18, 2022
Thanks for writing this up!
I was confused what the difference to `volatile` would be. Is it fair to say that this is morally what volatile "ought to be", and the only reason you are avoiding the term `volatile` is that volatile has bad interactions with atomic accesses in C/C++/LLVM?

This sounds contradictory: a byte array (`[u8; N]`) cannot hold all data (e.g., uninit memory), so an always-valid type cannot be safely cast to a byte array.

"loads via store" seems odd; I assume "loads and stores" is what you meant?
I think this is too weak. In particular, I think the compiler should be able to assume that the given address is the only memory (that the compiler knows about) that this operation can access, and that it will not have other side-effects (like synchronization) either. So for example if `arch::load` is called on some `*const i32` that does not alias some other pointer `p`, the compiler is allowed to reorder regular stores to `p` with those non-aliasing `load`s.

This also makes these operations quite different from inline assembly, which is allowed to do much more. With inline assembly, the intention is that even swapping out the assembly block by some other code at runtime (that still adheres to the same clobber annotations) is allowed. I don't think we want that for `arch::{load, store}`.

What is an "interlocked" load/store?
I am very strongly opposed to using `usize` to indicate memory accesses. In particular, the aliasing rules still matter (as you say), and hence provenance still matters. That means these functions should work on pointers, not integers.

"Otherwise" here sounds like "in case T cannot be safely transmuted as described", but I doubt that is what you mean.

"non-atomic" is a bad choice here, since assembly languages don't even distinguish atomic and non-atomic accesses -- that is a surface language thing. And the entire point of your proposal is that these accesses are atomic, in the sense of not introducing data races.

But this reminds me, the interactions with the concurrency model need to be specified better, I think. If there are overlapping concurrent non-atomic writes to the same location, that is UB. If there are non-atomic reads concurrent to an overlapping `arch::store`, that is UB. And if there are concurrent overlapping atomic accesses, then what are the possible observed values on read accesses? If my hardware has strong consistency guarantees, like x86, can I use this to do the equivalent of an "acquire read"? That would be in contradiction with some of the reorderings I mentioned above.

What is wrong with using LLVM volatile atomic accesses?
Lokathor commented on Mar 18, 2022
Ralf I'm 99% certain that "always valid" is meant to be read as "all initialized bit patterns are valid", not that it's also allowing uninit memory.
RalfJung commented on Mar 18, 2022
That still leaves provenance as a concern -- transmuting pointers with provenance to integers is subtle at best, making (arrays of) integer types like `u8` IMO unsuited as "generic data containers".