
## Pre-Pre-RFC: `core::arch::{load, store}` and stricter volatile semantics #321

Opened by @DemiMarie

Update (2025-05-20): see #321 (comment)


### `core::arch::{load, store}`

This proposes new load and store functions in core::arch for raw hardware loads and stores, along with a concept of an always-valid type that can safely be cast to a byte array. It also defines volatile accesses in terms of these functions.

The functions proposed here have the same semantics as raw machine load and store instructions. The compiler is not permitted to assume that the values loaded or stored are initialized, or even that they point to valid memory. However, it is permitted to assume that load and store do not violate Rust’s mutability rules.

In particular, it is valid to use these functions to manipulate memory that is being concurrently accessed or modified by any means whatsoever. Therefore, they can be used to access memory that is shared with untrusted code. For example, a kernel could use them to access userspace memory, and a user-mode server could use them to access memory shared with a less-privileged user-mode process. It is also safe to use these functions to manipulate memory that is being concurrently accessed via DMA, or that corresponds to a memory-mapped hardware register.

The core guarantee that makes load and store useful is this: a call to load or store is guaranteed to result in exactly one non-tearing non-interlocked load from or store to the exact address passed to the function, no matter what that address happens to be. To ensure this, load and store are considered partially opaque to the optimizer. The optimizer must consider them to be calls to functions that may or may not dereference their arguments. It is even possible that the operation triggers a hardware fault that some other code catches and recovers from. Hence, the compiler can never prove that a given call to core::arch::load or core::arch::store will have undefined behavior. At the same time, a call to load or store does not disable any optimizations that a call to an unknown function with the same argument would not also disable. In short: garbage in, garbage out.

The actual functions are as follows:

```rust
unsafe fn load<T>(ptr: *const T) -> T;
unsafe fn store<T>(ptr: *mut T, arg: T);
```

Performs a single memory access (of size size_of::<T>()) on ptr. The compiler must compile each of these function calls into exactly one machine instruction. If this is not possible, it is a compile-time error. The types T for which a compiler can successfully generate code for these calls depend on the target architecture. Using a T that cannot safely be transmuted to or from a byte array is not forbidden, but is often erroneous, and thus triggers a lint (see below). Provided that ptr is properly aligned, these functions are guaranteed not to cause tearing. If ptr is not properly aligned, the results are architecture-dependent.

The optimizer is not permitted to assume that ptr is dereferenceable or that it is properly aligned. This allows these functions to be used for in-process debuggers, crash dumpers, and other applications that may need to access memory at addresses obtained from some external source, such as a debug console or /proc/self/maps. If load is used to violate the aliasing rules (by accessing memory the compiler thinks cannot be accessed), the value returned may be non-deterministic and may contain sensitive data. If store is used to overwrite memory the compiler can assume will not be modified, subsequent execution (after the call to store returns) has undefined behavior.
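The proposed functions do not exist yet; the closest stable relatives are ptr::{read_volatile, write_volatile}, which can sketch the intended usage pattern. The example below is illustrative only and accesses ordinary process memory rather than an externally supplied address:

```rust
use std::ptr;

fn main() {
    let mut word: u32 = 0;
    let addr: *mut u32 = &mut word;

    unsafe {
        // Exactly one store through the raw pointer; the optimizer may
        // not elide or duplicate a volatile access.
        ptr::write_volatile(addr, 0xDEAD_BEEF);
        // Exactly one load from the exact same address.
        let v = ptr::read_volatile(addr as *const u32);
        assert_eq!(v, 0xDEAD_BEEF);
    }
}
```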

### The semantics of volatile

A call to ptr::read_volatile desugars to one or more calls to load, and a call to ptr::write_volatile desugars to one or more calls to store. The compiler is required to minimize tearing to the extent possible, provided that doing so does not require the use of interlocked or otherwise synchronized instructions. const fn core::arch::volatile_non_tearing::<T>() -> bool returns true if T is such that tearing cannot occur for naturally-aligned accesses. It may still occur for non-aligned accesses (see below).
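As a sketch of how the proposed volatile_non_tearing predicate might behave, here is a hypothetical stand-in that assumes a naturally-aligned access is non-tearing exactly when the type is no wider than a machine word and has power-of-two size; the actual rule would be architecture-defined, so this is illustration only:

```rust
use std::mem::size_of;

// Hypothetical approximation of the proposed predicate; the real
// definition would come from the target architecture, not this heuristic.
const fn volatile_non_tearing<T>() -> bool {
    let size = size_of::<T>();
    size > 0 && size <= size_of::<usize>() && size.is_power_of_two()
}

fn main() {
    assert!(volatile_non_tearing::<u8>());
    assert!(volatile_non_tearing::<u32>());
    assert!(!volatile_non_tearing::<[u8; 3]>()); // size is not a power of two
    assert!(!volatile_non_tearing::<[u64; 4]>()); // wider than a machine word
}
```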

### Unaligned volatile access

The compiler is not allowed to assume that the arguments of core::{ptr::{read_volatile, write_volatile}, arch::{load, store}} are aligned. However, it is also not required to generate code to handle unaligned access, if doing so would cause a performance penalty for the aligned case. In particular, whether the no-tearing guarantee applies to unaligned access is architecture dependent. On some architectures, it is even possible for unaligned access to cause a hardware trap.

### New lints

Use of core::ptr::{read_volatile, write_volatile} with a type that cannot be safely transmuted to and from a byte slice will trigger a dubious_type_in_volatile lint. Use of core::arch::{load, store} with such types will trigger a dubious_type_in_load_or_store lint. Both are Warn by default. Thanks to @comex for the suggestion!
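For intuition about which types the lints would target: a struct with internal padding cannot round-trip through a byte array without exposing undefined bytes. A small sketch, with types invented for illustration:

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[repr(C)]
struct Padded {
    a: u8,
    // 3 bytes of compiler-inserted padding sit here
    b: u32,
}

#[allow(dead_code)]
#[repr(C)]
struct Unpadded {
    a: u32,
    b: u32,
}

fn main() {
    // Padding makes the struct larger than the sum of its fields, so a
    // byte-array round trip would read uninitialized padding bytes.
    assert_eq!(size_of::<Padded>(), 8);
    assert_eq!(size_of::<Unpadded>(), 8);
    assert!(size_of::<Padded>() > size_of::<u8>() + size_of::<u32>());
}
```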

### Lowering

LLVM volatile semantics are still unclear, and may turn out to be weaker than necessary. It is also possible that LLVM volatile requires dereferenceable or otherwise interacts poorly with some of the permitted corner-cases. Therefore, I recommend lowering core::{arch::{load, store}, ptr::{read_volatile, write_volatile}} to LLVM inline assembly instead, which is at least guaranteed to work. This may change in the future.
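To illustrate the suggested lowering strategy, here is a sketch of a raw u64 load routed through stable inline assembly on x86_64. The name raw_load_u64 is hypothetical; the point is that the access is opaque to the optimizer, roughly as the proposal requires:

```rust
use std::arch::asm;

/// Sketch of lowering a raw load through inline assembly on x86_64.
/// Assumption: a single `mov` is the non-tearing load the proposal
/// would emit for a naturally-aligned u64.
#[cfg(target_arch = "x86_64")]
unsafe fn raw_load_u64(ptr: *const u64) -> u64 {
    let out: u64;
    // The compiler cannot see through the asm block, so it must assume
    // the access really happens at the given address.
    asm!("mov {out}, qword ptr [{ptr}]", ptr = in(reg) ptr, out = lateout(reg) out);
    out
}

#[cfg(target_arch = "x86_64")]
fn main() {
    let x: u64 = 42;
    let v = unsafe { raw_load_u64(&x) };
    assert_eq!(v, 42);
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```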

Activity

comex commented on Mar 16, 2022

I like this, except:

> If such a transmute would be unsafe (in the sense of Project Safe Transmute), it is a compile-time error (regardless of the target platform)

This seems like it should be a trait bound.

> write_volatile<T> is a compile-time error if T has any padding bits or an unspecified layout, as it has no useful semantics in that case. Similarly, read_volatile<T> is a compile-time error if T has any invalid representations.

This part is a breaking change and doesn't seem well motivated to me.

For writes: Writing padding bits is potentially a security concern due to the potential to leak memory contents, but it doesn't seem inherently unsound; any undefined bits should just be implicitly frozen to an arbitrary value. As for unspecified layout, if by that you mean things like repr(Rust), this layout can still be probed at runtime, or perhaps you don't care about the layout because you only need to read the value back later as the same type in the same program.

For reads: Just because volatile is well-suited for dealing with untrusted or potentially-corrupted memory doesn't mean that's the only possible use case. You may happen to know for whatever reason that the load will return a valid value. Perhaps you're reading it from an MMIO register; perhaps you're abusing volatile to implement atomics (bad idea but, in the right circumstances, not unsound); perhaps the load doesn't have to be volatile but is anyway due to some generics madness.

All of these cases seem dubious enough to be worth a lint, but I'm skeptical that they should be hard errors even with the new functions, let alone the existing already-stable functions.

DemiMarie (Author) commented on Mar 16, 2022

> All of these cases seem dubious enough to be worth a lint, but I'm skeptical that they should be hard errors even with the new functions, let alone the existing already-stable functions.

Agreed, lint added.

> Perhaps you're reading it from an MMIO register

I generally assume that MMIO devices are not automatically trustworthy, but your point stands.

bjorn3 (Member) commented on Mar 16, 2022

> Therefore, the Rust compiler is never allowed to make assumptions about the memory accessed by these functions, or the results of such accesses.

That can't be done without forcing deoptimization of any program that may call this. To prevent deoptimization it would be better to say that it can access any memory which an opaque function that gets passed the pointer as argument may access. That would for example not include stack variables which don't have their address taken.

DemiMarie (Author) commented on Mar 16, 2022

> > Therefore, the Rust compiler is never allowed to make assumptions about the memory accessed by these functions, or the results of such accesses.
>
> That can't be done without forcing deoptimization of any program that may call this. To prevent deoptimization it would be better to say that it can access any memory which an opaque function that gets passed the pointer as argument may access. That would for example not include stack variables which don't have their address taken.

Is this also the semantics of using inline assembly? The goal is that volatile operations will always operate on whatever happens to be at that memory address; the compiler can’t just say “I know this volatile load or store will have undefined behavior if X” and optimize accordingly. The situation you are referring to is supposed to be covered by, “At the same time, a call to load or store does not disable any optimizations that a call to an unknown function with the same argument would not also disable. In short: garbage in, garbage out.”

The reason for these seemingly contradictory requirements is that I want volatile memory access to be usable for in-process debuggers and crash dumpers. These programs need to be able to access whatever happens to be at an arbitrary caller-provided memory location and know the compiler will not try to outsmart them. This is also why using these functions to dereference null or dangling pointers is explicitly permitted. Just because you can use these functions to read from a piece of memory does not mean that Rust makes any guarantees whatsoever about what you will find there, or that your attempt won’t cause a hardware fault of some sort. Similarly, just because you can use these functions to write to a piece of memory does not mean that Rust makes any guarantees as to what impact that will have on other code. If you misuse them and something breaks, you get to keep both pieces.
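As a sketch of this debugger/crash-dumper use case with today's read_volatile: the address makes a round trip through usize, the int-to-ptr pattern such tools rely on. The value and variable names are invented for illustration, and the example deliberately feeds itself a valid address so it is safe to run:

```rust
use std::ptr;

fn main() {
    // Mimic a crash dumper peeking at a caller-provided address.
    let target: u32 = 0xC0FF_EE00;

    // The "external" address arrives as a plain integer...
    let addr: usize = &target as *const u32 as usize;

    // ...and is dereferenced with a volatile load the compiler cannot
    // reason away, even though it knows nothing about the pointee.
    let v = unsafe { ptr::read_volatile(addr as *const u32) };
    assert_eq!(v, 0xC0FF_EE00);
}
```

Note that this int-to-ptr round trip is exactly the provenance hazard raised later in the thread.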

DemiMarie (Author) commented on Mar 16, 2022

@bjorn3: do you have suggestions for better phrasing here? The intent is that you can use volatile memory access to peek and poke at whatever memory you want, but the consequences of doing so are entirely your responsibility. For instance, I might need to test that my program’s crash handler triggers successfully when I dereference a null pointer, or that a hardened memory allocator detects modification of freed memory and properly aborts.

bjorn3 (Member) commented on Mar 16, 2022

> Is this also the semantics of using inline assembly?

I believe so.

> the compiler can’t just say “I know this volatile load or store will have undefined behavior if X” and optimize accordingly.

It has to for any optimization to be possible.

> The situation you are referring to is supposed to be covered by, “At the same time, a call to load or store does not disable any optimizations that a call to an unknown function with the same argument would not also disable. In short: garbage in, garbage out.”

Didn't see that sentence. I agree that covers my situation.

The reason for these seemingly contradictory requirements is that I want volatile memory access to be usable for in-process debuggers and crash dumpers.

Those things are UB either way. Individual compilers just do a best effort at trying to make them work the way a user expects them to work when optimizations are disabled. When optimizations are enabled it is even more on a best-effort basis. For example it may not be possible to change function parameters if the compiler found that a function argument is constant and optimized accordingly.

DemiMarie (Author) commented on Mar 16, 2022

> > Is this also the semantics of using inline assembly?
>
> I believe so.

Good to know!

> > the compiler can’t just say “I know this volatile load or store will have undefined behavior if X” and optimize accordingly.
>
> It has to for any optimization to be possible.

To elaborate, what I am not okay with is the compiler optimizing out the entire basic block as unreachable code. Compilers have a nasty habit of doing that, so I wanted to be absolutely clear that is not permitted.

> > The situation you are referring to is supposed to be covered by, “At the same time, a call to load or store does not disable any optimizations that a call to an unknown function with the same argument would not also disable. In short: garbage in, garbage out.”
>
> Didn't see that sentence. I agree that covers my situation.

Thank you.

> > The reason for these seemingly contradictory requirements is that I want volatile memory access to be usable for in-process debuggers and crash dumpers.
>
> Those things are UB either way. Individual compilers just do a best effort at trying to make them work the way a user expects them to work when optimizations are disabled. When optimizations are enabled it is even more on a best-effort basis. For example it may not be possible to change function parameters if the compiler found that a function argument is constant and optimized accordingly.

That behavior is perfectly acceptable (though ideally it would be reflected in the debug info). I wonder if our definitions of UB are slightly different. To me, an execution with UB has no meaning at all, and the compiler is allowed to prune any basic block that invokes UB. core::arch::{load, store} never invoke this form of UB themselves, but are unsafe because they can cause a crash or even invoke UB in unrelated code.

Diggsey commented on Mar 16, 2022

> To elaborate, what I am not okay with is the compiler optimizing out the entire basic block as unreachable code. Compilers have a nasty habit of doing that, so I wanted to be absolutely clear that is not permitted.

If the code is unreachable, then of course the compiler is permitted to optimize it away. The compiler is allowed to assume that no UB occurs within the program when determining what is reachable (and for all other optimizations). This is true for all code: there's no special case for atomics.

DemiMarie (Author) commented on Mar 16, 2022

> > To elaborate, what I am not okay with is the compiler optimizing out the entire basic block as unreachable code. Compilers have a nasty habit of doing that, so I wanted to be absolutely clear that is not permitted.
>
> If the code is unreachable, then of course the compiler is permitted to optimize it away. The compiler is allowed to assume that no UB occurs within the program when determining what is reachable (and for all other optimizations). This is true for all code: there's no special case for atomics.

According to the model mentioned above (a volatile memory access is an opaque function call or inline assembler), the compiler does not actually know that std::ptr::read_volatile(x) actually dereferences x, merely that it may dereference x. So it cannot prove that std::ptr::read_volatile(x) is UB no matter what x is.

Diggsey commented on Mar 16, 2022

> So it cannot prove that std::ptr::read_volatile(x) is UB no matter what x is.

Right, but whether or not x is dereferenceable is only one potential source of UB. The compiler could still decide that the call is unreachable for other reasons (say x is the result of an expression that invokes UB).

I'd also question whether it makes sense to be quite so lenient with x - it definitely makes sense that x might not be dereferenceable, but what if it's undef or poison? I don't really have the expertise to answer that though...

DemiMarie (Author) commented on Mar 16, 2022

> > So it cannot prove that std::ptr::read_volatile(x) is UB no matter what x is.
>
> Right, but whether or not x is dereferenceable is only one potential source of UB. The compiler could still decide that the call is unreachable for other reasons (say x is the result of an expression that invokes UB).

That’s fine.

> I'd also question whether it makes sense to be quite so lenient with x - it definitely makes sense that x might not be dereferenceable, but what if it's undef or poison? I don't really have the expertise to answer that though...

I decided to err on the side of the simplest possible semantics.

RalfJung (Member) commented on Mar 18, 2022

Thanks for writing this up!

I was confused what the difference to volatile would be. Is it fair to say that this is morally what volatile "ought to be", and the only reason you are avoiding the term volatile is that volatile has bad interactions with atomic accesses in C/C++/LLVM?

> an always-valid type that can safely be cast to a byte array

This sounds contradictory: a byte array ([u8; N]) cannot hold all data (e.g., uninit memory), so an always-valid type cannot be safely cast to a byte array.

> The functions proposed here have the same semantics as raw machine load and store instructions. The compiler is not permitted to assume that the values loaded or stored are initialized, or even that they point to valid memory. However, it is permitted to assume that loads via store do not violate Rust’s mutability rules.

"loads via store" seems odd; I assume "loads and stores" is what you meant?

I think this is too weak. In particular, I think the compiler should be able to assume that the given address is the only memory (that the compiler knows about) that this operation can access, and that it will not have other side-effects (like synchronization) either. So for example if arch::load is called on some *const i32 that does not alias some other pointer p, the compiler is allowed to reorder regular stores to p with those non-aliasing loads.
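The reordering described here can be sketched with today's read_volatile standing in for the proposed arch::load; the variable names are illustrative only:

```rust
use std::ptr;

fn main() {
    let mut a: i32 = 1;
    let b: i32 = 2;
    let p: *mut i32 = &mut a;
    let q: *const i32 = &b;

    unsafe {
        // A regular store to *p ...
        *p = 10;
        // ... and a volatile load from *q, which does not alias *p.
        // Under the semantics RalfJung sketches, the compiler may reorder
        // these two operations, because the opaque load is assumed to
        // touch only the memory at *q.
        let v = ptr::read_volatile(q);
        assert_eq!(v, 2);
        assert_eq!(*p, 10);
    }
}
```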

This also makes these operations quite different from inline assembly, which is allowed to do much more. With inline assembly, the intention is that even swapping out the assembly block by some other code at runtime (that still adheres to the same clobber annotations) is allowed. I don't think we want that for arch::{load, store}.

> non-tearing non-interlocked load from or store to

What is an "interlocked" load/store?

> The actual functions are as follows:

I am very strongly opposed to using usize to indicate memory accesses. In particular, the aliasing rules still matter (as you say), and hence provenance still matters. That means these functions should work on pointers, not integers.

> T must be a type such that T can safely be transmuted to (resp. from) [u8; size_of::<T>()]. Otherwise, the behavior of this function is the same as that of inline assembly that contains a single non-atomic load (resp. store) instruction of the correct size.

"Otherwise" here sounds like "in case T cannot be safely transmuted as described", but I doubt that is what you mean.

"non-atomic" is a bad choice here, since assembly languages don't even distinguish atomic and non-atomic accesses -- that is a surface language thing. And the entire point of your proposal is that these accesses are atomic, in the sense of not introducing data races.

But this reminds me, the interactions with the concurrency model need to be specified better I think. If there are overlapping concurrent non-atomic writes to the same location, that is UB. If there are non-atomic reads concurrent to an overlapping arch::store, that is UB. And if there are concurrent overlapping atomic accesses, then what are the possible observed values on read accesses? If my hardware has strong consistency guarantees, like x86, can I use this to do the equivalent of an "acquire read"? That would be in contradiction with some of the reorderings I mentioned above.

> To avoid problems with LLVM’s unclear volatile semantics, the LLVM backend should in fact lower this function to LLVM inline assembler.

What is wrong with using LLVM volatile atomic accesses?

Lokathor (Contributor) commented on Mar 18, 2022

Ralf I'm 99% certain that "always valid" is meant to be read as "all initialized bit patterns are valid", not that it's also allowing uninit memory.

RalfJung (Member) commented on Mar 18, 2022

> Ralf I'm 99% certain that "always valid" is meant to be read as "all initialized bit patterns are valid", not that it's also allowing uninit memory.

That still leaves provenance as a concern -- transmuting pointers with provenance to integers is subtle at best, making (arrays of) integer types like u8 IMO unsuited as "generic data containers".

120 remaining items

          ## Pre-Pre-RFC: `core::arch::{load, store}` and stricter volatile semantics · Issue #321 · rust-lang/unsafe-code-guidelines