Closed
Description
I have a few questions regarding `usize` and raw pointers for sized `T`:
- Is `usize` guaranteed to have the same layout as one of the `uN`?
- Is `usize` guaranteed to have no padding bits?
- Is `usize as *mut T as usize` the identity function on `usize`?
- Is `usize as *mut T` guaranteed to be the same as `transmute::<usize, *mut T>`?
- Is `ptr::read_unaligned::<*mut T>` safe for all arguments for which `ptr::read_unaligned::<[u8; sizeof(*mut T)]>` is safe?
- Is `usize-literal as *mut T` guaranteed to have no special behavior? (e.g. for `0usize`)
The current documentation
- https://rust-lang.github.io/unsafe-code-guidelines/layout/scalars.html
- https://rust-lang.github.io/unsafe-code-guidelines/layout/pointers.html
does not answer these questions afaict.
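For concreteness, here is a minimal sketch of what the first three questions look like in code. The function names are hypothetical (they are not from the UCG or std), and the checks only report what the current target does; they say nothing about what is guaranteed.

```rust
use std::mem::{align_of, size_of};

/// Casts an address to a raw pointer and back again.
/// Whether this is *guaranteed* to be the identity is the open question;
/// this function only lets you observe the behavior on the current target.
pub fn round_trip(addr: usize) -> usize {
    let p = addr as *mut u8; // integer-to-pointer cast
    p as usize // pointer-to-integer cast
}

/// True if every bit of `usize` is a value bit on this target,
/// i.e. its storage size in bits equals `usize::BITS` and `usize::MAX`
/// has all of those bits set.
pub fn usize_has_no_padding() -> bool {
    size_of::<usize>() * 8 == usize::BITS as usize
        && usize::MAX.count_ones() == usize::BITS
}

/// True if a thin raw pointer has the same size and alignment as `usize`
/// on this target (an observation, not a language-level promise).
pub fn ptr_matches_usize_layout() -> bool {
    size_of::<*mut u8>() == size_of::<usize>()
        && align_of::<*mut u8>() == align_of::<usize>()
}
```

On a typical 64-bit desktop target all three observations hold; the questions above ask whether code may rely on that everywhere.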
Activity
RalfJung commented on Oct 11, 2020
Yes, all our integer types are.
These are contingent on details of pointer provenance and how it interacts with integer-pointer casts... so the answer is "we don't know". FWIW, that is also the answer for LLVM IR, and likewise in C/C++. (See this for some of the recent work on the C/C++ side, but I hope we can find something cleaner for Rust.)
The opposite direction, `*mut T as usize`, will most likely not be equivalent to a transmute -- in fact so far there is no good proposal on the table for how to make the transmute not UB, which is rather unsatisfying. (This paper is one of the most recent proposals.)
No, certainly not. For example, if `T := bool`, then the former is UB if the value you are loading is not a valid `bool`.
Yes.
mahkoh commented on Oct 11, 2020
Thanks for the links, I'll look at them later.
`ptr::read_unaligned::<*mut T>` produces a `*mut T`, not a `T`.
IIRC, pointers actually have trap representations on certain architectures in C. Since raw pointers in Rust are somewhat relaxed compared to C pointers, can we guarantee that any bit pattern is a valid raw pointer in Rust?
RalfJung commented on Oct 11, 2020
Oh sorry, I misread. So regarding your question then, I think the answer is yes -- everything that's a valid integer will also be a valid pointer. The other way around might not be true though, that is again about pointer provenance.
But really the proper answer is that Rust doesn't have a spec yet that can answer such detailed questions, sorry. :/
No -- see #71 for the related discussions on validity of integers. ("Trap representations" are not a thing in Rust, but we have a related concept of "invalid values".) If uninitialized integers are invalid, then uninitialized raw pointers will also be invalid.
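To make the `read_unaligned` question concrete, here is a small sketch of reading a `*mut u8` out of an unaligned byte buffer that was filled from an initialized integer. The helper name `read_ptr_addr` is made up for illustration, and the safety comment leans on the tentative answer given above ("everything that's a valid integer will also be a valid pointer"), which is not a spec-level guarantee. The pointer is never dereferenced.

```rust
use std::mem::size_of;
use std::ptr;

/// Stores `addr` at an odd offset in a byte buffer, reads it back as a
/// `*mut u8` with `read_unaligned`, and returns the pointer's address.
pub fn read_ptr_addr(addr: usize) -> usize {
    const N: usize = size_of::<*mut u8>();
    // One leading byte so the pointer-sized payload starts at offset 1.
    let mut buf = [0u8; N + 1];
    // Assumes `usize` and `*mut u8` have the same size on this target;
    // `copy_from_slice` panics if the lengths differ.
    buf[1..].copy_from_slice(&addr.to_ne_bytes());

    // SAFETY (assumption, following the reply above): `buf[1..]` holds N
    // initialized bytes, and initialized integer bytes are taken to form a
    // valid `*mut u8`. The resulting pointer is only cast back to an integer.
    let p: *mut u8 = unsafe { ptr::read_unaligned(buf.as_ptr().add(1) as *const *mut u8) };
    p as usize
}
```

The interesting case that this sketch deliberately avoids is the buffer containing uninitialized bytes, which is what #71 is about.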
digama0 commented on Oct 11, 2020
Forgive my naivete, but what are the blockers behind both directions being trivial? If raw pointers are just treated as integers, then these casts become trivial to specify, and all the interesting work goes into `&mut T -> *mut T` and `*mut T -> &mut T` instead.
RalfJung commented on Oct 11, 2020
Pointers in LLVM are not integers, so we cannot make them integers in Rust either. Potentially we could compile raw pointers differently, not using LLVM pointers, but I am not sure how well that works (and LLVM semantics in this area are so unclear, not to say buggy, that there is no telling if that would actually help).
RalfJung commented on Oct 11, 2020
Oh also, Rust functions like `offset` and `wrapping_offset` clearly show that the intent in Rust is for raw pointers to carry provenance. The goal, as far as I understand it, is that raw pointers in Rust are like pointers in C -- and those are complicated.
I like the idea of only references having provenance, but I think it is unfortunately unrealistic. It would be good to have more data on this though, like numbers for what the perf impact would be if we used LLVM integers to compile Rust raw pointers. Also see these provenance-related discussions in the UCG.
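As an illustration of the difference between `offset` and `wrapping_offset`, here is a minimal sketch. The function names are hypothetical. It relies on the documented behavior of `wrapping_offset`: the result may leave the allocation, but it stays tied to the allocation of the pointer it was computed from, so it may be dereferenced again once it is back in bounds.

```rust
/// Returns `data[2]` by way of a pointer that temporarily leaves the array.
pub fn third_via_detour(data: &[i32; 4]) -> i32 {
    let base: *const i32 = data.as_ptr();
    // `wrapping_offset` is allowed to compute an out-of-bounds pointer;
    // the result is still associated with the allocation of `base`.
    let outside = base.wrapping_offset(10);
    // Coming back by -8 elements yields `base + 2`, which is in bounds.
    let back = outside.wrapping_offset(-8);
    // SAFETY: `back` points at `data[2]` and was derived from `base`.
    unsafe { *back }
}

/// Same read using `offset`, where every intermediate pointer must stay
/// within (or one past the end of) the allocation.
pub fn third_via_offset(data: &[i32; 4]) -> i32 {
    let base: *const i32 = data.as_ptr();
    // SAFETY: index 2 is in bounds of the 4-element array.
    unsafe { *base.offset(2) }
}
```

Replacing the two `wrapping_offset` calls in the first function with `offset` would be UB at the first call, even though the final address is identical. That rule only makes sense if the pointer remembers which allocation it belongs to.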
digama0 commented on Oct 11, 2020
I will grant that pointers are complicated, but it seems that pointer provenance already gets wrapped up in discussions about "what's in a byte" even when only talking about integer types. So it seems like we can just make raw pointers exactly as "provenant" as `usize` and then the cast is again trivial, notwithstanding the unsolved problem of exactly how much provenance to carry around through both raw pointers and integers.
RalfJung commented on Oct 11, 2020
I am working under the assumption that `usize` does not have provenance. This assumption is deeply wired into many optimizations, e.g. when using `if x == y { ...` as indicator that we can replace `x` with `y` inside the conditional. (Even if they are equal integers, their provenance might differ.) Moreover, you lose some arithmetic properties, depending on how exactly provenance propagates through arithmetic -- you lose associativity and/or commutativity, `x * 0` is not the same as `0`, ...
The big advantage of pointers is that their only "arithmetic" is "add an integer", so it is easy to say what happens to provenance: the result has the same provenance as the left input (the only pointer input). But provenance needs to end when pointers are cast to integers, and something needs to be done about pointers being transmuted to integers. The easiest option would be to declare it UB, and C's strict aliasing even almost does that. It could also behave like uninitialized integers, but it is unclear if that is any better.
This C++ proposal has more detail; and here is even more material.
Quite a few people have thought about these problems for years; there does not seem to be an easy solution that we can "just" use. ;)
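The substitution argument for integers can be sketched as follows. The names are made up for illustration, and the claim about the two pointers carrying different provenance is an assumption based on the aliasing models discussed in this thread, not something the code can observe.

```rust
/// For plain integers, an optimizer may rewrite the `y` in the `then`
/// branch to `x` (or the other way round): equal integers are
/// interchangeable, so this function always returns `x`.
pub fn pick(x: usize, y: usize) -> usize {
    if x == y { y } else { x }
}

/// Builds two raw pointers with the same address from different borrows:
/// one past the end of the first half, and the start of the second half.
/// Returns whether their addresses compare equal as integers.
pub fn halves_meet(arr: &[u8; 4]) -> bool {
    let (a, b) = arr.split_at(2);
    // Derived from `a`; assumed to carry provenance for `a` only.
    let end_of_a: *const u8 = a.as_ptr().wrapping_add(2);
    // Derived from `b`; this one may be used to read `b[0]`.
    let start_of_b: *const u8 = b.as_ptr();
    end_of_a as usize == start_of_b as usize
}
```

`halves_meet` returns `true`, yet substituting `end_of_a` for `start_of_b` before a dereference is exactly the kind of rewrite that is fine for integers and questionable for pointers.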
digama0 commented on Oct 11, 2020
What is the reason that all this mess has to be in the spec instead of just in the compiler though? SB already gives a semantics that tracks where you are or are not allowed to write, and once you are in raw pointer land most of the value tracking turns off. If the compiler can infer some provenance information, great, but the spec doesn't need to play that game.
Put another way, what is a program that should be UB that SB says is fine, and relies on tracking pointer provenance through raw pointers and/or integers?
Isn't there an LLVM bug (that you filed, I think) that does this with pointers?
sollyucko commented on Oct 11, 2020
Cases where this might not be true include:
digama0 commented on Oct 11, 2020
I would hope that we can at least ensure that it has the same layout pattern as `uN` where `N` is the number of bits in the word; that is, if the word size is 24 bits then `usize = u24`, if such a thing existed. I suppose the only non-obvious property is alignment, but that probably has to be specified by the target architecture in any case.
RalfJung commented on Oct 11, 2020
Because so far nobody has been able to propose a semantics that can hide this from the spec, but still do all the desired optimizations and at the same time provide pointer-integer casts.
It does, though. Stacked Borrows relies on this provenance information, and that effect cannot be entirely hidden from the semantics in a language with pointer-integer casts.
Stacked Borrows "gives up" on raw pointers (treating them as having much less provenance -- but there is still some provenance left, namely to track the allocation ID that the pointer originally pointed to), but that is intended as a temporary stepping-stone. #86 contains some examples of optimizations that we want to do, and that LLVM does, but that SB currently fails to support because it forgets provenance for raw pointers.
I already mentioned `offset` as a function whose UB can only be modeled if raw pointers have provenance.
Do you mean this one? Yes, LLVM does this with pointers and that's wrong. The goal is to make it wrong for pointers but right for integers.
mahkoh commented on Oct 13, 2020
@RalfJung
I do not believe so. I've taken a quick look at annex J (unspecified/undefined/implementation-defined behavior) of C11:
👍 (`wrapping_add`, if the pointer is valid) ❌
People who want to write unsafe code often consult the C standard to see what works and what doesn't work. I believe it would be good to have an informal document that goes through the C standard and compares C semantics to the current Rust semantics.
Besides the differences regarding pointers, another example is unions, which do not have an active field in Rust and can therefore be used to implement transmute.
PS: I haven't yet had time to look at your links.
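The remark about unions can be illustrated with a small sketch. The type and function names are invented for this example. Because a Rust union has no active field, writing one field and reading another reinterprets the bytes, provided the bytes are valid for the field being read.

```rust
/// Two views of the same four bytes.
#[repr(C)]
pub union FloatBits {
    f: f32,
    bits: u32,
}

/// Reinterprets an `f32` as its raw bit pattern via a union read,
/// equivalent in effect to `f32::to_bits`.
pub fn f32_to_bits(f: f32) -> u32 {
    let u = FloatBits { f };
    // SAFETY: both fields are 4 bytes, all of them initialized by the
    // write to `f`, and every bit pattern is a valid `u32`.
    unsafe { u.bits }
}
```

In C, the corresponding read is tied up with the rules about the last-stored member; in Rust the only question is whether the bytes are valid at the type being read.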
RalfJung commented on Oct 13, 2020
`wrapping_add` is just an operation that Rust has that C could have but doesn't -- we also have `add`/`offset` corresponding to the usual C-style rules for pointer arithmetic. But I take your other points. Rust raw pointers are more relaxed than C's.
The fact remains that both `offset` and `wrapping_offset` just make no sense for pointers without provenance. The only reason these operations exist is that they preserve provenance, and the people introducing them considered it important that LLVM has that information available for its optimizations. That is why I am working under the assumption that raw pointers should have provenance.
Note that raw pointers not having provenance does not really solve any of the hard problems, it just moves them around. Everything that is currently tricky about casting and transmuting between raw pointers and integers then becomes tricky about casting and transmuting between raw pointers and references. We have to figure out answers to these questions anyway, we have to find some good way to handle the "boundary" between what has provenance and what does not. Where we put that boundary is mostly orthogonal.