Using dlopen is a subtle art. On top of the usual requirements around symbol conflicts and ABI compatibility, Rust's handling of symbols adds certain extra assumptions that can lead to UB here: ideally, we'd make sure that symbols from "different" crates can never clash. During normal builds, this is ensured by checking that the StableCrateId is globally unique (and by hashing into the StableCrateId everything that is considered relevant for crate identity), but dlopen bypasses this check.
At the very least, this potential risk of collisions in dlopen seems worth documenting somewhere. On top of that, is there anything we could do to mitigate this problem? Making StableCrateId an actual cryptographic hash and 256 bits large is probably going to be prohibitively expensive. But maybe there is an alternative where only dlopen users pay for extra checks, so that it costs nothing if you don't use dlopen. One could imagine a rust_checked_dlopen or similar that somehow performs the crate ID uniqueness check at runtime. Is that realistic? Is it useful?
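To make the rust_checked_dlopen idea a bit more concrete, here is a rough sketch of only the bookkeeping such a check might need. Everything in it is hypothetical. There is no rust_checked_dlopen today, and CrateRegistry, CrateIdentity and register_library are made-up names. The sketch also assumes the loader can somehow learn which (StableCrateId, crate identity) pairs a library carries, for example from some metadata section. That is the genuinely hard part, and it is not shown.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical: everything the checker treats as the identity of a crate.
/// Two entries are the same crate only if all fields agree.
#[derive(Clone, Debug, PartialEq, Eq)]
struct CrateIdentity {
    name: String,
    // Hypothetical stand-in for the other inputs that feed the StableCrateId
    // (e.g. `-C metadata` values); just a plain string here.
    disambiguator: String,
}

#[derive(Debug, PartialEq, Eq)]
enum LoadError {
    /// Two *different* crates map to the same StableCrateId.
    Collision { id: u64, existing: String, incoming: String },
}

/// Hypothetical process-wide table: StableCrateId -> crate identity.
#[derive(Default)]
struct CrateRegistry {
    by_id: HashMap<u64, CrateIdentity>,
}

impl CrateRegistry {
    /// Check all crates a library would bring into the process, and commit them
    /// only if none collides. Seeing the same crate again (e.g. a shared dylib
    /// dependency) is fine; a different crate with an already-used ID is not.
    fn register_library(&mut self, crates: &[(u64, CrateIdentity)]) -> Result<(), LoadError> {
        // Work on a copy so that a rejected library leaves the registry untouched.
        let mut staged = self.by_id.clone();
        for (id, incoming) in crates {
            match staged.entry(*id) {
                Entry::Occupied(e) if e.get() != incoming => {
                    return Err(LoadError::Collision {
                        id: *id,
                        existing: format!("{}/{}", e.get().name, e.get().disambiguator),
                        incoming: format!("{}/{}", incoming.name, incoming.disambiguator),
                    });
                }
                Entry::Occupied(_) => {}
                Entry::Vacant(v) => {
                    v.insert(incoming.clone());
                }
            }
        }
        self.by_id = staged;
        Ok(())
    }
}

fn main() {
    let mut registry = CrateRegistry::default();
    let serde_a = CrateIdentity { name: "serde".into(), disambiguator: "a".into() };
    let serde_b = CrateIdentity { name: "serde".into(), disambiguator: "b".into() };

    // Host process: serde_a is registered under a (made-up) ID 0x1234.
    assert!(registry.register_library(&[(0x1234, serde_a.clone())]).is_ok());
    // A plugin sharing the very same serde: no problem.
    assert!(registry.register_library(&[(0x1234, serde_a)]).is_ok());
    // A plugin bringing a *different* serde whose ID happens to collide: refused.
    let err = registry.register_library(&[(0x1234, serde_b)]);
    assert!(matches!(err, Err(LoadError::Collision { id: 0x1234, .. })));
}
```

The registry only grows at load time, so code that never calls the checked loader pays nothing. That matches the "only dlopen users pay" goal, at least in this simplified model.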
VorpalBlade commented on Aug 13, 2024
What exactly are we trying to protect against?
Let me play devil's advocate here:
Due to the lack of a stable ABI, you will most probably be using the C ABI anyway, with no name mangling. You might be using stabby or similar (which builds on top of the C ABI), but arguably those are off doing their own thing.
dlsym is very basic. Even in C++, which has a stable ABI, it doesn't work well with C++ name mangling, so you are generally working with extern "C" functions across dlopen/dlsym. You might have an extern "C" function that returns a more complex object full of C++ types and pointers, but that comes with a lot of footguns: the same version of all involved types must be used, and since there is no name mangling you can't detect a mismatch anyway.
So, assuming an extern "C" API, what can we even protect against? The C ABI is fundamentally not safe due to the lack of name mangling.
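A small illustration of why the C ABI gives us nothing to check against. The exported symbol is just a name, and the host supplies the type when it casts the untyped pointer that dlsym returns. This sketch fakes the dlsym step so it runs as a single binary; plugin_add and fake_dlsym are made-up names. It uses #[unsafe(no_mangle)], the spelling accepted since Rust 1.82 and required in edition 2024.

```rust
use std::ffi::c_void;
use std::mem;

// Pretend this lives in the plugin. `no_mangle` exports it under the plain
// symbol name "plugin_add"; the name says nothing about the signature.
#[unsafe(no_mangle)]
pub extern "C" fn plugin_add(a: i32, b: i32) -> i32 {
    a + b
}

// Stand-in for `dlsym(handle, "plugin_add")`: the loader only ever hands
// back an untyped address.
fn fake_dlsym() -> *const c_void {
    plugin_add as *const c_void
}

fn main() {
    let sym = fake_dlsym();
    // The host asserts the type itself. Nothing checks that it matches what
    // the plugin exported; a wrong signature here is UB, not an error.
    let add: extern "C" fn(i32, i32) -> i32 = unsafe { mem::transmute(sym) };
    assert_eq!(add(2, 3), 5);
}
```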
If that is not the usage scenario, then what is? Both possible alternatives (stabby and abi_stable) already solve the safety concerns at a higher level. Is the list of things that such a layer needs to deal with what you want to end up with in this issue?
It would probably help to come up with some use cases, describing what could go wrong in order to figure this out. As it is, this issue seems broad and vague, or perhaps I'm misunderstanding it.
RalfJung commented on Aug 13, 2024
I'm not sure what the usual use cases here are.^^ If people only ever dlopen things that have a C ABI and nothing else (no Rust symbols exported), then indeed collisions in Rust's name mangling are entirely irrelevant. But is that really the only thing people do?
Cc @bjorn3
bjorn3 commented on Aug 13, 2024
rustc dlopens codegen backends and uses the Rust ABI for this. The fact that Rust has an unstable ABI doesn't matter as long as you use the same rustc version to compile both the host and the plugin. For codegen backends, using something like stabby or abi_stable is impractical, because codegen backends are expected to use the exact same APIs that rustc uses internally. Converting values at the ABI boundary would cause unacceptable overhead.
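Roughly, this host/plugin pattern looks like the sketch below. The plugin exports one #[no_mangle] constructor with the Rust ABI that returns a trait object, and the host calls it through a Rust fn pointer. The names Backend, DummyBackend and make_backend are made up and stand in for rustc's actual entry point and trait. The dlopen/dlsym step is replaced by a direct reference so the example runs in one binary.

```rust
// Hypothetical shared interface; in the real setup it would live in a crate
// that both host and plugin compile with the *same* rustc.
pub trait Backend {
    fn name(&self) -> &'static str;
    fn compile(&self, input: &str) -> Vec<u8>;
}

// --- plugin side ---
struct DummyBackend;

impl Backend for DummyBackend {
    fn name(&self) -> &'static str {
        "dummy"
    }
    fn compile(&self, input: &str) -> Vec<u8> {
        input.bytes().rev().collect()
    }
}

// Entry point with the Rust ABI (no `extern "C"`). Passing Box<dyn Backend>
// across the boundary relies on host and plugin agreeing on vtable and Box
// layout and on the allocator, i.e. on being built by the same compiler.
#[unsafe(no_mangle)]
pub fn make_backend() -> Box<dyn Backend> {
    Box::new(DummyBackend)
}

// --- host side ---
fn main() {
    // In the real flow this pointer would come from dlsym plus a transmute.
    let ctor: fn() -> Box<dyn Backend> = make_backend;
    let backend = ctor();
    assert_eq!(backend.name(), "dummy");
    assert_eq!(backend.compile("abc"), b"cba".to_vec());
}
```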
VorpalBlade commented on Aug 13, 2024
How does this deal with name mangling and dlsym though? Does it still use no_mangle or does it compute the expected mangled names and pass those to dlsym?
bjorn3 commented on Aug 13, 2024
Functions in the plugin that the host calls use #[no_mangle]. Functions that the plugin calls are defined in a dylib that both the plugin and the host use as a regular Rust dependency, which ensures that rustc handles symbol mangling correctly.
VorpalBlade commented on Aug 13, 2024
What bjorn3 said makes sense to me: when doing dlopen, you need to use no_mangle, and ld.so takes care of resolving symbols called by the plugin. I guess there could be some possible issues there? As I understand it, this is:
What are the ways this can fail?
Additional note: the Windows/Mac equivalents of dlopen may also need special consideration. I know that symbol resolution works differently there (there is no single global namespace), but I'm by no means an expert, especially on those platforms.
I'm not sure how any of this affects the opsem angle, or whether people who don't care about portability will want to rely on the semantics of their platform of choice. I'm not even entirely sure what the opsem angle on this is: how does the AM (Abstract Machine) represent dlopen/LoadLibrary at all?
RalfJung commented on Aug 13, 2024
The concern is the case where dylib C depends on crate E, but E happens to have the same StableCrateId as B. Then the symbols of the two crates get mixed up and everything explodes, even though it doesn't look like the dlopen is doing anything wrong.
VorpalBlade commented on Aug 13, 2024
@RalfJung, from a pragmatic point of view, two questions come to mind:
chorman0773 commented on Aug 13, 2024
I don't know that we can really express these soundness requirements in any tangible manner. It's like saying that you must not use #[export_name] to collide with a symbol name (especially since the language doesn't even guarantee the form of those symbols).
RalfJung commented on Aug 13, 2024
StableCrateId is a 32-bit hash (AFAIK), so (according to this) with 2900 crates in the overall dependency tree there is a 0.1% chance of a collision, assuming the hashes are fully random.
See rust-lang/rust#10389 and rust-lang/rust#129030 for more of these discussions. In this issue, I am interested in exploring what could be done to fix this, not in discussing threat models. (This doesn't mean I think we must fix this; I just want to know what the options would be.)
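The 0.1% figure follows from the usual birthday-problem approximation. For n uniformly random b-bit hashes, p ≈ 1 - exp(-n(n-1) / (2·2^b)). Below is a quick way to reproduce it. The function name is ours, and bits is a parameter, so the 32-bit assumption can be swapped out (for example, pass 64 to see the figure for a 64-bit hash).

```rust
/// Probability that at least two of `n` independent, uniformly random
/// `bits`-bit hashes collide, using the birthday approximation
/// p ≈ 1 - exp(-n(n-1) / (2 * 2^bits)).
fn collision_probability(n: u64, bits: i32) -> f64 {
    let n = n as f64;
    let pairs = n * (n - 1.0) / 2.0;
    let space = 2f64.powi(bits);
    // -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    -(-pairs / space).exp_m1()
}

fn main() {
    for n in [3u64, 93, 2900] {
        println!("n = {:>4}: p ≈ {:.3e}", n, collision_probability(n, 32));
    }
}
```

With 32 bits this gives about 1e-3 for 2900 crates. It also gives roughly 7e-10 for 3 crates and 1e-6 for 93 crates, which is the scale that matters if only crates with the same name can collide.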
bjorn3 commented on Aug 13, 2024
For symbol name collisions, I believe you have to collide both the StableCrateId and the crate name (assuming the v0 symbol mangling scheme). You are unlikely to use 2900 crates with the same name. With 3 crates of the same name, the chance of a collision is only 10^-9, and with 93 crates of the same name it is 10^-6.
digama0 commented on Aug 13, 2024
Also, re: "what's the opsem angle", I think the title question is clear enough: what are the requirements for a programmer to be able to call dlopen without causing UB? Answering it requires understanding (1) what bad situations can arise that we would like to classify as UB, and (2) what the programmer or user did that led to the bad situation, which is what we want to put on the warning label.
I think a threat model only comes up when prioritizing the safety requirements in (2) for human consumption. Abstractly, it should be possible to come up with an objective answer to the question.
My knowledge of dynamic linking protocols is pretty limited, so I can't answer the question itself, though. Brainstorming some things based on what has been brought up:
#[no_mangle] collisions are clearly on the programmer (that's why it's unsafe), but there are reasons they might not be obvious, or they may be a distributed-responsibility bug. I really wish we could catch these issues with a nice error message.
#[no_mangle] local function and an internally defined C function in the dlopen'd library?
VorpalBlade commented on Aug 13, 2024
Thank you, we now have a concrete issue that can lead to unsoundness. It is much easier to discuss than the general issues with dlopen (which obviously has more, such as general no_mangle collisions, etc.).
Some thoughts:
One thing that comes to mind is that dlopen has flags that also affect name resolution for later dlopen calls. In particular, RTLD_GLOBAL and RTLD_DEEPBIND might have interesting interactions. In any case, the flags mean there isn't just one single behaviour.
Does eager binding (RTLD_NOW and the corresponding ld flags) help at all? I think the newly loaded library will still get messed up in case of a collision, but existing code will be unaffected. So it's not good enough.
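For reference, the flags combine as a bitmask. Exactly one binding mode (RTLD_LAZY or RTLD_NOW) is ORed with scope modifiers such as RTLD_LOCAL/RTLD_GLOBAL and the GNU extension RTLD_DEEPBIND. Below is a minimal Linux/glibc-only sketch. It declares the functions by hand instead of pulling in the libc crate, and the numeric constants are the glibc values on common architectures such as x86-64 and aarch64. They are not portable.

```rust
use std::ffi::{c_char, c_int, c_void, CStr};
use std::ptr;

// glibc <dlfcn.h> values on x86-64/aarch64; other platforms and libcs differ.
const RTLD_NOW: c_int = 0x0002; // resolve all undefined symbols at load time
const RTLD_LOCAL: c_int = 0; // don't use this object's symbols for later loads
const RTLD_GLOBAL: c_int = 0x0100; // do use them to resolve later loads
const RTLD_DEEPBIND: c_int = 0x0008; // GNU: own symbols before the global scope

unsafe extern "C" {
    fn dlopen(filename: *const c_char, flag: c_int) -> *mut c_void;
    fn dlsym(handle: *mut c_void, symbol: *const c_char) -> *mut c_void;
    fn dlclose(handle: *mut c_void) -> c_int;
}

fn main() {
    for (label, flags) in [
        ("NOW|LOCAL", RTLD_NOW | RTLD_LOCAL),
        ("NOW|GLOBAL", RTLD_NOW | RTLD_GLOBAL),
        ("NOW|DEEPBIND", RTLD_NOW | RTLD_DEEPBIND),
    ] {
        println!("{:<13} = {:#06x}", label, flags);
    }

    unsafe {
        // A null filename yields a handle for the main program; dlsym on it
        // searches the program and the libraries loaded at startup.
        let handle = dlopen(ptr::null(), RTLD_NOW | RTLD_LOCAL);
        assert!(!handle.is_null());
        let name = CStr::from_bytes_with_nul(b"getpid\0").unwrap();
        assert!(!dlsym(handle, name.as_ptr()).is_null());
        assert_eq!(dlclose(handle), 0);
    }
}
```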
On glibc, it looks like dlmopen would actually avoid the issue you describe by putting everything into a separate namespace! It's not portable, though, and apparently only 16 separate namespaces are supported. So not great.
VorpalBlade commented on Aug 13, 2024