Skip to content

Reloading a dylib stops working when moving from 1.19.0 to 1.20.0 #44365

Closed
@Ryan1729

Description

@Ryan1729
Contributor

I have been writing programs which check if a dynamic library's modification time has changed and reload it if so. This allows me to interact with the program, producing transient state but then make changes to the dynamic libraries code and load it to see the result of the changes on the transient state without closing the program.

This feature of my programs has stopped working since I updated to 1.20.0. If I switch back to 1.19.0 with rustup default 1.19.0 then it works again. Switching back with rustup default stable breaks it again.

I've made a small example program that does the lib reloading which you can see here. On 1.19.0 if I launch the program and change, say, this line to something like state.counter += 1000; and rebuild the dynamic library, (or just the whole crate,) the running version of the program will eventually notice the different modification time and start incrementing the counter by 1000 instead of 1. On 1.20.0 this no longer works. On this branch in particular you can see that the new modification time is noticed and the library reloading code is run but it doesn't do anything.

Looking through the patch notes for 1.20.0 I noticed that ManuallyDrop had been stabilized. So I tried using it in this branch but it didn't cause any (noticeable) change in behaviour, (besides compile errors on 1.19.0 of course.)

Activity

hanna-kruppe

hanna-kruppe commented on Sep 6, 2017

@hanna-kruppe
Contributor

Dylibs, like rlibs, have a completely unstable ABI. Among countless other problems, the compiler is free to inline code from a dylib into crates using the dylib, and will in fact do this for #[inline] and generic functions -- which of course breaks hot-reloading. (I'm not saying this is what happened here, that seems a bit unlikely given that you use libloading, it's just one example of how dylib is really more like an rlib than a traditional shared library.)

A reliable way to do this (insofar any hot-reloading can be reliable) would be using the cdylib crate type and interact with with the shared object through a C interface with a stable ABI. That is, declare extern fns and repr(C) structs in the cdylib, and bind to those as you would to a C library (e.g., get function pointers with an extern ABI from libloading).

(You can share type definitions if you structure your code accordingly, and the function bindings might be easier to generate automatically than with C.)

Ryan1729

Ryan1729 commented on Sep 7, 2017

@Ryan1729
ContributorAuthor

@rkruppe I believe I've followed your suggestion in this branch, (diff here), please let me know if I missed anything!

The code in that branch doesn't reload the library on 1.20.0 or 1.190. Oddly, if I don't change state_manipulation to a cdylib then it works on 1.19.0 but not on 1.20.0, like before. Here is a branch with that change.

In both of those branches, the code compiles and runs on 1.19.0 and 1.20.0 just fine, the reloading just doesn't work except in the 1.19.0 plain dylib case.

hanna-kruppe

hanna-kruppe commented on Sep 7, 2017

@hanna-kruppe
Contributor

One detail that's missing in your code is that you still get a fn() -> State instead of a extern "C" fn() -> State function pointer (and analogously for the other function). Unlikely to cause issues with these extremely simple signatures, but the calling convention is different.

Reading over the diff, I wonder whether LIB_PATH is still accurate? I believe cdylib artifacts are named and placed differently from dylib artifacts. You could try this by doing cargo clean in the main crate, ensuring that any dylib artifacts are gone.

Edit: Indeed something's fishy with the way you use the cdylib. I'm surprised that having a dependency on a cdylib crate works at all! A cdylib is more like a staticlib or executable in that it's a finished artifact for consumption by the outside world, rather than something specifically for other Rust crates to consume.

Ryan1729

Ryan1729 commented on Sep 7, 2017

@Ryan1729
ContributorAuthor

I ran cargo clean and then ran the version of the program that uses the C ABI again, and it behaved the same as before, that is, still not reloading the lib, but as expected otherwise. It still creates libstate_manipulation.so in /target/debug/, along with another copy in /target/debug/deps, whether I set state_manipulation to be a cdylib or a plain dylib.

When you talk about "having a dependency on a cdylib crate" are you specifically referring to these lines? This might be clear already, but my intention is to do this live reloading stuff only in debug mode and to have state_manipulation be a normal crate in release mode. In order to do this I need to comment out the cylib/dylib line in state_manipulation's Cargo.toml before building in release mode. I build in release mode infrequently enough that this is no big deal.

I had so far in this issue only been running the code in debug mode, but I tried it in release mode and after fixing update_and_render's return type, (left over from cutting down the previous version to make the example,) it worked just as expected. And of course in release mode it does not create libstate_manipulation.so, just libstate_manipulation.rlib.

hanna-kruppe

hanna-kruppe commented on Sep 7, 2017

@hanna-kruppe
Contributor

When you talk about "having a dependency on a cdylib crate" are you specifically referring to these lines?

I am referring to the fact that state_manipulation only declares the cdylib crate-type in its Cargo.toml, not lib. I would not expect an rlib to be generated for that. Indeed I have just cloned the repository (commit Ryan1729/tiny-live-code-example@d8ef24c) and I get "can't find crate for state_manipulation" in release mode. Please double-check that the code you compile exactly matches what's in the repository, and that you don't have any stale artifacts lying around.

(I'm booted into Windows at the moment so I can't try to reproduce the actual error (reloading failing to have an effect) in debug mode. The fact that state_manipulation does not generate an rlib should be platform-independent though.)

Ryan1729

Ryan1729 commented on Sep 8, 2017

@Ryan1729
ContributorAuthor

can't find crate for state_manipulation

That's the error I get if I don't comment out the crate-type = ["cdylib"] line. As you expected, if I don't do that then I don't get an rlib.

The version of the C_ABI branch I was compiling from is even with the github version. If I download a zip of that code and run it from a different folder it has the same behaviour, (reload only works in 1.19.0 with state_manipulation set as a dylib, in order to compile in release mode you need to make state_manipulation a plain crate, etc.)

I think these debug mode vs release mode differences might be making things more complicated than necessary. So I made another branch from the C_ABI branch with those differences removed, (besides the necessary changes to LIB_PATH). The reloading now works on both (1.19.0,dylib,debug) and (1.19.0,dylib,release) so it seems like that's one less thing to think about.


Just in case it might be relevant, have you noticed these lines in the root Cargo.toml?

#this is needed so the state manipulation dynamic lib is put in the right place
[workspace]

I haven't used workspaces besides this so I'm not sure of the full effects of that [workspace] line, but apparently I added it because it affects where the .so files end up.

Ryan1729

Ryan1729 commented on Sep 8, 2017

@Ryan1729
ContributorAuthor

I decided to see if I could get reloading working at all on windows. If I try to just build while the program is running I get an error saying the build process cannot remove the .dll files. Per this stackexchange answer you can get around this by renaming the .dll then copying the replacement to the original name. So I came up with the following procedure:

  • set the counter increment to 1000.
  • build the program
  • copy the library file, appending 1000 to its name.
  • set the counter increment to 1.
  • run the program. The remaining steps are done without stopping the program
  • rename the freshly built library file, say by adding an underscore.
  • rename the older library file, removing the 1000
  • set that library file's modification time to the current time [1]
  • hopefully observe that the program has now begun incrementing by 1000

I tried that procedure on windows, on a fresh download of the simplified C_ABI branch, (note, without changing cdylib to dylib,) using both 1.19 and 1.20.0. In both cases, it worked; the increment about became 1000.

Performing the same procedure on Linux, (using touch ./target/debug/libstate_manipulation.so instead of the Powershell) only worked if I was using 1.19.0 and I made state_manipulation a plain dylib. Otherwise the new modification time was noticed but the increment amount did not change.

[1]
on windows you can use the following in Powershell :

$file = Get-Item .\target\debug\libstate_manipulation.dll
$file.LastWriteTime = (Get-Date)
hanna-kruppe

hanna-kruppe commented on Sep 8, 2017

@hanna-kruppe
Contributor

I investigated some more (on a linux machine). I can reproduce the behavior you describe, but haven't been able to identify the cause. I'm at my wit's end: with both dylib and cdylib, the right .so file changes and contains the code I'd expect, and the file change is correctly noticed and the library is reloaded, yet results differ. Maybe someone else can diagnose this?

Ryan1729

Ryan1729 commented on Sep 11, 2017

@Ryan1729
ContributorAuthor

I searched back through the nightlies and found that if I set the Rust version with rustup default nightly-2017-07-06 the reloading works, but if I set it with rustup default nightly-2017-07-07 it doesn't work.

This works using using this branch where state_manipulation is set to be a plain dylib, on Linux.

Ryan1729

Ryan1729 commented on Sep 12, 2017

@Ryan1729
ContributorAuthor

Based on the commit hashes that are printed when I run rustup default nightly-2017-07-06 and rustup default nightly-2017-07-07 it seems like the bug would have to have been introduced among these commits. Based on the commit messages alone I don't see an obvious place to look for the bug. Maybe someone else would?

Ryan1729

Ryan1729 commented on Sep 13, 2017

@Ryan1729
ContributorAuthor

This issue is currently on the 4th page of the issues list, so I doubt someone who could quickly make further progress will just find it.

@rkruppe You probably know better than I, who would be most able to help. Can you ask someone else to have a look at this bug? @ mentioning everyone involved in that list of commits seems like overkill.

hanna-kruppe

hanna-kruppe commented on Sep 13, 2017

@hanna-kruppe
Contributor

#42899 and #42727 seems like the most likely candidates, though only because they're the only things in the list that look like they touch linking at all. So: cc @alexcrichton

alexcrichton

alexcrichton commented on Sep 14, 2017

@alexcrichton
Member

Hm there's a lot of discussion here and anything dealing with loading/unloading dylibs is absolutely fraught with unsafety, is there a distillation of the problem at this point?

hanna-kruppe

hanna-kruppe commented on Sep 14, 2017

@hanna-kruppe
Contributor

No idea what the cause might be, but there is a reasonably small reproduction: https://github.com/Ryan1729/tiny-live-code-example/tree/3636b8e29af3e141c12f4e00514b34840f99c12e

The problem that occurs on Linux with state_manipulation having crate type cdylib (and also with dylib on newer versions, but I consider that the less interesting part) is reproduced like this:

  • cargo run [--release], keep it running
  • change the amount by which self.counter is incremented in state_manipulation/src/lib.rs
  • cargo build [--release] in state_manipulation/
  • the .so is updated (and contains the correct code) and the running process notices that and reloads, but the state changes in said process are not affected

@Ryan1729 I have noticed that this reproduction could be even smaller, though. For example, you could remove Platform and State and just pass a &mut i64, and consequently also simplify the the code in main.rs. Can you try reducing it further?

Ryan1729

Ryan1729 commented on Sep 14, 2017

@Ryan1729
ContributorAuthor

I've removed Platform and State here: https://github.com/Ryan1729/tiny-live-code-example/tree/f7b1b07801866db6de3a525f6ec2ebb1cac05549
This reduction removed the need for the common crate altogether.
The behaviour is still the same.

31 remaining items

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-FFIArea: Foreign function interface (FFI)A-linkageArea: linking into static, shared libraries and binariesC-bugCategory: This is a bug.T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @comex@alexcrichton@jethrogb@TimNN@porglezomp

        Issue actions

          Reloading a dylib stops working when moving from 1.19.0 to 1.20.0 · Issue #44365 · rust-lang/rust