Experimental: Add Derive Proc-Macro Caching #129102
Conversation
@bors try @rust-timer queue
…ng, r=<try> Experimental: Add Derive Proc-Macro Caching

# On-Disk Caching For Derive Proc-Macro Invocations

This PR adds on-disk caching for derive proc-macro invocations using rustc's query system, to speed up incremental compilation. The implementation is (intentionally) a bit rough/incomplete, as I wanted to see whether this helps with performance before fully implementing it/RFCing etc. I did some ad-hoc performance testing.

## Rough, Preliminary Eval Results

Using a version built through `DEPLOY=1 src/ci/docker/run.sh dist-x86_64-linux` (which I got from [here](https://rustc-dev-guide.rust-lang.org/building/optimized-build.html#profile-guided-optimization)).

### [Some Small Personal Project](https://github.com/futile/ultra-game)

```console
# with -Zthreads=0 as well
$ touch src/main.rs && cargo +dist check
```

Caused a re-check of 1 crate (the only one). Result:

| Configuration | Time (avg. ~5 runs) |
|---------------|---------------------|
| Uncached      | ~0.54s              |
| Cached        | ~0.54s              |

No visible difference.

### [Bevy](https://github.com/bevyengine/bevy)

```console
$ touch crates/bevy_ecs/src/lib.rs && cargo +dist check
```

Caused a re-check of 29 crates. Result:

| Configuration | Time (avg. ~5 runs) |
|---------------|---------------------|
| Uncached      | ~6.4s               |
| Cached        | ~5.3s               |

Roughly 1s, or a ~17% speedup.

### [Polkadot-Sdk](https://github.com/paritytech/polkadot-sdk)

Basically this script (not mine): https://github.com/coderemotedotdev/rustc-profiles/blob/d61ad38c496459d82e35d8bdb0a154fbb83de903/scripts/benchmark_incremental_builds_polkadot_sdk.sh

TL;DR: Two full `cargo check` runs to fill the incremental caches (for cached & uncached), then 10 repetitions of `touch $some_file && cargo +uncached check && cargo +cached check`.

```console
$ cargo update # `time` didn't build because the compiler was too new/a dep too old
$ ./benchmark_incremental_builds_polkadot_sdk.sh # see above
```

_Huge_ workspace with ~190 crates. Not sure how many were re-built/re-checked on each invocation. Result:

| Configuration | Time (avg. 10 runs) |
|---------------|---------------------|
| Uncached      | 99.4s               |
| Cached        | 67.5s               |

Very visible speedup of 31.9s, or ~32%.

---

**-> Based on these results I think it makes sense to do a rustc-perf run and see what that reports.**

---

## Current Limitations/TODOs

I left some `FIXME(pr-time)`s in the code for things I wanted to bring up/draw attention to in this PR, usually when I wasn't sure if I had found a (good) solution or when I knew that there might be a better way to do something; see the diff for these.

### High-Level Overview of What's Missing For "Real" Usage

* [ ] Add caching for `Bang`- and `Attr`-proc macros (currently only `Derive`).
  * Not a big change; I just focused on `derive`-proc macros for now, since I felt like these should be the most cacheable and are used very often in practice.
* [ ] Allow marking specific macros as "do not cache" (currently only all-or-nothing).
  * Extend the unstable option to support, e.g., `-Z cache-derive-macros=some_pm_crate::some_derive_macro_fn` for easy testing using the nightly compiler.
  * After testing: add a `#[proc_macro_cacheable]` annotation to allow proc-macro authors to "opt in" to caching (or sth. similar). Would probably need an RFC?
  * Might make sense to try to combine this with rust-lang#99515, so that external dependencies can be picked up and taken into account as well.
* [ ] Handle non-cacheable `TokenStream`s: some `TokenStream`s aren't hashable/stable-hashable/encodable(?) for now. We have to a) detect that, which might require catching a panic (for `Hash::hash()`, I believe), or b) otherwise establish `TryHash` etc. traits that are more failure-resistant.

---

So, just since you were in the loop on the attempt to cache declarative macro expansions: r? `@petrochenkov` Please feel free to re-/unassign!

Finally: I hope this isn't too big a PR. I'll also show up in Zulip, since I read that that is usually appreciated. Thanks a lot for taking a look! :)

(Kind of related/very similar approach, old declarative macro caching PR: rust-lang#128747)
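The caching idea can be sketched outside of rustc: memoize each derive expansion behind a key built from the invocation's expansion id, the proc-macro crate's hash, and the input token stream. The following is a minimal standalone sketch of that idea; all types and names here (`ExpnId`, `Svh`, `ExpansionCache`, etc.) are hypothetical stand-ins for rustc internals, using a plain `HashMap` instead of the query system and its on-disk cache:

```rust
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-ins for rustc's types (hypothetical simplifications): an
// expansion id, the macro crate's source-version hash, and a token stream.
type ExpnId = u32;
type Svh = u64;
type TokenStream = String;

// Cache key: if the call-site context, the proc-macro crate, or the
// input tokens change, the cached expansion must be recomputed.
#[derive(PartialEq, Eq, Hash)]
struct Key {
    expn_id: ExpnId,
    macro_crate_hash: Svh,
    input_hash: u64,
}

fn hash_tokens(ts: &TokenStream) -> u64 {
    let mut h = DefaultHasher::new();
    ts.hash(&mut h);
    h.finish()
}

struct ExpansionCache {
    map: HashMap<Key, TokenStream>,
    misses: usize,
}

impl ExpansionCache {
    fn new() -> Self {
        ExpansionCache { map: HashMap::new(), misses: 0 }
    }

    // Return the cached expansion if the key matches, otherwise run
    // `expand` and remember its output.
    fn expand(
        &mut self,
        expn_id: ExpnId,
        macro_crate_hash: Svh,
        input: &TokenStream,
        expand: impl FnOnce(&TokenStream) -> TokenStream,
    ) -> TokenStream {
        let key = Key { expn_id, macro_crate_hash, input_hash: hash_tokens(input) };
        if let Some(cached) = self.map.get(&key) {
            return cached.clone();
        }
        self.misses += 1;
        let output = expand(input);
        self.map.insert(key, output.clone());
        output
    }
}

fn main() {
    let mut cache = ExpansionCache::new();
    let derive = |ts: &TokenStream| format!("impl MyTrait for {}", ts);

    cache.expand(1, 0xABCD, &"struct Foo;".to_string(), derive);
    // Same invocation again: served from the cache, no new miss.
    cache.expand(1, 0xABCD, &"struct Foo;".to_string(), derive);
    assert_eq!(cache.misses, 1);

    // Changed input tokens: must re-run the proc macro.
    cache.expand(1, 0xABCD, &"struct Bar;".to_string(), derive);
    assert_eq!(cache.misses, 2);
}
```

The real implementation additionally persists the map across compiler sessions (that is the "on-disk" part) and lets the query system decide when a cached entry is still valid.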
☀️ Try build successful - checks-actions
Finished benchmarking commit (6d8226b): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never

**Instruction count**: This is a highly reliable metric that was used to determine the overall result at the top of this comment.

**Max RSS (memory usage)**: Results (primary -0.3%, secondary -1.8%). This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

**Cycles**: Results (primary -3.9%). This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

**Binary size**: Results (primary -0.1%). This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

Bootstrap: 753.65s -> 756.621s (0.39%)
Ok, I think the performance results look pretty great! Here are the results for Instructions:

**Summary**

Primary benchmarks

Secondary benchmarks

**Conclusion**

So overall, lots of … The secondary benchmark results look weird, because … Also, I think the benchmarks don't contain a lot of "big" projects (like bevy etc.), which might see the biggest speedup from this caching.

**Next Steps**

Given these results, what's the opinion here? I would think it makes sense to review + clean up the implementation, extend it to all proc-macro types, and to add an unstable option that allows not caching certain proc-macros (e.g., …).

Then after that, as mentioned before, it might need a new attribute like …

Anyway, it would be awesome if someone could take a look at/review the diff so far, and also to get some feedback on possible next steps. Thanks a lot already! :)

P.S.: Big kudos to this blog post, which found these possible gains in the first place! Also to @SparrowLii, whose initial implementation for declarative macro caching provided me the much-needed basic structure for this implementation as well!
As far as I've seen, the default assumption has been that caching can happen (and cases such as stateful proc-macros have not been supported), even if some proc-macros haven't treated it this way. Example issues of this are #44034 (comment) and #63804 (comment), and #44034 (comment) (and the following comments) notes the IDE case, where language servers such as rust-analyzer and RustRover may cache proc-macro expansions already. Not a reference, but another example of how an IDE has treated them: https://blog.jetbrains.com/rust/2022/12/05/what-every-rust-developer-should-know-about-macro-support-in-ides/#what-every-rust-macro-implementor-should-take-into-account
I'm not arguing for or against a flag/attribute, nor am I a reviewer, but I saw this wonderful PR and thought to add some context I've seen, to help make a decision. It's possible there have been further discussions elsewhere that I have missed. Thank you for exploring an implementation of caching proc-macro expansions!
Thank you for bringing that up; I wasn't aware of it, and it definitely helps! :) The reason I didn't want to just "break" existing libraries is to not make life unnecessarily harder for libraries like sqlx, which rely on this behavior to a certain extent. I think it makes sense to coordinate with #99515, which would give proc-macro authors something clear to rely on, and would at the same time help reduce the "fallout" from starting to cache proc-macro invocations. But this probably means that a new opt-in attribute etc. will not be necessary, so that's really nice, thanks! :)
Thanks, very appreciated! :)
Updated with some cleanup. Got rid of some unnecessary code, and split out a small change to error handling in the query system into #129271. Otherwise same as before; ready for review :)
…-panic, r=michaelwoerister Prevent double panic in query system, improve diagnostics I stumbled upon a double-panic in the query system while working on something else (rust-lang#129102), which hid the real error cause for what I was debugging. This PR remedies that, so unwinding should be able to present more errors. It shouldn't really be relevant for code that doesn't ICE.
These commits modify the … If this was unintentional then you should revert the changes before this PR is merged.
tests/incremental/derive_macro_expansion/proc_macro_unchanged.rs
```rust
// The proc-macro (for `Nothing`) prints a message to stderr when invoked,
// and this message should only be present during the second invocation
// (which has `cfail2` set via cfg).
// FIXME(pr-time): Properly have the test check this, but how? UI-test that tests for `.stderr`?
```
Not sure why you want to test stderr output; incremental tests have directives for testing invalidation.
Yep, that's what the "Properly have the test check this, but how?" was for; I simply didn't see how to check this. There are directives, but I didn't figure out a way to make them work for this case, because the directive needs to be attached to the proc-macro output, I think, but that was weird somehow. I will try again and see what exactly the problem was with that.
Tried again, but couldn't figure out something that works, i.e., something that succeeds with proc-macro caching enabled but fails with proc-macro caching disabled.
I suspect that the `#[rustc_clean]` attribute needs to be used instead of `#[rustc_partition_codegened]`, because the incremental system eventually figures out that the second run of `derive(Nothing)` effectively produced the same data as the first one, so the codegen doesn't need to happen again.
Also, run-make tests (`tests/run-make`) allow running the compiler twice and looking at stderr. There are even some examples of running the compiler in incremental mode there (search for "incremental").
tests/incremental/derive_macro_expansion/proc_macro_unchanged.rs
```rust
// Queries marked with `fatal_cycle` do not need the latter implementation,
// as they will raise a fatal error on query cycles instead.
rustc_queries! {
    query derive_macro_expansion(key: (LocalExpnId, Svh, &'tcx TokenStream)) -> Result<&'tcx TokenStream, ()> {
        // eval_always
```
I suspect that `eval_always` may indeed be needed here, because the query may access untracked data, but I'm not sure; this whole PR will need another review from some query system/incremental compilation expert.
Just so I understand correctly: `eval_always` would prevent caching completely, no? Or will it always execute the query but, if the output stays the same, not mark dependent nodes as "require recomputation"?
Some relevant docs:

- https://rustc-dev-guide.rust-lang.org/queries/incremental-compilation-in-detail.html#query-modifiers
- https://rustc-dev-guide.rust-lang.org/queries/incremental-compilation-in-detail.html?highlight=eval_always#the-projection-query-pattern

> Or will it always execute the query, but, if the output stays the same, not mark dependent nodes as "require recomputation"?

This sounds more reasonable.
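The behavior in question (always re-execute, but only invalidate dependents when the result actually changed) can be modeled in a few lines. This is a toy sketch of that idea with hypothetical names (`QueryNode`, `fingerprint`), not rustc's actual red/green machinery:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// A node is re-executed each session, but its dependents are only
// marked for recomputation if the new result differs from the
// previous session's result.
struct QueryNode {
    prev_result_hash: Option<u64>,
    dependents_invalidated: bool,
}

fn fingerprint<T: Hash>(value: &T) -> u64 {
    let mut h = DefaultHasher::new();
    value.hash(&mut h);
    h.finish()
}

impl QueryNode {
    fn new() -> Self {
        QueryNode { prev_result_hash: None, dependents_invalidated: false }
    }

    fn recompute<T: Hash>(&mut self, compute: impl FnOnce() -> T) -> T {
        let result = compute();
        let new_hash = fingerprint(&result);
        // Dependents stay "green" when the result is unchanged.
        self.dependents_invalidated = self.prev_result_hash != Some(new_hash);
        self.prev_result_hash = Some(new_hash);
        result
    }
}

fn main() {
    let mut node = QueryNode::new();

    node.recompute(|| "expansion v1".to_string());
    assert!(node.dependents_invalidated); // first run: nothing cached yet

    node.recompute(|| "expansion v1".to_string());
    assert!(!node.dependents_invalidated); // same output: dependents stay green

    node.recompute(|| "expansion v2".to_string());
    assert!(node.dependents_invalidated); // output changed: dependents rerun
}
```

The caveat raised below still applies: this firewall effect is separate from loading results from the on-disk cache, which `eval_always` would bypass entirely.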
On the other hand, with `eval_always` the query will never try loading the on-disk cache, which is the whole point of this PR. This will clearly need an opt-in/opt-out attribute (or better, a modifier for …).
Thank you for the review! I'm currently a bit busy, but I hope I'll get to addressing all the feedback in the code soon! :)
Yep, that was also my initial thought. I guess it might make sense to also ask on, e.g., internals.r-l.o or on Zulip, to gather opinions on this? But I'll do that when we get that far. Regarding the modifier: I scouted that out at first as well, but I think one or two of the proc-macro types already use modifiers (I'm thinking of …).
Note that, just as #129102 (comment) mentions, this is already the case: if a proc macro depends on some file/external state, but nothing else about the code has changed, … That doesn't mean I'm against being careful about adding caching for proc macros; on the contrary, I think it makes sense to spread information about this proactively before anything is merged. It's just that so far these guarantees have strictly not been there either; also, rust-analyzer is already not re-executing proc-macros all the time. I think coordinating with #99515 makes a lot of sense, because that would add a way to properly track external dependencies on files and env-vars, which would actually give proc-macro authors a reliable option in general.
Would you have someone in mind? For now I'll address this round of feedback first, so I've got a bit to do anyway. I also wanted to add caching for the other proc-macro types (does that make sense already?). Otherwise, what would be the correct way to find somebody? I can definitely ask on Zulip, if that is a good way to go about this. Again, thanks for the review!
@rustbot ready Happy holidays if you're celebrating! Took some time; please let me know if I missed any of the feedback :)
☔ The latest upstream changes (presumably #136471) made this pull request unmergeable. Please resolve the merge conflicts.
```rust
impl Hash for TokenStream {
    fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
        Encodable::encode(self, &mut HashEncoder { hasher: state });
```
(Also, the use of `(Span)Encoder` here is strange; I don't think it's supposed to be used like this.)

This comment is still relevant: why did you use the `Encoder` traits for implementing hashing? Is this `Hash` implementation actually used? What happens if you put a panic into it? Can it be implemented in terms of the already existing `HashStable` implementation for `TokenStream`?
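For context, the pattern under discussion, routing a type's serialization through an adapter that feeds a `Hasher`, can be illustrated standalone. Everything here (`Encode`, `Sink`, `HashSink`, `Tokens`) is a hypothetical mini version, not rustc's `Encodable`/`HashEncoder`; the review's point stands that the existing `HashStable` machinery may be the better fit:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// A type that knows how to serialize itself into a byte sink.
trait Encode {
    fn encode(&self, sink: &mut dyn Sink);
}

trait Sink {
    fn emit_bytes(&mut self, bytes: &[u8]);
    fn emit_u32(&mut self, v: u32) {
        self.emit_bytes(&v.to_le_bytes());
    }
}

// The adapter: a `Sink` that forwards every emitted byte into a
// `Hasher`, so the serialization routine doubles as a hash function.
struct HashSink<'a> {
    hasher: &'a mut dyn Hasher,
}

impl<'a> Sink for HashSink<'a> {
    fn emit_bytes(&mut self, bytes: &[u8]) {
        self.hasher.write(bytes);
    }
}

// A toy token stream that already knows how to encode itself.
struct Tokens(Vec<u32>);

impl Encode for Tokens {
    fn encode(&self, sink: &mut dyn Sink) {
        sink.emit_u32(self.0.len() as u32);
        for t in &self.0 {
            sink.emit_u32(*t);
        }
    }
}

fn hash_via_encoding(value: &dyn Encode) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.encode(&mut HashSink { hasher: &mut hasher });
    hasher.finish()
}

fn main() {
    let a = Tokens(vec![1, 2, 3]);
    let b = Tokens(vec![1, 2, 3]);
    let c = Tokens(vec![1, 2, 4]);
    assert_eq!(hash_via_encoding(&a), hash_via_encoding(&b));
    assert_ne!(hash_via_encoding(&a), hash_via_encoding(&c));
}
```

The appeal of the adapter is reuse: any encodable type becomes hashable for free. The drawback, as noted above, is that it sidesteps the purpose-built hashing traits and can panic wherever encoding can.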
```rust
assert_eq!(invoc_expn_data.call_site, span);

// FIXME(pr-time): Is this the correct way to check for incremental compilation (as
// well as for `cache_proc_macros`)?
```
Yes.
```rust
// FIXME(pr-time): Just using the crate hash to notice when the proc-macro code has
// changed. How to *correctly* depend on exactly the macro definition?
// I.e., depending on the crate hash is just a HACK (and leaves garbage in the
// incremental compilation dir).
```
Ah, now I understand. In the query key:

- `&'tcx TokenStream` checks whether the macro input changed,
- `LocalExpnId` checks whether the macro call context changed,
- and the proc macro crate's `Svh` checks whether the macro definition changed.
```rust
    });
    return ExpandResult::Ready(vec![]);
}
let res = ty::tls::with(|tcx| {
```
Wrapping into `tls::with` is also unnecessary if we are not in incremental + macro caching mode.
The `tcx.arena.alloc(stream)` part can also be omitted.
```rust
// FIXME(pr-time): Just using the crate hash to notice when the proc-macro code has
// changed. How to *correctly* depend on exactly the macro definition?
// I.e., depending on the crate hash is just a HACK, and ideally the dependency would be
// more narrow.
```
I don't think there's any more precise way. A proc macro is executable code; it can run arbitrary functions from the macro crate itself or other crates, and we don't have any system for incremental tracking of runtime dependencies, so we can only assume that if anything in the crate changes, it can change the macro logic as well.
```rust
    });
    return ExpandResult::Ready(vec![]);
let res = ty::tls::with(|tcx| {
    // FIXME(pr-time): without flattened some (weird) tests fail, but no idea if it's correct/enough
```
If flattening is not enabled then some hashing/encoding/decoding function panics due to encountering an AST piece in the token stream?
Sorry for the long delay; they weren't exactly holidays, but now I should be back to reviewing.
```diff
@@ -31,9 +37,9 @@ impl<T> pm::bridge::server::MessagePipe<T> for MessagePipe<T> {
     }
 }

-fn exec_strategy(ecx: &ExtCtxt<'_>) -> impl pm::bridge::server::ExecutionStrategy {
+pub fn exec_strategy(sess: &Session) -> impl pm::bridge::server::ExecutionStrategy {
```
```suggestion
fn exec_strategy(sess: &Session) -> impl pm::bridge::server::ExecutionStrategy {
```
```rust
);

let proc_macro_backtrace = ecx.ecfg.proc_macro_backtrace;
let strategy = crate::proc_macro::exec_strategy(tcx.sess);
```
```suggestion
let strategy = exec_strategy(tcx.sess);
```
```rust
let proc_macro_backtrace = ecx.ecfg.proc_macro_backtrace;
let strategy = crate::proc_macro::exec_strategy(tcx.sess);
let server = crate::proc_macro_server::Rustc::new(ecx);
```
```suggestion
let server = proc_macro_server::Rustc::new(ecx);
```
```rust
/// Sets `context` as the new current `CONTEXT` for the duration of the function `f`.
#[inline]
pub(crate) fn enter_context<'a, F, R>(context: (&mut ExtCtxt<'a>, CLIENT), f: F) -> R
```
```suggestion
fn enter_context<'a, F, R>(context: (&mut ExtCtxt<'a>, CLIENT), f: F) -> R
```
```rust
    with_context(|(ecx, client)| expand_derive_macro(tcx, invoc_id, input, ecx, *client))
}

type CLIENT = pm::bridge::client::Client<pm::TokenStream, pm::TokenStream>;
```
```suggestion
type DeriveClient = pm::bridge::client::Client<pm::TokenStream, pm::TokenStream>;
```

This alias can be used in a couple more places in this file.
```rust
// based on rust/compiler/rustc_middle/src/ty/context/tls.rs
thread_local! {
    /// A thread local variable that stores a pointer to the current `CONTEXT`.
    static TLV: Cell<(*mut (), Option<CLIENT>)> = const { Cell::new((ptr::null_mut(), None)) };
}
```
This looks more like `scoped_thread_local!`, because we put a value into it outside the query call, then retrieve it immediately inside the call and don't need it anymore. Maybe that would help avoid some of the manual type erasure.
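The scoped-TLS pattern being suggested can be hand-rolled with std's `thread_local!`; this is a sketch with a `Copy` value (`u32`) standing in for the real context. The actual code stores a pointer to a non-`'static` `ExtCtxt`, which is what forces the manual type erasure in the first place:

```rust
use std::cell::Cell;

thread_local! {
    // The current context, `None` outside any `enter_context` scope.
    static CURRENT: Cell<Option<u32>> = const { Cell::new(None) };
}

// Install `value` as the current context for the duration of `f`,
// restoring the previous context afterwards (so nesting works).
fn enter_context<R>(value: u32, f: impl FnOnce() -> R) -> R {
    CURRENT.with(|slot| {
        let prev = slot.replace(Some(value));
        let result = f();
        slot.set(prev);
        result
    })
}

// Retrieve the current context; panics if no context is installed.
fn with_context<R>(f: impl FnOnce(u32) -> R) -> R {
    CURRENT.with(|slot| {
        let value = slot.get().expect("no context installed");
        f(value)
    })
}

fn main() {
    let doubled = enter_context(21, || with_context(|v| v * 2));
    assert_eq!(doubled, 42);

    // Nested scopes restore the outer context on exit.
    enter_context(1, || {
        enter_context(2, || assert_eq!(with_context(|v| v), 2));
        assert_eq!(with_context(|v| v), 1);
    });
}
```

A scoped-TLS macro essentially packages this install/restore discipline so the value can only be observed inside the closure, which matches the "put it in outside the query call, read it once inside" usage described above.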