-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up target feature computation #137586
base: master
Are you sure you want to change the base?
Conversation
This comment has been minimized.
This comment has been minimized.
145e520
to
4d82120
Compare
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Speed up target feature computation r? `@ghost`
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (e70101e): comparison URL. Overall result: ✅ improvements - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.
Max RSS (memory usage)Results (primary 0.5%, secondary 0.6%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResults (primary -0.5%, secondary -0.5%)This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 772.351s -> 771.231s (-0.15%) |
Some changes occurred in compiler/rustc_codegen_cranelift cc @bjorn3 Some changes occurred in compiler/rustc_codegen_gcc |
Fantastic icount results here for short-running rustc invocations. It's not clear they translate to actual time wins, but if nothing else it will make LLVM's |
@nikic: after this PR merges, This PR avoided the silly rustc-side problem of |
Also improve some comments.
Currently its argument is an iterator, but in practice it's always a singleton.
Currently it is called twice, once with `allow_unstable` set to true and once with it set to false. This results in some duplicated work. Most notably, for the LLVM backend, `LLVMRustHasFeature` is called twice for every feature, and it's moderately slow. For very short running compilations on platforms with many features (e.g. a `check` build of hello-world on x86) this is a significant fraction of runtime. This commit changes `target_features_cfg` so it is only called once, and it now returns a pair of feature sets. This halves the number of `LLVMRustHasFeature` calls.
No smallvecs here.
4d82120
to
cee3114
Compare
@bjorn3: I fixed the conflicts. |
} | ||
}; | ||
// FIXME do `unstable_target_features` properly | ||
let unstable_target_features = target_features.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not just unstable target features as the name of this variable implies, but all including unstable ones. Maybe rename it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all_target_features
and stable_target_features
would be clearer names, yes. But Session
already has fields called unstable_target_features
and target_features
, and this is the code that is used to set those fields, so I used those names for consistency. If you think it's important, maybe it should be a follow-up.
}) | ||
.filter(|feature| features.contains(&feature)) | ||
.collect() | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code in this closure is duplicated between all codegen backends. Maybe move it to the caller of target_features_cfg
? So just return features
from target_features_cfg
and then do the splitting between stable and unstable features in the caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see it in the cranelift backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if the response is "the cranelift backend should do that filtering", that would also be a good follow-up, because that's a pre-existing issue that is distinct from the one this PR is addressing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Cranelift backend currently doesn't consider any unstable feature to ever be enabled at all, so filtering has no effect. Once it does support unstable features, yes it should do filtering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, makes sense.
@bjorn3: are you satisfied with my responses above? Anything else need changing here? |
The LLVM backend calls
LLVMRustHasFeature
twice for every feature. In short-running rustc invocations, this accounts for a surprising amount of work.r? @bjorn3