Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up target feature computation #137586

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

nnethercote
Copy link
Contributor

@nnethercote nnethercote commented Feb 25, 2025

The LLVM backend calls LLVMRustHasFeature twice for every feature. In short-running rustc invocations, this accounts for a surprising amount of work.

r? @bjorn3

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 25, 2025
@rust-log-analyzer

This comment has been minimized.

@nnethercote
Copy link
Contributor Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 25, 2025
@bors
Copy link
Contributor

bors commented Feb 25, 2025

⌛ Trying commit 4d82120 with merge e70101e...

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 25, 2025
Speed up target feature computation

r? `@ghost`
@bors bors mentioned this pull request Feb 25, 2025
@bors
Copy link
Contributor

bors commented Feb 25, 2025

☀️ Try build successful - checks-actions
Build commit: e70101e (e70101ed08ced7375221ca3bf7bad5f119506b4e)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (e70101e): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.6% [-9.9%, -0.1%] 41
Improvements ✅
(secondary)
-1.9% [-8.3%, -0.2%] 116
All ❌✅ (primary) -1.6% [-9.9%, -0.1%] 41

Max RSS (memory usage)

Results (primary 0.5%, secondary 0.6%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.9% [0.4%, 2.9%] 56
Regressions ❌
(secondary)
0.9% [0.4%, 3.7%] 105
Improvements ✅
(primary)
-1.0% [-2.9%, -0.4%] 16
Improvements ✅
(secondary)
-0.9% [-1.9%, -0.5%] 21
All ❌✅ (primary) 0.5% [-2.9%, 2.9%] 72

Cycles

Results (primary -0.5%, secondary -0.5%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.0% [0.4%, 3.6%] 11
Regressions ❌
(secondary)
1.4% [0.4%, 4.8%] 36
Improvements ✅
(primary)
-0.8% [-4.9%, -0.4%] 65
Improvements ✅
(secondary)
-1.1% [-4.6%, -0.4%] 104
All ❌✅ (primary) -0.5% [-4.9%, 3.6%] 76

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 772.351s -> 771.231s (-0.15%)
Artifact size: 361.91 MiB -> 361.95 MiB (0.01%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 25, 2025
@nnethercote nnethercote marked this pull request as ready for review February 25, 2025 21:15
@rustbot
Copy link
Collaborator

rustbot commented Feb 25, 2025

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo, @GuillaumeGomez

@nnethercote
Copy link
Contributor Author

nnethercote commented Feb 25, 2025

Fantastic icount results here for short-running rustc invocations. It's not clear they translate to actual time wins, but if nothing else it will make LLVM's SetImpliedBits function show up less high in Cachegrind profiles. (On my machine it's enough to move SetImpliedBits from the #1 position to #2, after _dl_relocate_object 😃 )

@nnethercote
Copy link
Contributor Author

@nikic: after this PR merges, LLVMRustHasFeature is called once for every feature. It calls MCInfo->checkFeatures() with a +-prefixed feature name. That method calls ::ApplyFeatureFlag() twice; the two calls always give the same result because of the + prefix. ::ApplyFeatureFlag() then calls SetImpliedBits(), which is the function that shows up in the Cachegrind profiles; it does some bitset stuff.

This PR avoided the silly rustc-side problem of LLVMRustHasFeature being called twice for every feature. The rustc/LLVM communication still seems a little heavyweight here. It feels like checkFeatures is designed to check multiple features at once, and rustc calling it for every feature is sub-optimal. Any ideas on how all this could be streamlined?

Also improve some comments.
Currently its argument is an iterator, but in practice it's always a
singleton.
Currently it is called twice, once with `allow_unstable` set to true and
once with it set to false. This results in some duplicated work. Most
notably, for the LLVM backend, `LLVMRustHasFeature` is called twice for
every feature, and it's moderately slow. For very short running
compilations on platforms with many features (e.g. a `check` build of
hello-world on x86) this is a significant fraction of runtime.

This commit changes `target_features_cfg` so it is only called once, and
it now returns a pair of feature sets. This halves the number of
`LLVMRustHasFeature` calls.
No smallvecs here.
@nnethercote
Copy link
Contributor Author

@bjorn3: I fixed the conflicts.

}
};
// FIXME do `unstable_target_features` properly
let unstable_target_features = target_features.clone();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not just unstable target features as the name of this variable implies, but all including unstable ones. Maybe rename it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all_target_features and stable_target_features would be clearer names, yes. But Session already has fields called unstable_target_features and target_features, and this is the code that is used to set those fields, so I used those names for consistency. If you think it's important, maybe it should be a follow-up.

})
.filter(|feature| features.contains(&feature))
.collect()
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code in this closure is duplicated between all codegen backends. Maybe move it to the caller of target_features_cfg? So just return features from target_features_cfg and then do the splitting between stable and unstable features in the caller.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see it in the cranelift backend.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if the response is "the cranelift backend should do that filtering", that would also be a good follow-up, because that's a pre-existing issue that is distinct from the one this PR is addressing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Cranelift backend currently doesn't consider any unstable feature to ever be enabled at all, so filtering has no effect. Once it does support unstable features, yes it should do filtering.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, makes sense.

@nnethercote
Copy link
Contributor Author

@bjorn3: are you satisfied with my responses above? Anything else need changing here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants