Speed up target feature computation #137586

nnethercote · 2025-02-25T05:52:27Z

The LLVM backend calls `LLVMRustHasFeature` twice for every feature. In short-running rustc invocations, this accounts for a surprising amount of work.

r? @bjorn3

nnethercote · 2025-02-25T07:31:42Z

@bors try @rust-timer queue

bors · 2025-02-25T07:32:53Z

⌛ Trying commit 4d82120 with merge e70101e...

Speed up target feature computation r? `@ghost`

bors · 2025-02-25T09:34:20Z

☀️ Try build successful - checks-actions
Build commit: e70101e (e70101ed08ced7375221ca3bf7bad5f119506b4e)

rust-timer · 2025-02-25T12:05:54Z

Finished benchmarking commit (e70101e): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.6% | [-9.9%, -0.1%] | 41 |
| Improvements ✅ (secondary) | -1.9% | [-8.3%, -0.2%] | 116 |
| All ❌✅ (primary) | -1.6% | [-9.9%, -0.1%] | 41 |

Max RSS (memory usage)

Results (primary 0.5%, secondary 0.6%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.9% | [0.4%, 2.9%] | 56 |
| Regressions ❌ (secondary) | 0.9% | [0.4%, 3.7%] | 105 |
| Improvements ✅ (primary) | -1.0% | [-2.9%, -0.4%] | 16 |
| Improvements ✅ (secondary) | -0.9% | [-1.9%, -0.5%] | 21 |
| All ❌✅ (primary) | 0.5% | [-2.9%, 2.9%] | 72 |

Cycles

Results (primary -0.5%, secondary -0.5%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.0% | [0.4%, 3.6%] | 11 |
| Regressions ❌ (secondary) | 1.4% | [0.4%, 4.8%] | 36 |
| Improvements ✅ (primary) | -0.8% | [-4.9%, -0.4%] | 65 |
| Improvements ✅ (secondary) | -1.1% | [-4.6%, -0.4%] | 104 |
| All ❌✅ (primary) | -0.5% | [-4.9%, 3.6%] | 76 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 772.351s -> 771.231s (-0.15%)
Artifact size: 361.91 MiB -> 361.95 MiB (0.01%)

rustbot · 2025-02-25T21:15:09Z

Some changes occurred in compiler/rustc_codegen_cranelift

cc @bjorn3

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo, @GuillaumeGomez

nnethercote · 2025-02-25T21:25:19Z

Fantastic icount results here for short-running rustc invocations. It's not clear they translate to actual time wins, but if nothing else it will make LLVM's `SetImpliedBits` function rank lower in Cachegrind profiles. (On my machine it's enough to move `SetImpliedBits` from the #1 position to #2, after `_dl_relocate_object` 😃 )

nnethercote · 2025-02-25T21:33:10Z

@nikic: after this PR merges, `LLVMRustHasFeature` is called once for every feature. It calls `MCInfo->checkFeatures()` with a `+`-prefixed feature name. That method calls `::ApplyFeatureFlag()` twice; the two calls always give the same result because of the `+` prefix. `::ApplyFeatureFlag()` then calls `SetImpliedBits()`, which is the function that shows up in the Cachegrind profiles; it does some bitset stuff.

This PR avoids the silly rustc-side problem of `LLVMRustHasFeature` being called twice for every feature, but the rustc/LLVM communication still seems a little heavyweight. It feels like `checkFeatures` is designed to check multiple features at once, and rustc calling it for every feature is sub-optimal. Any ideas on how all this could be streamlined?
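To make the call counts concrete, here is a toy Rust model of the chain described above. It is not LLVM's or rustc's code: the methods on the hypothetical `Counter` type are stand-ins named after the real functions, and the assumption that each `ApplyFeatureFlag` call reaches `SetImpliedBits` exactly once is a simplification made for illustration.

```rust
/// Toy call counter; not LLVM's data structures.
#[derive(Default)]
struct Counter {
    implied_bits_runs: usize,
}

impl Counter {
    /// Stand-in for `SetImpliedBits`: only records that it ran.
    fn set_implied_bits(&mut self) {
        self.implied_bits_runs += 1;
    }

    /// Stand-in for `ApplyFeatureFlag` (assumed to reach `SetImpliedBits` once).
    fn apply_feature_flag(&mut self, _feature: &str) {
        self.set_implied_bits();
    }

    /// Stand-in for `checkFeatures`: applies the flag twice, as described above.
    fn check_features(&mut self, feature: &str) {
        self.apply_feature_flag(feature);
        self.apply_feature_flag(feature);
    }

    /// Stand-in for `LLVMRustHasFeature`: prefixes the name with `+`.
    fn has_feature(&mut self, name: &str) {
        let prefixed = format!("+{name}");
        self.check_features(&prefixed);
    }
}

/// Counts `set_implied_bits` runs when every feature is queried
/// `queries_per_feature` times.
fn count_calls(features: &[&str], queries_per_feature: usize) -> usize {
    let mut counter = Counter::default();
    for feature in features {
        for _ in 0..queries_per_feature {
            counter.has_feature(feature);
        }
    }
    counter.implied_bits_runs
}
```

Under these assumptions, `count_calls(&features, 2)` models the behaviour before this PR and `count_calls(&features, 1)` the behaviour after it: the count halves, but it is still two `set_implied_bits` runs per feature.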

Currently `target_features_cfg` is called twice, once with `allow_unstable` set to true and once with it set to false. This results in some duplicated work. Most notably, for the LLVM backend, `LLVMRustHasFeature` is called twice for every feature, and it's moderately slow. For very short-running compilations on platforms with many features (e.g. a `check` build of hello-world on x86) this is a significant fraction of runtime. This commit changes `target_features_cfg` so it is only called once, and it now returns a pair of feature sets. This halves the number of `LLVMRustHasFeature` calls.
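The single-pass idea can be sketched as follows. This is a simplified illustration, not the PR's code: the `Feature` struct, the `probe` closure, and the signature shown here are assumptions, and the real function works with rustc's own types rather than plain strings.

```rust
use std::collections::BTreeSet;

/// Hypothetical description of one target feature.
struct Feature {
    name: &'static str,
    stable: bool,
}

/// Queries the backend once per feature and returns
/// `(stable_target_features, all_target_features)`.
///
/// `probe` stands in for an expensive backend query such as
/// `LLVMRustHasFeature`.
fn target_features_cfg(
    known: &[Feature],
    mut probe: impl FnMut(&str) -> bool,
) -> (BTreeSet<&'static str>, BTreeSet<&'static str>) {
    let mut stable_target_features = BTreeSet::new();
    let mut all_target_features = BTreeSet::new();
    for feature in known {
        // One probe per feature, regardless of stability.
        if probe(feature.name) {
            all_target_features.insert(feature.name);
            if feature.stable {
                stable_target_features.insert(feature.name);
            }
        }
    }
    (stable_target_features, all_target_features)
}
```

Calling this once replaces two separate passes (one per `allow_unstable` value), so the number of `probe` calls equals the number of known features rather than twice that.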

nnethercote · 2025-03-04T23:17:21Z

@bjorn3: I fixed the conflicts.

bjorn3 · 2025-03-05T09:40:04Z

compiler/rustc_codegen_cranelift/src/lib.rs

-        }
+        };
+        // FIXME do `unstable_target_features` properly
+        let unstable_target_features = target_features.clone();


This is not just the unstable target features, as the name of this variable implies, but all features, including unstable ones. Maybe rename it?

nnethercote · 2025-03-05

`all_target_features` and `stable_target_features` would be clearer names, yes. But `Session` already has fields called `unstable_target_features` and `target_features`, and this is the code that is used to set those fields, so I used those names for consistency. If you think it's important, maybe it should be a follow-up.

bjorn3 · 2025-03-05T09:43:05Z

compiler/rustc_codegen_llvm/src/llvm_util.rs

+            })
+            .filter(|feature| features.contains(&feature))
+            .collect()
+    };


The code in this closure is duplicated between all codegen backends. Maybe move it to the caller of `target_features_cfg`? So just return `features` from `target_features_cfg` and then do the splitting between stable and unstable features in the caller.

nnethercote · 2025-03-05

I don't see it in the cranelift backend.

nnethercote · 2025-03-05

And if the response is "the cranelift backend should do that filtering", that would also be a good follow-up, because that's a pre-existing issue that is distinct from the one this PR is addressing.

bjorn3 · 2025-03-06

The Cranelift backend currently doesn't consider any unstable feature to ever be enabled at all, so filtering has no effect. Once it does support unstable features, yes it should do filtering.

nnethercote · 2025-03-06

Ok, makes sense.
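For reference, bjorn3's suggestion of moving the stable/unstable split into the caller could look roughly like this. It is a sketch only: the `Backend` trait, `enabled_features`, `is_stable`, `split_features`, and `FixedBackend` are hypothetical names chosen for illustration, not rustc APIs.

```rust
use std::collections::BTreeSet;

/// Hypothetical backend interface: report every enabled feature,
/// stable or not.
trait Backend {
    fn enabled_features(&self) -> Vec<&'static str>;
}

/// Shared caller-side code: split one feature list into
/// `(stable_only, all)` so that backends need not duplicate the filtering.
fn split_features(
    backend: &dyn Backend,
    is_stable: impl Fn(&str) -> bool,
) -> (BTreeSet<&'static str>, BTreeSet<&'static str>) {
    let all: BTreeSet<&'static str> = backend.enabled_features().into_iter().collect();
    let stable = all.iter().copied().filter(|f| is_stable(*f)).collect();
    (stable, all)
}

/// Toy backend that enables a fixed list of features.
struct FixedBackend(Vec<&'static str>);

impl Backend for FixedBackend {
    fn enabled_features(&self) -> Vec<&'static str> {
        self.0.clone()
    }
}
```

With this shape, a backend that never enables unstable features (as described for Cranelift above) would simply get identical sets back, and the filtering logic would live in one place.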

nnethercote · 2025-03-06T19:31:51Z

@bjorn3: are you satisfied with my responses above? Anything else need changing here?

rustbot added the S-waiting-on-review and T-compiler labels Feb 25, 2025

nnethercote force-pushed the SetImpliedBits branch from 145e520 to 4d82120 Compare February 25, 2025 06:25

rustbot added the S-waiting-on-perf label Feb 25, 2025

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 25, 2025

Auto merge of rust-lang#137586 - nnethercote:SetImpliedBits, r=<try>

e70101e

Speed up target feature computation r? `@ghost`

bors mentioned this pull request Feb 25, 2025

Build GCC on CI #136921

Merged

rustbot removed the S-waiting-on-perf label Feb 25, 2025

nnethercote marked this pull request as ready for review February 25, 2025 21:15

rustbot assigned bjorn3 Feb 25, 2025

nnethercote added 5 commits March 5, 2025 09:20

Avoid double interning of feature names.

1df93fd

Also improve some comments.

Simplify implied_target_features.

2df8e65

Currently its argument is an iterator, but in practice it's always a singleton.

Use collect to initialize features.

35b7994

Remove out of date comment.

cee3114

No smallvecs here.

nnethercote force-pushed the SetImpliedBits branch from 4d82120 to cee3114 Compare March 4, 2025 23:17

bjorn3 reviewed Mar 5, 2025
