Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to fix intermittent failures of pgo-branch-weights test. #67829

Merged

Conversation

michaelwoerister
Copy link
Member

This PR tries to fix the intermittent failures of the pgo-branch-weights test (#67746). The failing instances show no !prof annotations in LLVM IR. One possible cause is that the instrumented binary did not record anything. This is something I've occasionally seen happen for similarly small programs when using GNU ld as linker. The linker would not properly append the instruction counter sections, leading to most counters being dropped. This PR makes the test use the Gold linker instead.

It also makes each command exit immediately on failure so we can pinpoint the failure source better, should there still be a problem.

r? @Mark-Simulacrum

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 3, 2020
@michaelwoerister
Copy link
Member Author

@bors p=1 I guess?
cc @rust-lang/infra

Copy link
Member

@Mark-Simulacrum Mark-Simulacrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r=me unless you want to deal with the comments

# instrumentation sections correctly. Neither Gold nor LLD have that problem.
ifeq ($(UNAME),Linux)
ifneq (,$(findstring x86,$(TARGET)))
COMMON_FLAGS=-Clink-args=-fuse-ld=gold
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I think we're almost certain to have lld around (and can fairly quickly build it if needed, since we have caching in place and such) but I'm less sure about gold on CI. Maybe it makes sense to use lld here? If that doesn't work well locally and this passes CI though I'm fine with it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that Gold is part of any standard binutils distribution?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we'll find out :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, that's quite possible :) I had thought it would be more special (otherwise, why is ld the default?) but I guess there are edge cases and breaking the world is not an option for distros and such.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that Gold is part of any standard binutils distribution?

This is true on most of Linux distributions but there are popular distros like openSUSE where gold is split to binutils-gold package and not installed by default AFAIK. This should be at least mentioned in the README so users know about this requirement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it would be bad if this made run-make tests fail locally for some people. @Mark-Simulacrum, do we have a good way of using our own rust-lld in run-make tests?

(otherwise, why is ld the default?)

Gold can only handle ELF, ld can handle at lot more, which is why, I'd guess.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We add lld to the PATH of run-make tests if it's enabled (i.e., lld_enabled is set in config.toml). I imagine we could expose that similarly to # needs-matching-clang or so and ignore the test if lld is not built, and then make sure we're building lld on CI (we already do this on the test-various builer, I presume for WASM?). That should be a matter of adding --set rust.lld to RUST_CONFIGURE_ARGS, I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the info! I'll look into that in a follow-up PR if that's OK with you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine with me, yes. I don't know that it's even necessary (maybe worth waiting until someone complains about the gold solution).

cat "$(TMPDIR)"/interesting.ll | "$(LLVM_FILECHECK)" filecheck-patterns.txt
-Cprofile-use="$(TMPDIR)/prof_data_dir/merged.profdata" -O \
-Ccodegen-units=1 --emit=llvm-ir || exit 1
cat "$(TMPDIR)/interesting.ll" | "$(LLVM_FILECHECK)" filecheck-patterns.txt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should move these commands to a shell script and then use set -euo pipefail instead of this mess of exits?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not nice, but if we move this to a script then the error in the log will probably just be bash the_script.sh failed, right?

@michaelwoerister
Copy link
Member Author

@bors r=Mark-Simulacrum

@bors
Copy link
Contributor

bors commented Jan 3, 2020

📌 Commit 971aa2b has been approved by Mark-Simulacrum

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 3, 2020
@@ -15,21 +15,29 @@ ifdef IS_MSVC
COMMON_FLAGS=-Cpanic=abort
endif

# For some very small programs GNU ld seems to not properly handle
# instrumentation sections correctly. Neither Gold nor LLD have that problem.
ifeq ($(UNAME),Linux)
Copy link
Contributor

@mati865 mati865 Jan 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows-gnu CI jobs will still use GNU LD version 2.27, is it recent/old enough to not have this bug?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is an ELF specific issue.

@bors
Copy link
Contributor

bors commented Jan 3, 2020

⌛ Testing commit 971aa2b with merge e845e69...

bors added a commit that referenced this pull request Jan 3, 2020
…-test, r=Mark-Simulacrum

Attempt to fix intermittent failures of pgo-branch-weights test.

This PR tries to fix the intermittent failures of the pgo-branch-weights test (#67746). The failing instances show no `!prof` annotations in LLVM IR. One possible cause is that the instrumented binary did not record anything. This is something I've occasionally seen happen for similarly small programs when using GNU ld as linker. The linker would not properly append the instruction counter sections, leading to most counters being dropped. This PR makes the test use the Gold linker instead.

It also makes each command exit immediately on failure so we can pinpoint the failure source better, should there still be a problem.

r? @Mark-Simulacrum
@bors
Copy link
Contributor

bors commented Jan 4, 2020

☀️ Test successful - checks-azure
Approved by: Mark-Simulacrum
Pushing e845e69 to master...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants