-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
poly1305: AVX2 detection #97
Conversation
Codecov Report
@@ Coverage Diff @@
## master #97 +/- ##
===========================================
+ Coverage 29.91% 49.86% +19.94%
===========================================
Files 12 13 +1
Lines 976 1083 +107
===========================================
+ Hits 292 540 +248
+ Misses 684 543 -141
Continue to review full report at Codecov.
|
ae1e54c
to
d0a4784
Compare
Performance seems slightly reduced (maybe ~5% or so) I'm guessing we should add |
c789536
to
c7d61c7
Compare
So this branch definitely seems to be a little bit slower than Running
On this branch with
Curiously, those are approximately the same numbers I get with
I tried liberally applying
So it appears to be something tied to the newly added code, rather than compiling without It might be related to the use of an I can try opening a second PR that uses |
After switching to a
But also, we get the same performance with autodetection enabled!
So it seems the |
6b3f78c
to
87a8708
Compare
Slapping Performance with autodetection enabled now matches what it previously was with explicit
|
87a8708
to
c160d1c
Compare
It seems like the I was previously compiling with If I compile without that, I think LLVM may be emulating AVX2 or something?(!)
Note that this is significantly slower than the "soft" backend:
So it seems the |
713da83
to
8dee7e9
Compare
Wow, that's wild. I tried applying
However, it seems to be failing the tests in CI! https://github.com/RustCrypto/universal-hashes/pull/97/checks?check_run_id=1520303853#step:6:34
Whatever broke it is something I changed in 8dee7e9, but yet it works locally for me. |
Automatically detects the availability of CLMUL based on CPUID, and falls back to the "soft" implementation if unavailable.
dee84bf
to
c4a31a7
Compare
I pushed up a debug commit (db6ded4) to test whether or not https://github.com/RustCrypto/universal-hashes/pull/97/checks?check_run_id=1525715319 |
db6ded4
to
976c4f5
Compare
Well, currently I can only propose to insert a number of |
@newpavlov yep, just pushed up a commit with some debugging: a160473 |
bf21b8e
to
5de11d4
Compare
Okay, I finally have it reproduced locally. It's passing without If I add |
This is almost certainly a rustc bug. I pushed up 7485010 which includes ample information about my investigation. Among other things, I was able to insert a I also managed to narrow down where I first observed values being miscomputed. Will try to report this upstream. |
As a temporary hack how about moving part of the affected code into a separate function marked with |
Yeah, I can try extracting the lambda into a named function and play with inlining options on it |
7485010
to
eeab81a
Compare
eeab81a
to
b5abdc1
Compare
Alright, managed to work around the problem 🎉 The miscompilation was occurring inside of some lambdas. I extracted the lambdas into named functions in b5abdc1, and the tests are now passing. |
Benchmarks look great too:
|
Let me try to address the |
@newpavlov double check me on 3c5e9db, but I think that's what you had in mind for zeroization |
Yes, it's what I had in mind. It could be a bit more compact to cast into an array: const SIZE: usize = core::mem::size_of::<State>();
let inner_array = unsafe {
&mut *(self as *mut State as *mut [u8; SIZE])
}; BTW you don't have to use |
3c5e9db
to
20dd65a
Compare
20dd65a
to
442c32c
Compare
@newpavlov cool, just updated it with your suggestion |
Automatically detects the availability of CLMUL based on CPUID, and falls back to the "soft" implementation if unavailable.