Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

polyval: Add runtime PCLMULQDQ detection #11

Closed
wants to merge 1 commit into from

Conversation

tarcieri
Copy link
Member

When the std feature is enabled (which it is now by default), this adds runtime detection for PCLMULQDQ support on x86/x86_64 architectures.

The detection happens once at the time Polyval is instantiated. The polyval::field::Element type has been changed into an enum which remembers the detection result, and its API changed to operate on bytestring representations of POLYVAL field elements.

This appears to have a negligible performance impact.

@tarcieri tarcieri requested a review from newpavlov September 18, 2019 20:08
target_feature = "sse2",
target_feature = "sse4.1",
any(target_arch = "x86", target_arch = "x86_64")
))]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conditional gating this implementation is using is gross. Open to alternatives... perhaps the cfg-if crate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried cfg-if. Unfortunately it doesn't seem to support nesting, and the resulting code ended up looking just as bad

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made things a little better by using cfg!. Still a lot of annoying, redundant gating though

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My kingdom for cfg_alias: rust-lang/cargo#7260 (comment)

@tarcieri tarcieri force-pushed the polyval/runtime-detection branch from 44ce8e5 to 231a23b Compare September 18, 2019 20:13
@WildCryptoFox
Copy link

Everything looks good to me.

@tarcieri tarcieri force-pushed the polyval/runtime-detection branch 4 times, most recently from 65f6879 to 86e59eb Compare September 19, 2019 02:48
@tarcieri
Copy link
Member Author

@newpavlov WDYT? I'm calling this "done" but I'd be interested to know any ideas for potential simplifications/cleanups

@tarcieri tarcieri force-pushed the polyval/runtime-detection branch from 86e59eb to 87cc950 Compare September 19, 2019 03:40
When the `std` feature is enabled (which it is now by default), this
adds runtime detection for PCLMULQDQ support on x86/x86_64
architectures.

The detection happens once at the time `Polyval` is instantated. The
`polyval::field::Element` type has been changed into an enum which
remembers the detection result, and its API changed to operate on
bytestring representations of POLYVAL field elements.

This appears to have a negligable performance impact.
@tarcieri tarcieri force-pushed the polyval/runtime-detection branch from 87cc950 to f0db4e4 Compare September 19, 2019 03:43
pub fn from_bytes(bytes: Block) -> Self {
Element(bytes.into())
if cfg!(feature = "std") {
if is_x86_feature_detected!("pclmulqdq") {
Copy link
Member

@newpavlov newpavlov Sep 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite like that it does not work on no_std targets. I think we better to use RDRAND directly and cache results in atomic.

@newpavlov
Copy link
Member

IIUC this implementation branches between Soft and Clmul more than once per single call to Ghash, while ideally it should branch only ones.

Also I think you are using cfg a bit incorrectly. If CLMUL and other necessary target features are enabled, we should remove the software fallback completely.

One approach which we can take looks roughly like this:

trait Element { .. }

impl Element for Clmul { .. }
impl Element for Soft { .. }

struct GhashImpl<E: Element> { .. }

enum GhashPrivate {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    Clmul(GhashImpl<Clmul>),
    // disable soft fallback if CLMUL is enabled by user
    #[cfg(not(all(
        any(target_arch = "x86", target_arch = "x86_64"),
        all(target_feature = "pclmulqdq", ..)
    )))]
    Soft(GhashImpl<Soft>),
}

struct Ghash(GhashPrivate);

But I am not sure how good inlining will work with #[target_feature(enable = "sse4")] added to trait impl. The main idea is what we want to switch to a full implementation for given conditions as high as possible and do it only once per Ghash call.

@tarcieri
Copy link
Member Author

tarcieri commented Sep 19, 2019

@newpavlov upon further reflection, I agree runtime detection isn't helpful. It would be extremely helpful for, say, AES-NI, where it isn't available inside KVM and therefore relocatable binaries produced for architectures that support it fail in these environments, but from what I can tell CLMUL isn't similarly impacted.

That said, your suggested approach is pretty much what the existing implementation already does.

I think I can eliminate the need for an enum and a trait entirely though via a simple newtype which only ensures the relevant APIs are equivalent though, which I'll take a stab at in a separate PR.

It seems we can also fake cfg aliases in build scripts per rust-lang/cargo#7260 (comment), so I might take a look at that as a way to simplify the target gating.

Closing this PR out.

@tarcieri tarcieri closed this Sep 19, 2019
@tarcieri tarcieri deleted the polyval/runtime-detection branch September 19, 2019 14:37
@newpavlov
Copy link
Member

I wouldn't say it isn't helpful. But if we want to squeeze a maximum performance, I think we should push switch between target dependent implementations as high as possible (i.e. we should not do runtime detections for aes, ctr, ghash, but do a single detection and switch in a high-level construct as AES-GCM-SIV), and unfortunately right now Rust is quite poorly equipped for doing that, especially in mult-crate contexts.

@tarcieri
Copy link
Member Author

@newpavlov I'm down to actually implement AES-GCM-SIV first, then circle back on this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants