Skip to content

[libclc] Optimize isfpclass-like CLC builtins #124145

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 28, 2025

Conversation

frasercrmck
Copy link
Contributor

@frasercrmck frasercrmck commented Jan 23, 2025

The builtins we were using to implement __clc_is(finite|inf|nan|normal) -- __builtin_isfinite, etc. -- don't take vector types so we were previously scalarizing. The __builtin_isfpclass builtin does take vector types and thus allows us to keep things in vectors.

There is no change in codegen to the scalar versions of any of these builtins.

@frasercrmck frasercrmck requested a review from arsenm January 23, 2025 16:37
@frasercrmck frasercrmck added the libclc libclc OpenCL library label Jan 23, 2025
Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does using isfpclass avoid scalarization here? I think it's somewhat preferably to use the named operations here, they are subtly different since they canonicalize the input unlike is.fpclass

@frasercrmck
Copy link
Contributor Author

How does using isfpclass avoid scalarization here? I think it's somewhat preferably to use the named operations here, they are subtly different since they canonicalize the input unlike is.fpclass

The builtins we were using before, like __builtin_isnan, don't take vector types so we were forced to scalarize.

I actually started looked into adding __builtin_elementwise_isnan etc. to clang before realizing that __builtin_isfpclass(x, 0x3) accepts vector types and generates the same code as __builtin_isnan(x) does for scalar types (and essentially the same for vectors). I don't see any input canonicalization going on before this change.

in function _Z5isnanDv2_f:
  in block %entry:
    >   %0 = fcmp uno <2 x float> %a, zeroinitializer
    >   %sext.i = sext <2 x i1> %0 to <2 x i32>
    >   ret <2 x i32> %sext.i
    <   %0 = extractelement <2 x float> %a, i64 0
    <   %1 = fcmp uno float %0, 0.000000e+00
    <   %2 = zext i1 %1 to i32
    <   %vecinit.i = insertelement <2 x i32> poison, i32 %2, i64 0
    <   %3 = extractelement <2 x float> %a, i64 1
    <   %4 = fcmp uno float %3, 0.000000e+00
    <   %5 = zext i1 %4 to i32
    <   %vecinit2.i = insertelement <2 x i32> %vecinit.i, i32 %5, i64 1
    <   %cmp.i = icmp ne <2 x i32> %vecinit2.i, zeroinitializer
    <   %sext.i = sext <2 x i1> %cmp.i to <2 x i32>
    <   ret <2 x i32> %sext.i

Using __builtin_isfpclass helps us to avoid scalarization in the vector
forms of __clc_is(finite|inf|nan|normal) (and thus their OpenCL
counterparts).
Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember debugging inconsistent behavior on different platforms in treatment of signaling nans and denormal flushing with these queries. I think these days we emit them as is.fpclass anyway, and then instcombine turns them into fcmp when valid. This may be a backwards system, so this may need revisiting in the future

@frasercrmck frasercrmck merged commit a8c82d5 into llvm:main Jan 28, 2025
8 checks passed
@frasercrmck frasercrmck deleted the libclc-clc-isfpclass branch January 28, 2025 16:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libclc libclc OpenCL library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants