[libclc] Optimize isfpclass-like CLC builtins #124145

frasercrmck · 2025-01-23T16:37:51Z

The builtins we were using to implement __clc_is(finite|inf|nan|normal) -- __builtin_isfinite, etc. -- don't take vector types so we were previously scalarizing. The __builtin_isfpclass builtin does take vector types and thus allows us to keep things in vectors.

There is no change in codegen to the scalar versions of any of these builtins.
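
For illustration only, here is a minimal C sketch of the vector path described above. It is not the libclc source: `float2_t`, `int2_t`, `HAVE_VECTOR_ISFPCLASS`, and `isnan_float2` are hypothetical names, and the sketch assumes a Clang recent enough to provide `__builtin_isfpclass`. Other compilers take the per-lane fallback, which mirrors the old scalarized form.

```c
#include <math.h>

/* Two-lane vectors; the vector_size attribute is accepted by Clang and GCC. */
typedef float float2_t __attribute__((vector_size(2 * sizeof(float))));
typedef int int2_t __attribute__((vector_size(2 * sizeof(int))));

/* Assumption: only Clang is trusted to accept vector operands here. */
#if defined(__clang__) && defined(__has_builtin)
#if __has_builtin(__builtin_isfpclass)
#define HAVE_VECTOR_ISFPCLASS 1
#endif
#endif

/* Hypothetical helper: returns a nonzero lane (-1) where x holds a NaN
   and 0 elsewhere, in the style of OpenCL's vector isnan. */
static int2_t isnan_float2(float2_t x) {
#ifdef HAVE_VECTOR_ISFPCLASS
  /* Mask 0x3 selects signaling NaN (0x1) and quiet NaN (0x2). The whole
     vector is classified by a single builtin call. */
  int2_t cls = __builtin_convertvector(__builtin_isfpclass(x, 0x3), int2_t);
  int2_t zero = {0, 0};
  return cls != zero; /* vector compare: all-ones lanes where cls is set */
#else
  /* Fallback: the scalar-only isnan forces one test per lane. */
  int2_t r;
  for (int i = 0; i < 2; ++i)
    r[i] = isnan(x[i]) ? -1 : 0;
  return r;
#endif
}
```

Calling `isnan_float2` on a vector whose first lane is NaN should give a nonzero first lane and a zero second lane on either path; only the Clang path keeps the classification in vector form.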

arsenm

How does using isfpclass avoid scalarization here? I think it's somewhat preferable to use the named operations here; they are subtly different, since they canonicalize the input, unlike is.fpclass.

frasercrmck · 2025-01-24T06:56:24Z

> How does using isfpclass avoid scalarization here? I think it's somewhat preferable to use the named operations here; they are subtly different, since they canonicalize the input, unlike is.fpclass.

The builtins we were using before, like __builtin_isnan, don't take vector types so we were forced to scalarize.

I actually started looking into adding __builtin_elementwise_isnan etc. to clang before realizing that __builtin_isfpclass(x, 0x3) accepts vector types and generates the same code as __builtin_isnan(x) does for scalar types (and essentially the same for vectors). I don't see any input canonicalization going on before this change.

in function _Z5isnanDv2_f:
  in block %entry:
    >   %0 = fcmp uno <2 x float> %a, zeroinitializer
    >   %sext.i = sext <2 x i1> %0 to <2 x i32>
    >   ret <2 x i32> %sext.i
    <   %0 = extractelement <2 x float> %a, i64 0
    <   %1 = fcmp uno float %0, 0.000000e+00
    <   %2 = zext i1 %1 to i32
    <   %vecinit.i = insertelement <2 x i32> poison, i32 %2, i64 0
    <   %3 = extractelement <2 x float> %a, i64 1
    <   %4 = fcmp uno float %3, 0.000000e+00
    <   %5 = zext i1 %4 to i32
    <   %vecinit2.i = insertelement <2 x i32> %vecinit.i, i32 %5, i64 1
    <   %cmp.i = icmp ne <2 x i32> %vecinit2.i, zeroinitializer
    <   %sext.i = sext <2 x i1> %cmp.i to <2 x i32>
    <   ret <2 x i32> %sext.i
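
The `0x3` mask mentioned above is the OR of two class bits. As a hedged sketch, the constants below follow the bit layout documented for `llvm.is.fpclass`. The enum names, `HAVE_ISFPCLASS`, and the `is_*_f` helpers are made up for illustration, and the masks for the finite, inf, and normal tests are derived from that layout rather than quoted from this PR.

```c
#include <math.h>

/* Class bits as laid out for llvm.is.fpclass (names are illustrative). */
enum {
  FPCLASS_SNAN = 0x001,    /* signaling NaN */
  FPCLASS_QNAN = 0x002,    /* quiet NaN */
  FPCLASS_NEGINF = 0x004,  /* negative infinity */
  FPCLASS_NEGNORM = 0x008, /* negative normal */
  FPCLASS_NEGSUB = 0x010,  /* negative subnormal */
  FPCLASS_NEGZERO = 0x020, /* negative zero */
  FPCLASS_POSZERO = 0x040, /* positive zero */
  FPCLASS_POSSUB = 0x080,  /* positive subnormal */
  FPCLASS_POSNORM = 0x100, /* positive normal */
  FPCLASS_POSINF = 0x200,  /* positive infinity */

  FPCLASS_NAN = FPCLASS_SNAN | FPCLASS_QNAN,         /* 0x003 */
  FPCLASS_INF = FPCLASS_NEGINF | FPCLASS_POSINF,     /* 0x204 */
  FPCLASS_NORMAL = FPCLASS_NEGNORM | FPCLASS_POSNORM, /* 0x108 */
  FPCLASS_FINITE = 0x3FF & ~(FPCLASS_NAN | FPCLASS_INF) /* 0x1F8 */
};

#if defined(__clang__) && defined(__has_builtin)
#if __has_builtin(__builtin_isfpclass)
#define HAVE_ISFPCLASS 1
#endif
#endif

/* Each helper uses the builtin when available and the <math.h>
   classification macro otherwise. The mask must be a constant. */
#ifdef HAVE_ISFPCLASS
static int is_nan_f(float x) { return __builtin_isfpclass(x, FPCLASS_NAN) != 0; }
static int is_inf_f(float x) { return __builtin_isfpclass(x, FPCLASS_INF) != 0; }
static int is_normal_f(float x) { return __builtin_isfpclass(x, FPCLASS_NORMAL) != 0; }
static int is_finite_f(float x) { return __builtin_isfpclass(x, FPCLASS_FINITE) != 0; }
#else
static int is_nan_f(float x) { return isnan(x) != 0; }
static int is_inf_f(float x) { return isinf(x) != 0; }
static int is_normal_f(float x) { return isnormal(x) != 0; }
static int is_finite_f(float x) { return isfinite(x) != 0; }
#endif
```

Writing the masks as ORs of named bits makes it easier to check that, for example, the finite test excludes exactly the NaN and infinity classes.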

Using __builtin_isfpclass helps us to avoid scalarization in the vector forms of __clc_is(finite|inf|nan|normal) (and thus their OpenCL counterparts).

arsenm

I remember debugging inconsistent behavior on different platforms in the treatment of signaling NaNs and denormal flushing with these queries. I think these days we emit them as is.fpclass anyway, and then instcombine turns them into fcmp when valid. This may be a backwards system, so it may need revisiting in the future.

frasercrmck requested a review from arsenm January 23, 2025 16:37

frasercrmck added the libclc label Jan 23, 2025

arsenm reviewed Jan 24, 2025

frasercrmck force-pushed the libclc-clc-isfpclass branch from 738122e to 0186cf2 Compare January 27, 2025 13:40

[libclc] Optimize isfpclass-like CLC builtins

66f750c

Using __builtin_isfpclass helps us to avoid scalarization in the vector forms of __clc_is(finite|inf|nan|normal) (and thus their OpenCL counterparts).

frasercrmck force-pushed the libclc-clc-isfpclass branch from 0186cf2 to 66f750c Compare January 27, 2025 15:53

arsenm approved these changes Jan 28, 2025

frasercrmck merged commit a8c82d5 into llvm:main Jan 28, 2025
8 checks passed

frasercrmck deleted the libclc-clc-isfpclass branch January 28, 2025 16:23
