
Pi and e to Float32 and Float16 #559

Open · wants to merge 4 commits into main
Conversation

christiangnrd
Member

Bypasses the conversion to BigFloat when converting pi and e to Float32 and Float16 on gpu. Values are taken from the constants in Tables 6.5 and 6.6 of the Metal shading language specification.

Closes #551


github-actions bot commented Mar 4, 2025

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic main) to apply these changes.

Suggested changes:
diff --git a/test/device/intrinsics/math.jl b/test/device/intrinsics/math.jl
index de7ab2f4..34095e3b 100644
--- a/test/device/intrinsics/math.jl
+++ b/test/device/intrinsics/math.jl
@@ -312,32 +312,32 @@ end
         @test occursin(Regex("@air\\.sign\\.f$(8*sizeof(T))"), ir)
     end
 
-    # Borrowed from the Julia "Irrationals compared with Rationals and Floats" testset
-    @testset "Comparisons with $irr" for irr in (π, ℯ)
-        function convert_test_32(res)
-            res[1] = Float32(irr,RoundDown) < irr
-            res[2] = Float32(irr,RoundUp) > irr
-            res[3] = !(Float32(irr,RoundDown) > irr)
-            res[4] = !(Float32(irr,RoundUp) < irr)
-            return nothing
+        # Borrowed from the Julia "Irrationals compared with Rationals and Floats" testset
+        @testset "Comparisons with $irr" for irr in (π, ℯ)
+            function convert_test_32(res)
+                res[1] = Float32(irr, RoundDown) < irr
+                res[2] = Float32(irr, RoundUp) > irr
+                res[3] = !(Float32(irr, RoundDown) > irr)
+                res[4] = !(Float32(irr, RoundUp) < irr)
+                return nothing
+            end
+
+            res_32 = MtlArray(zeros(Bool, 4))
+            Metal.@sync @metal convert_test_32(res_32)
+            @test all(Array(res_32))
+
+            function convert_test_16(res)
+                res[1] = Float16(irr, RoundDown) < irr
+                res[2] = Float16(irr, RoundUp) > irr
+                res[3] = !(Float16(irr, RoundDown) > irr)
+                res[4] = !(Float16(irr, RoundUp) < irr)
+                return nothing
+            end
+
+            res_16 = MtlArray(zeros(Bool, 4))
+            Metal.@sync @metal convert_test_16(res_16)
+            @test all(Array(res_16))
         end
-
-        res_32 = MtlArray(zeros(Bool,4))
-        Metal.@sync @metal convert_test_32(res_32)
-        @test all(Array(res_32))
-
-        function convert_test_16(res)
-            res[1] = Float16(irr,RoundDown) < irr
-            res[2] = Float16(irr,RoundUp) > irr
-            res[3] = !(Float16(irr,RoundDown) > irr)
-            res[4] = !(Float16(irr,RoundUp) < irr)
-            return nothing
-        end
-
-        res_16 = MtlArray(zeros(Bool,4))
-        Metal.@sync @metal convert_test_16(res_16)
-        @test all(Array(res_16))
-    end
 end
 end
 

@christiangnrd
Member Author

The Metal constants seem to have the same values as RoundNearest. Should I instead implement the RoundUp/RoundDown behaviour that the CPU comparison uses (irrationals.jl in Julia Base), or should I leave it as the constants, matching the default Metal behaviour?

julia> Float32(π, RoundUp)
3.1415927f0

julia> Float32(π, RoundDown)
3.1415925f0

julia> Float16(π, RoundDown)
Float16(3.14)

julia> Float16(π, RoundUp)
Float16(3.143)

julia> Float16(π, RoundNearest)
Float16(3.14)

julia> Float32(π, RoundNearest)
3.1415927f0
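The comparison tests in this PR rely on a bracketing property: the RoundDown value lies strictly below the true irrational and the RoundUp value strictly above it. As a cross-check outside Julia (a minimal sketch, not part of the PR), the same bit patterns can be inspected with Python's struct module:

```python
import math
import struct

def f32(bits):
    """Reinterpret a 32-bit pattern as an IEEE-754 single."""
    return struct.unpack('<f', struct.pack('<I', bits))[0]

def f16(bits):
    """Reinterpret a 16-bit pattern as an IEEE-754 half."""
    return struct.unpack('<e', struct.pack('<H', bits))[0]

# Float32: 0x40490fdb (round-to-nearest) lands above pi; 0x40490fda is one ULP below.
assert f32(0x40490fda) < math.pi < f32(0x40490fdb)

# Float16: 0x4248 (round-to-nearest) lands below pi; 0x4249 is one ULP above.
assert f16(0x4248) < math.pi < f16(0x4249)

print(f32(0x40490fdb), f32(0x40490fda))  # 3.1415927410125732 3.1415925025939941
print(f16(0x4248), f16(0x4249))          # 3.140625 3.142578125
```

This also shows why round-to-nearest coincides with RoundUp for Float32(π) but with RoundDown for Float16(π): whichever neighbour is closer to π becomes the nearest value.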

github-actions bot left a comment

Metal Benchmarks

Benchmark suite Current: df7aae2 Previous: 4903c64 Ratio
private array/construct 26923.666666666668 ns 26097.25 ns 1.03
private array/broadcast 262562.5 ns 464250 ns 0.57
private array/random/randn/Float32 499666 ns 802521.5 ns 0.62
private array/random/randn!/Float32 486083 ns 634417 ns 0.77
private array/random/rand!/Int64 455749.5 ns 555166.5 ns 0.82
private array/random/rand!/Float32 358333 ns 593917 ns 0.60
private array/random/rand/Int64 694875 ns 786666.5 ns 0.88
private array/random/rand/Float32 377292 ns 607291.5 ns 0.62
private array/copyto!/gpu_to_gpu 227208 ns 651959 ns 0.35
private array/copyto!/cpu_to_gpu 253417 ns 816583 ns 0.31
private array/copyto!/gpu_to_cpu 252500 ns 695209 ns 0.36
private array/accumulate/1d 7986834 ns 1337584 ns 5.97
private array/accumulate/2d 961916 ns 1415584 ns 0.68
private array/iteration/findall/int 7972541.5 ns 2090292 ns 3.81
private array/iteration/findall/bool 7985875 ns 1820792 ns 4.39
private array/iteration/findfirst/int 1161312 ns 1682334 ns 0.69
private array/iteration/findfirst/bool 1150166 ns 1668062.5 ns 0.69
private array/iteration/scalar 1524271 ns 3837708 ns 0.40
private array/iteration/logical 8099917 ns 3204958.5 ns 2.53
private array/iteration/findmin/1d 1178667 ns 1767791.5 ns 0.67
private array/iteration/findmin/2d 890083 ns 1357208 ns 0.66
private array/reductions/reduce/1d 462459 ns 1034666.5 ns 0.45
private array/reductions/reduce/2d 470292 ns 666375 ns 0.71
private array/reductions/mapreduce/1d 480792 ns 1037104 ns 0.46
private array/reductions/mapreduce/2d 462916.5 ns 668667 ns 0.69
private array/permutedims/4d 1440458 ns 2542895.5 ns 0.57
private array/permutedims/2d 748666 ns 1024354.5 ns 0.73
private array/permutedims/3d 1124604 ns 1585750 ns 0.71
private array/copy 348041.5 ns 618375 ns 0.56
latency/precompile 9080074000 ns 9065026583 ns 1.00
latency/ttfp 3633120458 ns 3618215084 ns 1.00
latency/import 1243107375 ns 1245634417 ns 1.00
integration/metaldevrt 531458 ns 715042 ns 0.74
integration/byval/slices=1 1587937 ns 1564979.5 ns 1.01
integration/byval/slices=3 10447709 ns 10783271 ns 0.97
integration/byval/reference 1483209 ns 1534875 ns 0.97
integration/byval/slices=2 2461354.5 ns 2619125 ns 0.94
kernel/indexing 240458 ns 468687.5 ns 0.51
kernel/indexing_checked 237895.5 ns 468687 ns 0.51
kernel/launch 50916.666666666664 ns 9437.666666666666 ns 5.40
metal/synchronization/stream 14250 ns 14125 ns 1.01
metal/synchronization/context 14750 ns 14708 ns 1.00
shared array/construct 26416.666666666668 ns 24409.75 ns 1.08
shared array/broadcast 253916.5 ns 460208 ns 0.55
shared array/random/randn/Float32 503146 ns 880875 ns 0.57
shared array/random/randn!/Float32 419083.5 ns 636250 ns 0.66
shared array/random/rand!/Int64 430020.5 ns 551291 ns 0.78
shared array/random/rand!/Float32 411750 ns 594125 ns 0.69
shared array/random/rand/Int64 715750 ns 789250 ns 0.91
shared array/random/rand/Float32 342500 ns 634542 ns 0.54
shared array/copyto!/gpu_to_gpu 85542 ns 83625 ns 1.02
shared array/copyto!/cpu_to_gpu 82000 ns 83041 ns 0.99
shared array/copyto!/gpu_to_cpu 84375 ns 82250 ns 1.03
shared array/accumulate/1d 7989271 ns 1340667 ns 5.96
shared array/accumulate/2d 961625 ns 1394125 ns 0.69
shared array/iteration/findall/int 7976667 ns 1845750 ns 4.32
shared array/iteration/findall/bool 7989229.5 ns 1576917 ns 5.07
shared array/iteration/findfirst/int 940166.5 ns 1392979.5 ns 0.67
shared array/iteration/findfirst/bool 925708 ns 1375687.5 ns 0.67
shared array/iteration/scalar 153542 ns 153000 ns 1.00
shared array/iteration/logical 8054708 ns 2990542 ns 2.69
shared array/iteration/findmin/1d 976625 ns 1483333.5 ns 0.66
shared array/iteration/findmin/2d 895709 ns 1364208.5 ns 0.66
shared array/reductions/reduce/1d 377583 ns 730667 ns 0.52
shared array/reductions/reduce/2d 474500 ns 670583 ns 0.71
shared array/reductions/mapreduce/1d 371250 ns 734062.5 ns 0.51
shared array/reductions/mapreduce/2d 478292 ns 670875 ns 0.71
shared array/permutedims/4d 1444000 ns 2547458.5 ns 0.57
shared array/permutedims/2d 747125 ns 1023687 ns 0.73
shared array/permutedims/3d 1119000 ns 1588812.5 ns 0.70
shared array/copy 241729.5 ns 238750 ns 1.01

This comment was automatically generated by workflow using github-action-benchmark.

@maleadt
Member

maleadt commented Mar 11, 2025

> The Metal constants seem to have the same values as RoundNearest

What do you mean by "the Metal constants"? As is coincidentally being discussed in JuliaGPU/CUDA.jl#2644 (comment), I guess it's better to be consistent with other Julia code than with Metal C.

@christiangnrd
Member Author

The latest push makes the Metal behaviour the same as the CPU behaviour, at least for comparisons.

Comment on lines 11 to 22
### Constants
# π
@device_override Core.Float32(::typeof(π), ::RoundingMode) = reinterpret(Float32, 0x40490fdb) # 3.1415927f0 reinterpret(UInt32,Float32(reinterpret(Float64,0x400921FB60000000)))
@device_override Core.Float32(::typeof(π), ::RoundingMode{:Down}) = reinterpret(Float32, 0x40490fda) # 3.1415925f0 prevfloat(reinterpret(UInt32,Float32(reinterpret(Float64,0x400921FB60000000))))
@device_override Core.Float16(::typeof(π), ::RoundingMode{:Up}) = reinterpret(Float16, 0x4249) # Float16(3.143)
@device_override Core.Float16(::typeof(π), ::RoundingMode) = reinterpret(Float16, 0x4248) # Float16(3.14)

# ℯ
@device_override Core.Float32(::typeof(ℯ), ::RoundingMode{:Up}) = reinterpret(Float32, 0x402df855) # 2.718282f0 nextfloat(reinterpret(UInt32,Float32(reinterpret(Float64,0x4005BF0A80000000))))
@device_override Core.Float32(::typeof(ℯ), ::RoundingMode) = reinterpret(Float32, 0x402df854) # 2.7182817f0 reinterpret(UInt32,Float32(reinterpret(Float64,0x4005BF0A80000000)))
@device_override Core.Float16(::typeof(ℯ), ::RoundingMode) = reinterpret(Float16, 0x4170) # Float16(2.719)
@device_override Core.Float16(::typeof(ℯ), ::RoundingMode{:Down}) = reinterpret(Float16, 0x416f) # Float16(2.717)
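As a sanity check on the hard-coded hex values (an illustrative sketch in Python, not part of the PR), each round-to-nearest pattern should equal the IEEE-754 nearest float of the double-precision constant, with the opposite-direction value one ULP away:

```python
import math
import struct

def f32_bits(x):
    """Bit pattern of x rounded to the nearest float32."""
    return struct.unpack('<I', struct.pack('<f', x))[0]

def f16_bits(x):
    """Bit pattern of x rounded to the nearest float16."""
    return struct.unpack('<H', struct.pack('<e', x))[0]

# pi: nearest float32 is the RoundNearest/RoundUp value; RoundDown is one ULP below.
assert f32_bits(math.pi) == 0x40490fdb
# pi: nearest float16 is the RoundNearest/RoundDown value; RoundUp is one ULP above.
assert f16_bits(math.pi) == 0x4248

# e: nearest float32 is the RoundNearest/RoundDown value; RoundUp is one ULP above.
assert f32_bits(math.e) == 0x402df854
# e: nearest float16 is the RoundNearest/RoundUp value; RoundDown is one ULP below.
assert f16_bits(math.e) == 0x4170
```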
Member

Is it possible to generate those definitions with some metaprogramming, computing the constants on the fly, instead of hard-coding them?

Member Author

The best I could come up with (this includes a definition for the CPU that wouldn't make it into the PR):

macro _const_convert(irr, T, r)
    :($T($irr, $r))
end
for T in (:Float32, :Float16), irr in (:π, :ℯ), r in (:RoundUp, :RoundDown)
    @eval begin
        @device_override $T(::typeof($irr), ::typeof($r)) = @_const_convert($irr, $T, $r)
    end
end

And while maybe not the best approach, the @code_llvm for the CPU is:

; Function Signature: newFloat32(Base.Irrational{:π}, Base.Rounding.RoundingMode{:Up})
;  @ REPL[9]:4 within `newFloat32`
define float @julia_newFloat32_6871() #0 {
top:
  ret float 0x400921FB60000000
}

But when I try to run it, I get a GPUCompiler error:

julia> @device_code_llvm @metal convert_test_32(res_32)
; GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}(MethodInstance for convert_test_32(::MtlDeviceVector{Bool, 1}), CompilerConfig for GPUCompiler.MetalCompilerTarget, 0x0000000000006877)
ERROR: old function still has uses (via a constant expr)
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] add_global_address_spaces!(job::GPUCompiler.CompilerJob, mod::LLVM.Module, entry::LLVM.Function)
    @ GPUCompiler ~/.julia/dev/GPUCompiler/src/metal.jl:414
  [3] finish_ir!(job::GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget}, mod::LLVM.Module, entry::LLVM.Function)
    @ GPUCompiler ~/.julia/dev/GPUCompiler/src/metal.jl:166
  [4] finish_ir!(job::GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, mod::LLVM.Module, entry::LLVM.Function)
    @ Metal ~/.julia/dev/Metal/src/compiler/compilation.jl:14
  [5] macro expansion
    @ ~/.julia/dev/GPUCompiler/src/driver.jl:284 [inlined]
  [6] emit_llvm(job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/dev/GPUCompiler/src/utils.jl:110
  [7] emit_llvm(job::GPUCompiler.CompilerJob)
    @ GPUCompiler ~/.julia/dev/GPUCompiler/src/utils.jl:108
  [8] compile_unhooked(output::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/dev/GPUCompiler/src/driver.jl:95
  [9] compile_unhooked
    @ ~/.julia/dev/GPUCompiler/src/driver.jl:80 [inlined]
 [10] compile(target::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/dev/GPUCompiler/src/driver.jl:67
 [11] compile
    @ ~/.julia/dev/GPUCompiler/src/driver.jl:55 [inlined]
 [12] (::GPUCompiler.var"#235#236"{Bool, Symbol, Bool, GPUCompiler.CompilerJob{…}, GPUCompiler.CompilerConfig{…}})(ctx::Context)
    @ GPUCompiler ~/.julia/dev/GPUCompiler/src/reflection.jl:191
 [13] JuliaContext(f::GPUCompiler.var"#235#236"{Bool, Symbol, Bool, GPUCompiler.CompilerJob{…}, GPUCompiler.CompilerConfig{…}}; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/dev/GPUCompiler/src/driver.jl:34
 [14] JuliaContext(f::Function)
    @ GPUCompiler ~/.julia/dev/GPUCompiler/src/driver.jl:25
 [15] code_llvm(io::Base.TTY, job::GPUCompiler.CompilerJob; optimize::Bool, raw::Bool, debuginfo::Symbol, dump_module::Bool, kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/dev/GPUCompiler/src/reflection.jl:190
 [16] code_llvm
    @ ~/.julia/dev/GPUCompiler/src/reflection.jl:186 [inlined]
 [17] (::GPUCompiler.var"#hook#246"{GPUCompiler.var"#hook#245#247"})(job::GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}; io::Base.TTY, kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/dev/GPUCompiler/src/reflection.jl:337
 [18] (::GPUCompiler.var"#hook#246"{GPUCompiler.var"#hook#245#247"})(job::GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams})
    @ GPUCompiler ~/.julia/dev/GPUCompiler/src/reflection.jl:335
 [19] var"#3#outer_hook"(job::GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams})
    @ Main ~/.julia/dev/GPUCompiler/src/reflection.jl:246
 [20] compile(target::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/dev/GPUCompiler/src/driver.jl:64
 [21] compile
    @ ~/.julia/dev/GPUCompiler/src/driver.jl:55 [inlined]
 [22] (::Metal.var"#155#163"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}})(ctx::Context)
    @ Metal ~/.julia/dev/Metal/src/compiler/compilation.jl:108
 [23] JuliaContext(f::Metal.var"#155#163"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}}; kwargs::@Kwargs{})
    @ GPUCompiler ~/.julia/dev/GPUCompiler/src/driver.jl:34
 [24] JuliaContext(f::Function)
    @ GPUCompiler ~/.julia/dev/GPUCompiler/src/driver.jl:25
 [25] macro expansion
    @ ~/.julia/dev/Metal/src/compiler/compilation.jl:107 [inlined]
 [26] macro expansion
    @ ~/.julia/packages/ObjectiveC/TgrW6/src/os.jl:264 [inlined]
 [27] compile(job::GPUCompiler.CompilerJob)
    @ Metal ~/.julia/dev/Metal/src/compiler/compilation.jl:105
 [28] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(Metal.compile), linker::typeof(Metal.link))
    @ GPUCompiler ~/.julia/dev/GPUCompiler/src/execution.jl:245
 [29] cached_compilation(cache::Dict{Any, Any}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/dev/GPUCompiler/src/execution.jl:159
 [30] macro expansion
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:189 [inlined]
 [31] macro expansion
    @ ./lock.jl:273 [inlined]
 [32] mtlfunction(f::typeof(convert_test_32), tt::Type{Tuple{MtlDeviceVector{Bool, 1}}}; name::Nothing, kwargs::@Kwargs{})
    @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:184
 [33] mtlfunction(f::typeof(convert_test_32), tt::Type{Tuple{MtlDeviceVector{Bool, 1}}})
    @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:182
 [34] macro expansion
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:85 [inlined]
 [35] top-level scope
    @ ~/.julia/dev/GPUCompiler/src/reflection.jl:257
 [36] top-level scope
    @ ~/.julia/dev/Metal/src/initialization.jl:79
Some type information was truncated. Use `show(err)` to see complete types.

While writing this I also tried:

for T in (:Float32, :Float16), irr in (:π, :ℯ), r in (:RoundUp, :RoundDown)
    @eval begin
        @device_override $T(::typeof($irr), ::typeof($r)) = Base.Rounding._convert_rounding($T, $irr, $r)
    end
end

But that also gives the "old function still has uses (via a constant expr)" error.

christiangnrd (Member Author) commented Mar 12, 2025

This errors in some really weird ways when running tests in CI and locally, so here's a copy-paste code snippet:

using Metal

irr = π  # assumed here; `irr` was the testset loop variable ranging over (π, ℯ)
function convert_test_32(res)
    res[1] = Float32(irr, RoundDown) < irr
    res[2] = Float32(irr, RoundUp) > irr
    res[3] = !(Float32(irr, RoundDown) > irr)
    res[4] = !(Float32(irr, RoundUp) < irr)
    return nothing
end
res_32 = MtlArray(zeros(Bool, 4))
Metal.@sync @metal convert_test_32(res_32)

Development

Successfully merging this pull request may close these issues.

Can't compare Float32 with pi on Metal
2 participants