Pi and e to Float32 and Float16 #559
base: main
Your PR requires formatting changes to meet the project's style guidelines. Click here to view the suggested changes.
diff --git a/test/device/intrinsics/math.jl b/test/device/intrinsics/math.jl
index de7ab2f4..34095e3b 100644
--- a/test/device/intrinsics/math.jl
+++ b/test/device/intrinsics/math.jl
@@ -312,32 +312,32 @@ end
@test occursin(Regex("@air\\.sign\\.f$(8*sizeof(T))"), ir)
end
- # Borrowed from the Julia "Irrationals compared with Rationals and Floats" testset
- @testset "Comparisons with $irr" for irr in (π, ℯ)
- function convert_test_32(res)
- res[1] = Float32(irr,RoundDown) < irr
- res[2] = Float32(irr,RoundUp) > irr
- res[3] = !(Float32(irr,RoundDown) > irr)
- res[4] = !(Float32(irr,RoundUp) < irr)
- return nothing
+ # Borrowed from the Julia "Irrationals compared with Rationals and Floats" testset
+ @testset "Comparisons with $irr" for irr in (π, ℯ)
+ function convert_test_32(res)
+ res[1] = Float32(irr, RoundDown) < irr
+ res[2] = Float32(irr, RoundUp) > irr
+ res[3] = !(Float32(irr, RoundDown) > irr)
+ res[4] = !(Float32(irr, RoundUp) < irr)
+ return nothing
+ end
+
+ res_32 = MtlArray(zeros(Bool, 4))
+ Metal.@sync @metal convert_test_32(res_32)
+ @test all(Array(res_32))
+
+ function convert_test_16(res)
+ res[1] = Float16(irr, RoundDown) < irr
+ res[2] = Float16(irr, RoundUp) > irr
+ res[3] = !(Float16(irr, RoundDown) > irr)
+ res[4] = !(Float16(irr, RoundUp) < irr)
+ return nothing
+ end
+
+ res_16 = MtlArray(zeros(Bool, 4))
+ Metal.@sync @metal convert_test_16(res_16)
+ @test all(Array(res_16))
end
-
- res_32 = MtlArray(zeros(Bool,4))
- Metal.@sync @metal convert_test_32(res_32)
- @test all(Array(res_32))
-
- function convert_test_16(res)
- res[1] = Float16(irr,RoundDown) < irr
- res[2] = Float16(irr,RoundUp) > irr
- res[3] = !(Float16(irr,RoundDown) > irr)
- res[4] = !(Float16(irr,RoundUp) < irr)
- return nothing
- end
-
- res_16 = MtlArray(zeros(Bool,4))
- Metal.@sync @metal convert_test_16(res_16)
- @test all(Array(res_16))
- end
end
end
The Metal constants seem to have the same values as
Metal Benchmarks
Benchmark suite | Current: df7aae2 | Previous: 4903c64 | Ratio |
---|---|---|---|
private array/construct | 26923.666666666668 ns | 26097.25 ns | 1.03 |
private array/broadcast | 262562.5 ns | 464250 ns | 0.57 |
private array/random/randn/Float32 | 499666 ns | 802521.5 ns | 0.62 |
private array/random/randn!/Float32 | 486083 ns | 634417 ns | 0.77 |
private array/random/rand!/Int64 | 455749.5 ns | 555166.5 ns | 0.82 |
private array/random/rand!/Float32 | 358333 ns | 593917 ns | 0.60 |
private array/random/rand/Int64 | 694875 ns | 786666.5 ns | 0.88 |
private array/random/rand/Float32 | 377292 ns | 607291.5 ns | 0.62 |
private array/copyto!/gpu_to_gpu | 227208 ns | 651959 ns | 0.35 |
private array/copyto!/cpu_to_gpu | 253417 ns | 816583 ns | 0.31 |
private array/copyto!/gpu_to_cpu | 252500 ns | 695209 ns | 0.36 |
private array/accumulate/1d | 7986834 ns | 1337584 ns | 5.97 |
private array/accumulate/2d | 961916 ns | 1415584 ns | 0.68 |
private array/iteration/findall/int | 7972541.5 ns | 2090292 ns | 3.81 |
private array/iteration/findall/bool | 7985875 ns | 1820792 ns | 4.39 |
private array/iteration/findfirst/int | 1161312 ns | 1682334 ns | 0.69 |
private array/iteration/findfirst/bool | 1150166 ns | 1668062.5 ns | 0.69 |
private array/iteration/scalar | 1524271 ns | 3837708 ns | 0.40 |
private array/iteration/logical | 8099917 ns | 3204958.5 ns | 2.53 |
private array/iteration/findmin/1d | 1178667 ns | 1767791.5 ns | 0.67 |
private array/iteration/findmin/2d | 890083 ns | 1357208 ns | 0.66 |
private array/reductions/reduce/1d | 462459 ns | 1034666.5 ns | 0.45 |
private array/reductions/reduce/2d | 470292 ns | 666375 ns | 0.71 |
private array/reductions/mapreduce/1d | 480792 ns | 1037104 ns | 0.46 |
private array/reductions/mapreduce/2d | 462916.5 ns | 668667 ns | 0.69 |
private array/permutedims/4d | 1440458 ns | 2542895.5 ns | 0.57 |
private array/permutedims/2d | 748666 ns | 1024354.5 ns | 0.73 |
private array/permutedims/3d | 1124604 ns | 1585750 ns | 0.71 |
private array/copy | 348041.5 ns | 618375 ns | 0.56 |
latency/precompile | 9080074000 ns | 9065026583 ns | 1.00 |
latency/ttfp | 3633120458 ns | 3618215084 ns | 1.00 |
latency/import | 1243107375 ns | 1245634417 ns | 1.00 |
integration/metaldevrt | 531458 ns | 715042 ns | 0.74 |
integration/byval/slices=1 | 1587937 ns | 1564979.5 ns | 1.01 |
integration/byval/slices=3 | 10447709 ns | 10783271 ns | 0.97 |
integration/byval/reference | 1483209 ns | 1534875 ns | 0.97 |
integration/byval/slices=2 | 2461354.5 ns | 2619125 ns | 0.94 |
kernel/indexing | 240458 ns | 468687.5 ns | 0.51 |
kernel/indexing_checked | 237895.5 ns | 468687 ns | 0.51 |
kernel/launch | 50916.666666666664 ns | 9437.666666666666 ns | 5.40 |
metal/synchronization/stream | 14250 ns | 14125 ns | 1.01 |
metal/synchronization/context | 14750 ns | 14708 ns | 1.00 |
shared array/construct | 26416.666666666668 ns | 24409.75 ns | 1.08 |
shared array/broadcast | 253916.5 ns | 460208 ns | 0.55 |
shared array/random/randn/Float32 | 503146 ns | 880875 ns | 0.57 |
shared array/random/randn!/Float32 | 419083.5 ns | 636250 ns | 0.66 |
shared array/random/rand!/Int64 | 430020.5 ns | 551291 ns | 0.78 |
shared array/random/rand!/Float32 | 411750 ns | 594125 ns | 0.69 |
shared array/random/rand/Int64 | 715750 ns | 789250 ns | 0.91 |
shared array/random/rand/Float32 | 342500 ns | 634542 ns | 0.54 |
shared array/copyto!/gpu_to_gpu | 85542 ns | 83625 ns | 1.02 |
shared array/copyto!/cpu_to_gpu | 82000 ns | 83041 ns | 0.99 |
shared array/copyto!/gpu_to_cpu | 84375 ns | 82250 ns | 1.03 |
shared array/accumulate/1d | 7989271 ns | 1340667 ns | 5.96 |
shared array/accumulate/2d | 961625 ns | 1394125 ns | 0.69 |
shared array/iteration/findall/int | 7976667 ns | 1845750 ns | 4.32 |
shared array/iteration/findall/bool | 7989229.5 ns | 1576917 ns | 5.07 |
shared array/iteration/findfirst/int | 940166.5 ns | 1392979.5 ns | 0.67 |
shared array/iteration/findfirst/bool | 925708 ns | 1375687.5 ns | 0.67 |
shared array/iteration/scalar | 153542 ns | 153000 ns | 1.00 |
shared array/iteration/logical | 8054708 ns | 2990542 ns | 2.69 |
shared array/iteration/findmin/1d | 976625 ns | 1483333.5 ns | 0.66 |
shared array/iteration/findmin/2d | 895709 ns | 1364208.5 ns | 0.66 |
shared array/reductions/reduce/1d | 377583 ns | 730667 ns | 0.52 |
shared array/reductions/reduce/2d | 474500 ns | 670583 ns | 0.71 |
shared array/reductions/mapreduce/1d | 371250 ns | 734062.5 ns | 0.51 |
shared array/reductions/mapreduce/2d | 478292 ns | 670875 ns | 0.71 |
shared array/permutedims/4d | 1444000 ns | 2547458.5 ns | 0.57 |
shared array/permutedims/2d | 747125 ns | 1023687 ns | 0.73 |
shared array/permutedims/3d | 1119000 ns | 1588812.5 ns | 0.70 |
shared array/copy | 241729.5 ns | 238750 ns | 1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
e73fd42 to 3bddc3d
3bddc3d to 198feed
What do you mean by the Metal constants? As is coincidentally being discussed in JuliaGPU/CUDA.jl#2644 (comment), I guess it's better to be consistent with other Julia code rather than with Metal C.
The latest push makes the Metal behaviour match the CPU behaviour, at least for comparisons.
src/device/intrinsics/math.jl
Outdated
### Constants
# π
@device_override Core.Float32(::typeof(π), ::RoundingMode) = reinterpret(Float32, 0x40490fdb) # 3.1415927f0 reinterpret(UInt32,Float32(reinterpret(Float64,0x400921FB60000000)))
@device_override Core.Float32(::typeof(π), ::RoundingMode{:Down}) = reinterpret(Float32, 0x40490fda) # 3.1415925f0 prevfloat(reinterpret(UInt32,Float32(reinterpret(Float64,0x400921FB60000000))))
@device_override Core.Float16(::typeof(π), ::RoundingMode{:Up}) = reinterpret(Float16, 0x4249) # Float16(3.143)
@device_override Core.Float16(::typeof(π), ::RoundingMode) = reinterpret(Float16, 0x4248) # Float16(3.14)

# ℯ
@device_override Core.Float32(::typeof(ℯ), ::RoundingMode{:Up}) = reinterpret(Float32, 0x402df855) # 2.718282f0 nextfloat(reinterpret(UInt32,Float32(reinterpret(Float64,0x4005BF0A80000000))))
@device_override Core.Float32(::typeof(ℯ), ::RoundingMode) = reinterpret(Float32, 0x402df854) # 2.7182817f0 reinterpret(UInt32,Float32(reinterpret(Float64,0x4005BF0A80000000)))
@device_override Core.Float16(::typeof(ℯ), ::RoundingMode) = reinterpret(Float16, 0x4170) # Float16(2.719)
@device_override Core.Float16(::typeof(ℯ), ::RoundingMode{:Down}) = reinterpret(Float16, 0x416f) # Float16(2.717)
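For reference, the hard-coded bit patterns can be sanity-checked on the host against what Base computes. This is only a minimal sketch, assuming plain Julia with no Metal loaded; the `expected` table and `results` are illustrative names, not part of the PR.

```julia
# Hypothetical host-side check; `expected` is an illustrative name, not PR code.
# Each entry: (target type, irrational, rounding mode, bit pattern from the snippet).
expected = (
    (Float32, π, RoundUp,   0x40490fdb),
    (Float32, π, RoundDown, 0x40490fda),
    (Float16, π, RoundUp,   0x4249),
    (Float16, π, RoundDown, 0x4248),
    (Float32, ℯ, RoundUp,   0x402df855),
    (Float32, ℯ, RoundDown, 0x402df854),
    (Float16, ℯ, RoundUp,   0x4170),
    (Float16, ℯ, RoundDown, 0x416f),
)

# Compare every bit pattern with the directed-rounding conversion done on the CPU.
results = map(expected) do (T, irr, r, bits)
    reinterpret(T, bits) === T(irr, r)
end

println(all(results) ? "all constants match" : "mismatch found")
```

A check like this only confirms the values; it says nothing about how the overrides dispatch on the device.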
Is it possible to generate those definitions with some metaprogramming, computing the constants on the fly, instead of hard-coding them?
The best I could come up with (includes a definition for the CPU that wouldn't make it into the PR):
macro _const_convert(irr,T,r)
    :($T($irr, $r))
end
for T in (:Float32, :Float16), irr in (:π, :ℯ), r in (:RoundUp, :RoundDown)
    @eval begin
        @device_override $T(::typeof($irr), ::typeof($r)) = @_const_convert($irr, $T, $r)
    end
end
And while maybe not the best approach, the @code_llvm for the CPU is:
; Function Signature: newFloat32(Base.Irrational{:π}, Base.Rounding.RoundingMode{:Up})
; @ REPL[9]:4 within `newFloat32`
define float @julia_newFloat32_6871() #0 {
top:
ret float 0x400921FB60000000
}
But when I try to run it, I get a GPUCompiler error:
julia> @device_code_llvm @metal convert_test_32(res_32)
; GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}(MethodInstance for convert_test_32(::MtlDeviceVector{Bool, 1}), CompilerConfig for GPUCompiler.MetalCompilerTarget, 0x0000000000006877)
ERROR: old function still has uses (via a constant expr)
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] add_global_address_spaces!(job::GPUCompiler.CompilerJob, mod::LLVM.Module, entry::LLVM.Function)
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/metal.jl:414
[3] finish_ir!(job::GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget}, mod::LLVM.Module, entry::LLVM.Function)
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/metal.jl:166
[4] finish_ir!(job::GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, mod::LLVM.Module, entry::LLVM.Function)
@ Metal ~/.julia/dev/Metal/src/compiler/compilation.jl:14
[5] macro expansion
@ ~/.julia/dev/GPUCompiler/src/driver.jl:284 [inlined]
[6] emit_llvm(job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/utils.jl:110
[7] emit_llvm(job::GPUCompiler.CompilerJob)
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/utils.jl:108
[8] compile_unhooked(output::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/driver.jl:95
[9] compile_unhooked
@ ~/.julia/dev/GPUCompiler/src/driver.jl:80 [inlined]
[10] compile(target::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/driver.jl:67
[11] compile
@ ~/.julia/dev/GPUCompiler/src/driver.jl:55 [inlined]
[12] (::GPUCompiler.var"#235#236"{Bool, Symbol, Bool, GPUCompiler.CompilerJob{…}, GPUCompiler.CompilerConfig{…}})(ctx::Context)
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/reflection.jl:191
[13] JuliaContext(f::GPUCompiler.var"#235#236"{Bool, Symbol, Bool, GPUCompiler.CompilerJob{…}, GPUCompiler.CompilerConfig{…}}; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/driver.jl:34
[14] JuliaContext(f::Function)
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/driver.jl:25
[15] code_llvm(io::Base.TTY, job::GPUCompiler.CompilerJob; optimize::Bool, raw::Bool, debuginfo::Symbol, dump_module::Bool, kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/reflection.jl:190
[16] code_llvm
@ ~/.julia/dev/GPUCompiler/src/reflection.jl:186 [inlined]
[17] (::GPUCompiler.var"#hook#246"{GPUCompiler.var"#hook#245#247"})(job::GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}; io::Base.TTY, kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/reflection.jl:337
[18] (::GPUCompiler.var"#hook#246"{GPUCompiler.var"#hook#245#247"})(job::GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams})
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/reflection.jl:335
[19] var"#3#outer_hook"(job::GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams})
@ Main ~/.julia/dev/GPUCompiler/src/reflection.jl:246
[20] compile(target::Symbol, job::GPUCompiler.CompilerJob; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/driver.jl:64
[21] compile
@ ~/.julia/dev/GPUCompiler/src/driver.jl:55 [inlined]
[22] (::Metal.var"#155#163"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}})(ctx::Context)
@ Metal ~/.julia/dev/Metal/src/compiler/compilation.jl:108
[23] JuliaContext(f::Metal.var"#155#163"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}}; kwargs::@Kwargs{})
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/driver.jl:34
[24] JuliaContext(f::Function)
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/driver.jl:25
[25] macro expansion
@ ~/.julia/dev/Metal/src/compiler/compilation.jl:107 [inlined]
[26] macro expansion
@ ~/.julia/packages/ObjectiveC/TgrW6/src/os.jl:264 [inlined]
[27] compile(job::GPUCompiler.CompilerJob)
@ Metal ~/.julia/dev/Metal/src/compiler/compilation.jl:105
[28] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(Metal.compile), linker::typeof(Metal.link))
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/execution.jl:245
[29] cached_compilation(cache::Dict{Any, Any}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, compiler::Function, linker::Function)
@ GPUCompiler ~/.julia/dev/GPUCompiler/src/execution.jl:159
[30] macro expansion
@ ~/.julia/dev/Metal/src/compiler/execution.jl:189 [inlined]
[31] macro expansion
@ ./lock.jl:273 [inlined]
[32] mtlfunction(f::typeof(convert_test_32), tt::Type{Tuple{MtlDeviceVector{Bool, 1}}}; name::Nothing, kwargs::@Kwargs{})
@ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:184
[33] mtlfunction(f::typeof(convert_test_32), tt::Type{Tuple{MtlDeviceVector{Bool, 1}}})
@ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:182
[34] macro expansion
@ ~/.julia/dev/Metal/src/compiler/execution.jl:85 [inlined]
[35] top-level scope
@ ~/.julia/dev/GPUCompiler/src/reflection.jl:257
[36] top-level scope
@ ~/.julia/dev/Metal/src/initialization.jl:79
Some type information was truncated. Use `show(err)` to see complete types.
While writing this I also tried:
for T in (:Float32, :Float16), irr in (:π, :ℯ), r in (:RoundUp, :RoundDown)
    @eval begin
        @device_override $T(::typeof($irr), ::typeof($r)) = Base.Rounding._convert_rounding($T, $irr, $r)
    end
end
But that also gives the "old function still has uses (via a constant expr)" error.
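One variation that might be worth trying, untested on Metal: compute each value once on the host while the definitions are being generated, and interpolate the resulting literal into the method body, so the method simply returns a constant. The sketch below is an assumption-laden illustration. It defines a plain function, `host_const` (a hypothetical name), instead of `@device_override` constructor methods so that it runs without Metal, and whether the same pattern avoids the GPUCompiler error would still need checking.

```julia
# Hypothetical sketch: `host_const` stands in for the overridden constructors.
for T in (Float32, Float16), irr in (π, ℯ), r in (RoundUp, RoundDown)
    # Evaluated on the host at definition time, using Base's own conversion.
    val = T(irr, r)
    # `$val` is spliced in as a literal, so the method body is just a constant.
    @eval host_const(::Type{$T}, ::typeof($irr), ::typeof($r)) = $val
end

# Usage: the directed roundings bracket the irrational, as in the test set.
lo = host_const(Float32, π, RoundDown)
hi = host_const(Float32, π, RoundUp)
println(lo < π < hi)
```

Compared with the hard-coded table, this keeps the values tied to whatever Base computes, at the cost of running the BigFloat conversion once on the host when the methods are defined.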
This errors in some really weird ways when running tests in CI and locally, so here's a copy-paste code snippet:
using Metal; begin
    function convert_test_32(res)
        res[1] = Float32(irr,RoundDown) < irr
        res[2] = Float32(irr,RoundUp) > irr
        res[3] = !(Float32(irr,RoundDown) > irr)
        res[4] = !(Float32(irr,RoundUp) < irr)
        return nothing
    end
    res_32 = MtlArray(zeros(Bool,4))
    Metal.@sync @metal convert_test_32(res_32)
end
Bypasses the conversion to BigFloat when converting pi and e to Float32 and Float16 on gpu. Values are taken from the constants in Tables 6.5 and 6.6 of the Metal shading language specification.

Close #551