This repository was archived by the owner on May 17, 2020. It is now read-only.
Recently I ran into a behavior that surprised me, demonstrated by the MWE below:
    using CuArrays, CUDAnative, GPUifyLoops

    function kernel(rho, T)
        P = rho[1] * T[1]
        # P - P should be exactly zero, so this branch should never be taken
        if abs(P - P) > 1e-16
            @cuprintf("diff = %.16e\n", P - P)
        end
        nothing
    end

    rho = CuArray([1e-1])
    T = CuArray([300.0])
    @launch CUDA() kernel(rho, T, threads=1, blocks=1)
with the output:

    diff = 1.6653345369377348e-15
Basically, if my understanding of the generated PTX is correct, P - P is calculated as fma(rho[1], T[1], -P), which is probably not the smartest move by the compiler. However, clang with LLVM 6.0.1 also does this for CUDA C, so I guess that's expected. The issue goes away if I disable contraction. In clang there's an option for that called -ffp-contract. Maybe adding a similar option in GPUifyLoops would be helpful for debugging?
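The effect can be reproduced on the CPU with Base Julia's fma, without any GPU packages. This is only a minimal sketch of the arithmetic, assuming the contraction described above; it is not the code the compiler actually emits. The variable names separate and fused are made up for illustration.

```julia
# CPU-only sketch of the contraction, using the same inputs as the MWE.
rho, T = 0.1, 300.0

# Separate multiply: the product is rounded to the nearest Float64.
P = rho * T

# Without contraction, the rounded product is subtracted from itself.
separate = rho * T - P

# With contraction, fma uses the exact (unrounded) product rho * T,
# subtracts the rounded P, and rounds only once at the end.
# What is left is the rounding error of the multiplication.
fused = fma(rho, T, -P)

println("separate = ", separate)  # 0.0
println("fused    = ", fused)     # about 1.665e-15, the diff reported above
```

The nonzero result is the rounding error of rho * T, which the fused operation preserves and the separate multiply discards.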
For convenience, the generated PTX can be found here:
Thanks for bringing this up. The goal in #55 was indeed to match Clang (we were hunting down a performance gap).
I agree that using contract unconditionally is probably not what we want in the long term. Julia in general tries to give the user localised control (compare @fastmath).
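As a rough illustration of what localised control can look like in plain CPU Julia, Base already lets a single call site opt in to contraction through muladd, while ordinary arithmetic is left alone. The helper names residual_strict and residual_relaxed are hypothetical, and this sketch says nothing about how GPUifyLoops would expose such an option.

```julia
# Hypothetical helpers; only `muladd` is a Base function.

# Plain arithmetic: in default CPU Julia the product is rounded
# before the subtraction, so no contraction takes place here.
residual_strict(a, b, p) = a * b - p

# Explicit opt-in: `muladd` permits (but does not require) the compiler
# to fuse the multiply and add into a single fma.
residual_relaxed(a, b, p) = muladd(a, b, -p)

a, b = 0.1, 300.0
p = a * b

println(residual_strict(a, b, p))   # 0.0
println(residual_relaxed(a, b, p))  # 0.0, or the rounding error of a * b if fused
```

The point of the design is that the relaxed semantics apply only where the author asked for them, which is the kind of scoping @fastmath also provides.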
Generated PTX: https://gist.github.com/mwarusz/5ab4ac99b02e77b54178cd95c9820d7b