Document `@simd` and remove invalid uses (#27495)
@@ -85,6 +85,36 @@ function compile(x)

```julia
    end
end

"""
    @simd

Annotate a `for` loop to allow the compiler to take extra liberties to allow loop re-ordering.

!!! warning
    This feature is experimental and could change or disappear in future versions of Julia.
    Incorrect use of the `@simd` macro may cause unexpected results.

The object iterated over in a `@simd for` loop should be a one-dimensional range or a
`CartesianIndices` iterator. By using `@simd`, you are asserting several properties of
the loop:

* It is safe to execute iterations in arbitrary or overlapping order, with special
  consideration for reduction variables.
* Floating-point operations on reduction variables can be reordered, possibly causing
  different results than without `@simd`.
* No iteration ever waits on a previous iteration to make forward progress.

In many cases, Julia is able to automatically vectorize inner `for` loops without the
use of `@simd`. Using `@simd` gives the compiler a little extra leeway to make
vectorization possible in more situations. In either case, your inner loop should have
the following properties to allow vectorization:

* The loop must be an innermost loop.
* The loop body must be straight-line code. Therefore, [`@inbounds`](@ref) is
  currently needed for all array accesses. The compiler can sometimes turn
  short `&&`, `||`, and `?:` expressions into straight-line code if it is safe
  to evaluate all operands unconditionally. Consider using the [`ifelse`](@ref)
  function instead of `?:` in the loop if it is safe to do so.
* Accesses must have a stride pattern and cannot be "gathers" (random-index
  reads) or "scatters" (random-index writes).
* The stride should be unit stride.
"""
macro simd(forloop)
    esc(compile(forloop))
end
```

**Comment** (inline, on the `ifelse` suggestion):

In general, this would be bad advice, since it's harder to generate fast code for …
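To make the docstring's rules concrete, here is a minimal sketch (my own, not part of the diff) of the canonical `@simd` pattern: a floating-point reduction over a one-dimensional range, with `@inbounds` keeping the loop body straight-line. The function name `simd_sum` is illustrative.

```julia
# Minimal sketch (not part of the diff): a @simd reduction loop.
# `s` is the reduction variable; @simd permits its updates to be
# reordered and reassociated, which may change the result in the
# last bits for general floating-point data.
function simd_sum(x::AbstractVector{Float64})
    s = 0.0
    @inbounds @simd for i in eachindex(x)
        s += x[i]
    end
    return s
end

simd_sum(collect(1.0:100.0))  # 5050.0 (exact here: all summands are small integers)
```

Summing 1 through 100 is exact in `Float64` regardless of the order in which the additions are performed, so this particular call is insensitive to the reassociation that `@simd` allows.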
@@ -219,6 +219,7 @@ Base.gensym

```
Base.@gensym
Base.@goto
Base.@label
Base.@simd
Base.@polly
```
**Comment:**
I am not sure that this is meaningful. The question is: when is a function not valid to be executed as SIMD? When it has side effects? `@simd` is advice to LLVM that gives it permission to perform some limited transformations and encourages it to be more lenient, but LLVM still has to prove that the SIMD transformation is valid and beneficial.

**Comment:**
Yes, this is a poor name. I don't really want to be building a trait system here; I wanted to just use a hard-coded `Union{}`, but due to bootstrapping issues not all of the functions I wanted to whitelist were available. Of course, we cannot guarantee that these functions won't have an invalid method on them.

Really, the problem is that `@simd` is non-local. It affects any function that gets inlined, even passed user functions. We have no idea whether a particular method is going to have observable side effects that break the SIMD guarantees. This is an attempt at a whitelist to recoup some of the performance in common cases, but perhaps we shouldn't even try to do that.

My understanding is that `@simd` is promising that the transformation is valid, allowing LLVM to do things that it cannot prove.
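The non-locality concern above can be sketched with a hypothetical example (my construction; `apply_sum` and `f` are illustrative names, not from the PR): the author of the loop asserts the `@simd` properties, but the callee is arbitrary user code whose side effects, once inlined, are invisible at the annotation site.

```julia
# Hypothetical sketch of the non-locality concern: `f` is a
# user-supplied function. The @simd annotation here asserts loop
# properties that the author of this function cannot actually
# verify for every possible `f` that gets inlined.
function apply_sum(f, x)
    s = 0.0
    @inbounds @simd for i in eachindex(x)
        s += f(x[i])
    end
    return s
end

apply_sum(abs2, [1.0, 2.0, 3.0])  # 14.0
```

With a pure callee such as `abs2` this is safe; a stateful `f` (for example, one that logs or mutates a counter) could observe the reordered iteration order that `@simd` permits.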
**Comment:**
Essentially true, but the list above is largely redundant with things that LLVM should be able to prove (since it is approximately just a subset of the known pure functions) and, beyond that, it is incorrect in general: there is no guarantee, or expectation, that my implementation of any of those functions is actually pure.
**Comment:**
It defines that certain transforms are valid, but we implement that list. Currently it contains:
**Comment:**
Ah, great, that is extremely helpful. So if we disable the memory independence assertion (b68efd3), is it correct that one could safely annotate `@simd` over unknown generic functions? My read is yes: this simply adds extra handling for reduction variables, which, by definition, must be lexically visible to the `@simd` author. Memory dependencies, on the other hand, might not be lexically visible to the `@simd` author. Doing something like #19658 could perhaps allow us to recoup some of the lost performance in the future in a much safer and more precise way.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's my understanding too, although there is some non-locality in the definition of what constitutes "computation of the reduction", since we just examine the use/def graph after optimization. It is possible to make a closure which obscures the reduction variable lexically. For example:

This pattern appears, for example, in the `Stateful` iterator.
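The original code example in the comment above was not captured in this excerpt. The following is a hypothetical sketch (my own construction, with illustrative names) of what such a closure could look like: the accumulator is mutated through a captured variable, so no reduction variable is lexically visible in the loop body even though one exists in the use/def graph.

```julia
# Hypothetical sketch (the original snippet was not captured):
# the reduction variable `s` is hidden behind a closure, so the
# @simd loop body contains no lexically visible reduction.
function obscured_sum(x)
    s = 0.0
    acc! = v -> (s += v)   # closure capturing and mutating `s`
    @inbounds @simd for i in eachindex(x)
        acc!(x[i])
    end
    return s
end

obscured_sum([1.0, 2.0, 3.0])  # 6.0
```

The loop still computes a reduction, but only inspection of the optimized IR (after the closure is inlined) reveals it, which is the non-locality being discussed.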