Have mapreduce dispatch on Returns? #42333

Open
olof3 opened this issue Sep 21, 2021 · 7 comments
Labels: fold (sum, maximum, reduce, foldl, etc.) · performance (Must go faster)

Comments

@olof3
Contributor

olof3 commented Sep 21, 2021

The type Returns was added in #39794. For example, if sum(f, x) is used for some user-specified regularization penalty on x, then f = Returns(0) could be used to conveniently "turn off" that regularization.
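For instance, a minimal sketch of that use case (the penalty function, its weight, and the objective helper are made up here for illustration):

```julia
penalty(x) = 0.1 * x^2                       # hypothetical per-element penalty

objective(f, x) = sum(abs2, x) + sum(f, x)   # loss plus an optional regularization term

x = randn(1000)
objective(penalty, x)      # with the penalty
objective(Returns(0), x)   # penalty conveniently "turned off"
```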

It seems neat to have mapreduce dispatch on Returns, e.g.,

```julia
Base.mapreduce(f::Returns, ::typeof(Base.add_sum), a::Union{Base.AbstractBroadcasted, AbstractArray}) = f.value * length(a)
```

```julia
julia> @btime sum(Returns(0), 1:1000_000_000)
  0.021 ns (0 allocations: 0 bytes)
0
```

Any thoughts?

(Note that mapreduce is surprisingly fast as it is; for the case above it took 5 ms (?!) without the specialized dispatch. Btw, I'm a bit curious how it can be that fast.)

@KristofferC
Member

I don't think this will give you the same result for many inputs. It's also getting a bit too close for comfort, imho, to turning Julia into a CAS. As soon as you introduce something "algebraic" like Returns, there is a huge space for various specializations and algebraic simplifications. It has to be carefully weighed whether all the extra methods, extra compilation cost, etc. are actually valuable in practice for normal code.
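To make the "same result" concern concrete (my own illustration, not from the thread): with floating-point values, repeated addition and a single multiplication need not round the same way, so the proposed f.value * length(a) shortcut can change the answer.

```julia
# Ten sequential additions of 0.1 versus the multiplication the shortcut would use:
foldl(+, (0.1 for _ in 1:10))   # 0.9999999999999999
0.1 * 10                        # 1.0
```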

@olof3
Contributor Author

olof3 commented Sep 21, 2021

> I don't think this will give you the same result for many inputs.

I'm not sure what you mean. It should work for any Returns and AbstractArray?

> It has to be carefully weighed whether all the extra methods, extra compilation cost, etc. are actually valuable in practice for normal code.

Yes, that weighing is non-trivial. In terms of compilation cost it does not seem worse, slightly better if anything. But I agree that general bloat is an issue.

@tkf
Member

tkf commented Sep 22, 2021

This requires + to form, say, a module over a ring that can embed Int. Our specifications of + and * are simply not rigorous enough to justify such a transformation.

I also agree with Kristoffer that adding this to Base requires careful balancing, and my hunch is also that it's probably not worth the effort. That said, if we can come up with a good definition of the + interface and a good dispatch pipeline to resolve method ambiguities robustly, perhaps a good option is a type-pirating "performance booster package" that contains a bunch of optimized specializations maintained by the community, outside of Julia Base. (Kind of like Compat.jl, but for performance.)

@olof3
Contributor Author

olof3 commented Sep 22, 2021

> This requires + to form, say, a module over a ring that can embed Int. Our specifications of + and * are simply not rigorous enough to justify such a transformation.

You're right, there needs to be some restriction on the type of f, perhaps something like f::Returns{<:Union{Number, AbstractArray{<:Number}}}? I guess that would cover a large majority of the realistic cases.
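A rough sketch of what such a restricted method could look like (my own illustration, covering only the Number case; defining this outside Base would be type piracy, and the empty-collection behavior is just one possible choice):

```julia
# Sketch only: specialize the sum-reduction for numeric constants.
function Base.mapreduce(f::Returns{<:Number}, ::typeof(Base.add_sum), a::AbstractArray)
    isempty(a) && return zero(f.value)   # one possible convention for empty input
    return f.value * length(a)
end
```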

> I also agree with Kristoffer that adding this to Base requires careful balancing, and my hunch is also that it's probably not worth the effort.

Based on the amount of specialization I've come across in Base, my feeling was that this was reasonable. But given that mapreduce(f, +, a) with a scalar-valued f is as fast as it is, it is certainly a valid question whether it is worth it. It's not so fast for a Matrix-valued f, though.

I sort of expected this to have been implemented already. Dispatching on Returns was one of the selling points in #39794, and this seems like one of the more clear-cut places for doing that. Handling only the + case seems sufficient (and not *, min, max).

@KristofferC
Member

> Dispatching on Returns was one of the selling points in #39794

IMO the selling point is reduced compilation, as mentioned in #39794 (comment). I don't think heavy dispatch on Returns is a good idea because it breaks "transparency", in the sense that it is imo desirable for e.g. identity and x -> x, as well as Returns(x) and () -> x, to work the same. I know there are already cases where this is not true in Julia, but it is something to strive for.
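To illustrate the transparency point (my own example): under the proposed specialization, these semantically equivalent calls would take different code paths, even though one would like them to behave the same.

```julia
a = randn(1000)
sum(Returns(0.0), a)   # would hit the specialized O(1) method
sum(_ -> 0.0, a)       # anonymous function: falls back to the generic O(n) reduction
```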

@JeffBezanson
Member

JeffBezanson commented Sep 22, 2021

Right; dispatching on Returns might be ok, but each case has to be weighed according to whether it is correct and/or useful. For methods that are just optimizations, it's very much preferred that they give the exact same result. In this case, there is some hope that we could automatically optimize out the loop when it is valid, e.g.

```julia
function mysum(f, x)
    @inbounds s = f(x[1])
    for i = 2:length(x)
        @inbounds s += f(x[i])
    end
    return s
end
```

```
julia> @code_llvm debuginfo=:none mysum(Returns(0), 1:1000000)
define i64 @julia_mysum_209([1 x i64]* nocapture nonnull readonly align 8 dereferenceable(8) %0, [2 x i64]* nocapture nonnull readonly align 8 dereferenceable(16) %1) #0 {
top:
  %2 = getelementptr inbounds [1 x i64], [1 x i64]* %0, i64 0, i64 0
  %3 = getelementptr inbounds [2 x i64], [2 x i64]* %1, i64 0, i64 1
  %4 = getelementptr inbounds [2 x i64], [2 x i64]* %1, i64 0, i64 0
  %5 = load i64, i64* %3, align 8
  %6 = load i64, i64* %4, align 8
  %7 = sub i64 %5, %6
  %8 = add i64 %7, 1
  %.inv = icmp sgt i64 %8, 1
  %9 = load i64, i64* %2, align 8
  %10 = select i1 %.inv, i64 %8, i64 1
  %spec.select = mul i64 %9, %10
  ret i64 %spec.select
}
```

As you can see, no loop.

@olof3
Contributor Author

olof3 commented Sep 24, 2021

> As you can see, no loop.

Nice! I'm not sure what happens with the current mapreduce-based implementation of sum; it seems surprisingly fast, but not completely optimized out.
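One way to check (my own suggestion, without claiming what it prints) is to inspect the generated code for the Base path, the same way as for mysum above:

```julia
julia> @code_llvm debuginfo=:none sum(Returns(0), 1:1000000)
```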

One could also consider whether it should be possible to do sum(Returns(1), Int[]), or if it is not useful enough.
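For reference (my own note, with the behavior hedged): the generic reduction has no empty-collection rule for an arbitrary f, so this call presumably errors today, whereas the f.value * length(a) shortcut would naturally return 0.

```julia
sum(Returns(1), Int[])             # generic path: presumably throws (no reduce_empty rule for a general f)
Returns(1).value * length(Int[])   # the proposed shortcut would give 0
```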

You guys have a better idea of the trade-off weighting. It seemed like a typical place for specialization, but as you say, it may not necessarily be a good idea.

@mbauman added the fold (sum, maximum, reduce, foldl, etc.) and performance (Must go faster) labels on Aug 2, 2024