
Add support for per-module max_methods. #43370

Merged
merged 12 commits into JuliaLang:master from chriselrod:permodulemaxmethods on Dec 17, 2021

Conversation

chriselrod
Contributor

Adding

if isdefined(Base, :Experimental) && isdefined(Base.Experimental, Symbol("@max_methods"))
    # Only opt in on Julia versions where the macro exists
    @eval Base.Experimental.@max_methods 1
end

to Plots.jl:
Master:

julia> using Plots, SnoopCompile, BenchmarkTools

julia> tinf = @snoopi_deep plot(1:10, 1:10)
InferenceTimingNode: 2.085727/12.130569 on Core.Compiler.Timings.ROOT() with 98 direct children

julia> @benchmark plot(1:10, 1:10)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  171.533 μs …   6.345 ms  ┊ GC (min … max): 0.00% … 91.09%
 Time  (median):     177.000 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   187.454 μs ± 173.048 μs  ┊ GC (mean ± σ):  2.97% ±  3.13%

    █▆
  ▂▆██▆▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▁▂▁▂▁▁▁▁▂▁▂▂▁▁▁▁▂▁▁▁▁▁▁▂▁▂▁▁▁▁▁▁▁▁▁▁▁▂▂▃▂ ▂
  172 μs           Histogram: frequency by time          275 μs <

 Memory estimate: 63.12 KiB, allocs estimate: 976.

This PR:

julia> using Plots, SnoopCompile, BenchmarkTools

julia> tinf = @snoopi_deep plot(1:10, 1:10)
InferenceTimingNode: 1.709860/6.303342 on Core.Compiler.Timings.ROOT() with 109 direct children

julia> @benchmark plot(1:10, 1:10)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  226.586 μs …   5.654 ms  ┊ GC (min … max): 0.00% … 92.18%
 Time  (median):     232.976 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   243.641 μs ± 191.944 μs  ┊ GC (mean ± σ):  2.78% ±  3.39%

  ▂▇█▆▅▄▄▃▂▁                                                    ▂
  ███████████▆▅▅▅▅▄▄▄▄▅▄▃▄▄▁▅▁▁▁▄▄▄▁▄▁▁▅▃▄▃▁▁▃▄▁▁▄▄▃▅▃▄▃▃▃▄▅▅▇▇ █
  227 μs        Histogram: log(frequency) by time        378 μs <

 Memory estimate: 63.54 KiB, allocs estimate: 987.

Adding @max_methods 1 to OrdinaryDiffEq, DiffEqBase, DifferentialEquations, VectorizationBase, LoopVectorization, TriangularSolve, and RecursiveFactorization, I get:
Master:

julia> using DifferentialEquations, SnoopCompile, BenchmarkTools

julia> lorenz = (du,u,p,t) -> begin
        du[1] = 10.0(u[2]-u[1])
        du[2] = u[1]*(28.0-u[3]) - u[2]
        du[3] = u[1]*u[2] - (8/3)*u[3]
       end
#1 (generic function with 1 method)

julia> u0 = [1.0;0.0;0.0]
3-element Vector{Float64}:
 1.0
 0.0
 0.0

julia> tspan = (0.0,100.0)
(0.0, 100.0)

julia> prob = ODEProblem(lorenz,u0,tspan)
ODEProblem with uType Vector{Float64} and tType Float64. In-place: true
timespan: (0.0, 100.0)
u0: 3-element Vector{Float64}:
 1.0
 0.0
 0.0

julia> alg = Rodas5()
Rodas5{0, true, DefaultLinSolve, Val{:forward}}(DefaultLinSolve(nothing, nothing, nothing))

julia> tinf = @snoopi_deep solve(prob,alg)
InferenceTimingNode: 4.206624/23.152936 on Core.Compiler.Timings.ROOT() with 6 direct children

julia> @benchmark solve($prob, $alg)
BenchmarkTools.Trial: 2299 samples with 1 evaluation.
 Range (min … max):  2.070 ms …  10.632 ms  ┊ GC (min … max): 0.00% … 78.31%
 Time  (median):     2.110 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.171 ms ± 646.150 μs  ┊ GC (mean ± σ):  2.27% ±  6.05%

         ▄▅█▇▅▄▁▁▁
  ▂▃▅▆▆▇▇██████████▆▅▄▃▃▃▃▃▃▃▂▂▂▂▁▂▂▁▁▂▂▃▄▃▄▃▃▃▂▃▃▃▂▂▂▂▂▂▂▂▁▂ ▃
  2.07 ms         Histogram: frequency by time        2.28 ms <

 Memory estimate: 666.94 KiB, allocs estimate: 8955.

julia> lorenz = (du,u,p,t) -> begin
        du[1] = 10.0(u[2]-u[1])
        du[2] = u[1]*(28.0-u[3]) - u[2]
        du[3] = u[1]*u[2] - (8/3)*u[3]
       end
#3 (generic function with 1 method)

julia> prob = ODEProblem(lorenz,u0,tspan)
ODEProblem with uType Vector{Float64} and tType Float64. In-place: true
timespan: (0.0, 100.0)
u0: 3-element Vector{Float64}:
 1.0
 0.0
 0.0

julia> tinf = @snoopi_deep solve(prob,alg)
InferenceTimingNode: 1.233243/2.736248 on Core.Compiler.Timings.ROOT() with 3 direct children

This PR:

julia> using DifferentialEquations, SnoopCompile, BenchmarkTools

julia> lorenz = (du,u,p,t) -> begin
        du[1] = 10.0(u[2]-u[1])
        du[2] = u[1]*(28.0-u[3]) - u[2]
        du[3] = u[1]*u[2] - (8/3)*u[3]
       end
#1 (generic function with 1 method)

julia> u0 = [1.0;0.0;0.0]
3-element Vector{Float64}:
 1.0
 0.0
 0.0

julia> tspan = (0.0,100.0)
(0.0, 100.0)

julia> prob = ODEProblem(lorenz,u0,tspan)
ODEProblem with uType Vector{Float64} and tType Float64. In-place: true
timespan: (0.0, 100.0)
u0: 3-element Vector{Float64}:
 1.0
 0.0
 0.0

julia> alg = Rodas5()
Rodas5{0, true, DefaultLinSolve, Val{:forward}}(DefaultLinSolve(nothing, nothing, nothing))

julia> tinf = @snoopi_deep solve(prob,alg)
InferenceTimingNode: 1.595283/3.539406 on Core.Compiler.Timings.ROOT() with 8 direct children

julia> @benchmark solve($prob, $alg)
BenchmarkTools.Trial: 2025 samples with 1 evaluation.
 Range (min … max):  2.343 ms …  10.539 ms  ┊ GC (min … max): 0.00% … 76.21%
 Time  (median):     2.400 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.465 ms ± 682.168 μs  ┊ GC (mean ± σ):  2.34% ±  6.45%

             ▁▄▅▆█▆▇▆▂▃
  ▂▁▂▂▂▃▃▃▄▅▇██████████▇▇▅▅▅▄▄▃▃▃▃▂▂▃▂▃▂▂▂▂▂▂▃▂▂▃▃▃▃▃▂▂▁▂▂▂▁▂ ▄
  2.34 ms         Histogram: frequency by time        2.54 ms <

 Memory estimate: 717.47 KiB, allocs estimate: 10572.

julia> lorenz = (du,u,p,t) -> begin
        du[1] = 10.0(u[2]-u[1])
        du[2] = u[1]*(28.0-u[3]) - u[2]
        du[3] = u[1]*u[2] - (8/3)*u[3]
       end
#3 (generic function with 1 method)

julia> prob = ODEProblem(lorenz,u0,tspan)
ODEProblem with uType Vector{Float64} and tType Float64. In-place: true
timespan: (0.0, 100.0)
u0: 3-element Vector{Float64}:
 1.0
 0.0
 0.0

julia> tinf = @snoopi_deep solve(prob,alg)
InferenceTimingNode: 1.197684/2.313140 on Core.Compiler.Timings.ROOT() with 3 direct children

We'll have to look at which of the type instabilities that appear here are causing the performance regression before adding the option to these packages, but the possible latency improvements are enticing.

@YingboMa
Contributor

YingboMa commented Dec 8, 2021

What's the implication of @max_methods 1? For instance, will Union{Int,Vector{Int}} be specialized? What about the iteration protocol?

@KristofferC
Member

This is what the default max_methods = 3 represents:

julia> f(x::Int) = 1;

julia> g(x) = f(x)
g (generic function with 1 method)

julia> code_warntype(g, Tuple{Any})
MethodInstance for g(::Any)
  from g(x) in Main at REPL[2]:1
Arguments
  #self#::Core.Const(g)
  x::Any
Body::Int64
1 ─ %1 = Main.f(x)::Core.Const(1)
└──      return %1


julia> f(x::Float64) = 2.0;

julia> code_warntype(g, Tuple{Any})
MethodInstance for g(::Any)
  from g(x) in Main at REPL[2]:1
Arguments
  #self#::Core.Const(g)
  x::Any
Body::Union{Float64, Int64}
1 ─ %1 = Main.f(x)::Union{Float64, Int64}
└──      return %1


julia> f(x::Float32) = 3.0f0;

julia> code_warntype(g, Tuple{Any})
MethodInstance for g(::Any)
  from g(x) in Main at REPL[2]:1
Arguments
  #self#::Core.Const(g)
  x::Any
Body::Union{Float32, Float64, Int64}
1 ─ %1 = Main.f(x)::Union{Float32, Float64, Int64}
└──      return %1


julia> f(x::String) = "4"
f (generic function with 4 methods)

julia> code_warntype(g, Tuple{Any})
MethodInstance for g(::Any)
  from g(x) in Main at REPL[2]:1
Arguments
  #self#::Core.Const(g)
  x::Any
Body::Any
1 ─ %1 = Main.f(x)::Any
└──      return %1
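Under this PR, the same widening can be made to kick in earlier on a per-module basis. A minimal sketch (module name made up; assumes a Julia build where Base.Experimental.@max_methods exists):

```julia
module Tight
Base.Experimental.@max_methods 1  # per-module limit added by this PR

f(x::Int) = 1
f(x::Float64) = 2.0
g(x) = f(x)
end

# With two matching methods and a limit of 1, inference should give up
# on g(::Any) immediately and widen to Any (not Union{Float64, Int64}):
Base.return_types(Tight.g, (Any,))
```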

@chriselrod
Contributor Author

What's the implication of @max_methods 1? For instance, will Union{Int,Vector{Int}} be specialized? What about the iteration protocol?

It will not cause problems for the iteration protocol (it isn't applicable there), because lowered iteration does not rely on method matching:

valuestate = iterate(iter, state)
valuestate === nothing && break
value, state = valuestate

Union splits should (likewise) generally be fine.
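For example (function names made up): when the argument type is a small concrete Union, inference splits on the argument, and each branch sees a single matching method, so a limit of 1 should not be exceeded:

```julia
f(x::Int) = 1
f(x::Vector{Int}) = 2
g(x::Union{Int, Vector{Int}}) = f(x)

# Each union member matches exactly one method of f, so union splitting
# still yields a concrete result; a per-module max_methods of 1 should
# not change this:
Base.return_types(g, (Union{Int, Vector{Int}},))  # [Int64]
```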

@timholy (Member) left a comment

This LGTM, but it would be ideal if someone who spends more time hacking on inference than me took a look.

Overall, I agree there are very good reasons to target this parameter, and I suspect many packages should set it.

@vchuravy vchuravy requested a review from aviatesk December 10, 2021 03:35
@aviatesk aviatesk added merge me PR is reviewed. Merge when all tests are passing latency Latency labels Dec 16, 2021
@aviatesk aviatesk merged commit a957953 into JuliaLang:master Dec 17, 2021
@chriselrod chriselrod deleted the permodulemaxmethods branch December 17, 2021 11:59
@DilumAluthge DilumAluthge removed the merge me PR is reviewed. Merge when all tests are passing label Dec 17, 2021
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
@KristofferC
Member

Shouldn't this have some tests?

@chriselrod
Contributor Author

What should tests look like?
If you want a usage example:
https://github.com/JuliaPlots/Plots.jl/blob/ce961acec433d34be00470e2c07500e9be894080/src/Plots.jl#L6-L8

I should probably add it to LoopVectorization, too.

@KristofferC
Member

KristofferC commented Jan 16, 2023

What should tests look like?

Like using the @max_methods functionality and checking that it has the intended effect: that inference stops once it reaches the specified max methods instead of the default 3, that the macro errors when applied to the wrong kind of expression, that the module version works on all methods in the module, etc. I don't really get the question.
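A hedged sketch of what such a test might look like (module and function names made up; assumes Base.Experimental.@max_methods is available):

```julia
using Test

module MaxMethodsOne
Base.Experimental.@max_methods 1

f(x::Int) = 1
f(x::Float64) = 2.0
g(x) = f(x)
end

@testset "per-module max_methods" begin
    # Two methods match the f(x) call; with the module-wide limit of 1,
    # inference of g(::Any) should give up and widen to Any rather than
    # Union{Float64, Int64}.
    @test Base.return_types(MaxMethodsOne.g, (Any,)) == Any[Any]
end
```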

@KristofferC
Member

Okay, this PR did less than I thought and I didn't find the tests of the original max_methods PR due to #48316

@chriselrod
Contributor Author

I think those tests are for #44426
This PR still needs tests.

@KristofferC
Member

Yes, but I didn't find any tests at all :p
