Improve EC methods (focusing on the compressible Euler equations) #643

Merged
merged 43 commits into from
Jun 30, 2021

ranocha
Member

@ranocha ranocha commented Jun 14, 2021

I wrote another blog post explaining the reasons for these changes. tl;dr: Trixi was hit quite a bit by the difference between LLVM and GCC. I could fix that by introducing @muladd. With some additional performance optimizations, I could improve the performance of the RHS computations at the initial datum

  • by ca. 33% for examples/2d/elixir_euler_ec.jl
  • by ca. 38% for examples/2d/elixir_euler_ec_curved.jl
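For readers who, like some reviewers, have not met `@muladd` before: the macro comes from MuladdMacro.jl and rewrites expressions of the form `a * x + y` into calls to `muladd(a, x, y)`. This presumably relates to the LLVM vs. GCC difference mentioned above, since Julia does not fuse a separate multiplication and addition on its own. The following is a minimal sketch using only `Base.muladd`; the function names are made up for illustration and are not Trixi code.

```julia
# Illustrative only: `axpy_plain` and `axpy_muladd` are hypothetical names,
# not functions from Trixi.

# Written this way, the product and the sum are two separate floating-point
# operations, each with its own rounding step.
axpy_plain(a, x, y) = a * x + y

# `Base.muladd` computes a*x + y but allows the compiler to fuse both
# operations into a single FMA instruction where that is profitable.
axpy_muladd(a, x, y) = muladd(a, x, y)

# Assumption: with MuladdMacro.jl installed and loaded,
#   @muladd axpy(a, x, y) = a * x + y
# is meant to produce the second form automatically.

a, x, y = 2.0, 3.0, 1.0
println(axpy_plain(a, x, y))   # all values here are exactly representable
println(axpy_muladd(a, x, y))
```

Since a fused operation rounds only once, results of the two variants can differ in the last bits for general inputs, which is why the rewrite has to be requested explicitly.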

Results from Rocinante:

1 Thread: Nice performance improvements overall, in particular for Euler EC; no significant runtime regressions

Job Properties

  • Time of benchmarks:
    • Target: 14 Jun 2021 - 19:19
    • Baseline: 14 Jun 2021 - 19:36
  • Package commits:
    • Target: 028976
    • Baseline: 658e6e
  • Julia commits:
    • Target: 6aaede
    • Baseline: 6aaede
  • Julia command flags:
    • Target: -Cnative,-J/mnt/hd1/opt/julia/1.6.1/lib/julia/sys.so,-g1,--check-bounds=no,--threads=1
    • Baseline: -Cnative,-J/mnt/hd1/opt/julia/1.6.1/lib/julia/sys.so,-g1,--check-bounds=no,--threads=1
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|----|------------|--------------|
| ["2d", "elixir_2d_euler_vortex_structured.jl", "p3_rhs!"] | 0.93 (5%) ✅ | 1.05 (1%) ❌ |
| ["2d", "elixir_2d_euler_vortex_structured.jl", "p7_rhs!"] | 0.90 (5%) ✅ | 1.04 (1%) ❌ |
| ["2d", "elixir_2d_euler_vortex_tree.jl", "p3_rhs!"] | 0.92 (5%) ✅ | 1.05 (1%) ❌ |
| ["2d", "elixir_2d_euler_vortex_tree.jl", "p7_rhs!"] | 0.90 (5%) ✅ | 1.04 (1%) ❌ |
| ["2d", "elixir_2d_euler_vortex_unstructured.jl", "p3_rhs!"] | 0.94 (5%) ✅ | 1.04 (1%) ❌ |
| ["2d", "elixir_2d_euler_vortex_unstructured.jl", "p7_rhs!"] | 0.89 (5%) ✅ | 1.03 (1%) ❌ |
| ["2d", "elixir_advection_amr_nonperiodic.jl", "p3_analysis"] | 0.94 (5%) ✅ | 1.00 (1%) |
| ["2d", "elixir_euler_ec.jl", "p3_rhs!"] | 0.60 (5%) ✅ | 1.05 (1%) ❌ |
| ["2d", "elixir_euler_ec.jl", "p7_rhs!"] | 0.58 (5%) ✅ | 1.04 (1%) ❌ |
| ["2d", "elixir_euler_ec_curved.jl", "p3_rhs!"] | 0.53 (5%) ✅ | 1.05 (1%) ❌ |
| ["2d", "elixir_euler_ec_curved.jl", "p7_rhs!"] | 0.52 (5%) ✅ | 1.04 (1%) ❌ |
| ["2d", "elixir_euler_nonperiodic_curved.jl", "p3_rhs!"] | 0.95 (5%) | 1.05 (1%) ❌ |
| ["2d", "elixir_euler_nonperiodic_curved.jl", "p7_rhs!"] | 0.93 (5%) ✅ | 1.04 (1%) ❌ |
| ["2d", "elixir_euler_unstructured_quad_wall_bc.jl", "p3_analysis"] | 0.93 (5%) ✅ | 1.00 (1%) |
| ["2d", "elixir_euler_unstructured_quad_wall_bc.jl", "p3_rhs!"] | 0.96 (5%) | 1.04 (1%) ❌ |
| ["2d", "elixir_euler_unstructured_quad_wall_bc.jl", "p7_rhs!"] | 0.90 (5%) ✅ | 1.03 (1%) ❌ |
| ["2d", "elixir_euler_vortex_mortar.jl", "p3_rhs!"] | 0.93 (5%) ✅ | 1.05 (1%) ❌ |
| ["2d", "elixir_euler_vortex_mortar.jl", "p7_rhs!"] | 0.91 (5%) ✅ | 1.04 (1%) ❌ |
| ["2d", "elixir_euler_vortex_mortar_shockcapturing.jl", "p3_rhs!"] | 0.86 (5%) ✅ | 1.03 (1%) ❌ |
| ["2d", "elixir_euler_vortex_mortar_shockcapturing.jl", "p7_rhs!"] | 0.80 (5%) ✅ | 1.03 (1%) ❌ |
| ["3d", "elixir_euler_ec.jl", "p3_analysis"] | 1.00 (5%) | 1.03 (1%) ❌ |
| ["3d", "elixir_euler_ec.jl", "p3_rhs!"] | 0.65 (5%) ✅ | 1.04 (1%) ❌ |
| ["3d", "elixir_euler_ec.jl", "p7_analysis"] | 1.00 (5%) | 1.02 (1%) ❌ |
| ["3d", "elixir_euler_ec.jl", "p7_rhs!"] | 0.64 (5%) ✅ | 1.03 (1%) ❌ |
| ["3d", "elixir_euler_ec_curved.jl", "p3_rhs!"] | 0.51 (5%) ✅ | 1.05 (1%) ❌ |
| ["3d", "elixir_euler_ec_curved.jl", "p7_rhs!"] | 0.54 (5%) ✅ | 1.04 (1%) ❌ |
| ["3d", "elixir_euler_mortar.jl", "p3_analysis"] | 0.98 (5%) | 1.03 (1%) ❌ |
| ["3d", "elixir_euler_mortar.jl", "p3_rhs!"] | 0.95 (5%) | 1.04 (1%) ❌ |
| ["3d", "elixir_euler_mortar.jl", "p7_analysis"] | 0.99 (5%) | 1.02 (1%) ❌ |
| ["3d", "elixir_euler_mortar.jl", "p7_rhs!"] | 0.97 (5%) | 1.03 (1%) ❌ |
| ["3d", "elixir_euler_nonperiodic_curved.jl", "p3_rhs!"] | 0.96 (5%) | 1.05 (1%) ❌ |
| ["3d", "elixir_euler_nonperiodic_curved.jl", "p7_rhs!"] | 0.94 (5%) ✅ | 1.04 (1%) ❌ |
| ["3d", "elixir_euler_shockcapturing.jl", "p3_analysis"] | 1.00 (5%) | 1.02 (1%) ❌ |
| ["3d", "elixir_euler_shockcapturing.jl", "p3_rhs!"] | 0.66 (5%) ✅ | 1.03 (1%) ❌ |
| ["3d", "elixir_euler_shockcapturing.jl", "p7_analysis"] | 1.00 (5%) | 1.02 (1%) ❌ |
| ["3d", "elixir_euler_shockcapturing.jl", "p7_rhs!"] | 0.64 (5%) ✅ | 1.03 (1%) ❌ |
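
The ✅/❌ marks in the table above follow a threshold rule: a ratio counts as an improvement or a regression only if it differs from 1.0 by more than the tolerance given in parentheses. The sketch below restates that rule in plain Julia. `classify` is a hypothetical helper written for this note, and the exact comparison is an assumption modeled on how BenchmarkTools.jl judges ratios, not code taken from it.

```julia
# Hypothetical helper (not part of Trixi or BenchmarkTools.jl): classify a
# target/baseline ratio given a relative tolerance.
function classify(ratio::Real, tolerance::Real)
    if ratio - tolerance > 1
        return :regression    # shown as ❌
    elseif ratio + tolerance < 1
        return :improvement   # shown as ✅
    else
        return :invariant     # no mark
    end
end

# Values taken from the 1-thread table above
println(classify(0.60, 0.05))  # time ratio, 2d elixir_euler_ec.jl, p3_rhs!
println(classify(1.05, 0.01))  # memory ratio of the same row
println(classify(0.96, 0.05))  # time ratio, 3d elixir_euler_nonperiodic_curved.jl, p3_rhs!
```

Because the printed ratios are rounded, a value displayed right at the threshold, such as 0.95 (5%), can land on either side of it.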

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["2d", "elixir_2d_euler_vortex_structured.jl"]
  • ["2d", "elixir_2d_euler_vortex_tree.jl"]
  • ["2d", "elixir_2d_euler_vortex_unstructured.jl"]
  • ["2d", "elixir_advection_amr_nonperiodic.jl"]
  • ["2d", "elixir_advection_extended.jl"]
  • ["2d", "elixir_advection_extended_curved.jl"]
  • ["2d", "elixir_advection_nonperiodic_curved.jl"]
  • ["2d", "elixir_euler_ec.jl"]
  • ["2d", "elixir_euler_ec_curved.jl"]
  • ["2d", "elixir_euler_nonperiodic_curved.jl"]
  • ["2d", "elixir_euler_unstructured_quad_wall_bc.jl"]
  • ["2d", "elixir_euler_vortex_mortar.jl"]
  • ["2d", "elixir_euler_vortex_mortar_shockcapturing.jl"]
  • ["3d", "elixir_advection_extended.jl"]
  • ["3d", "elixir_advection_nonperiodic_curved.jl"]
  • ["3d", "elixir_euler_ec.jl"]
  • ["3d", "elixir_euler_ec_curved.jl"]
  • ["3d", "elixir_euler_mortar.jl"]
  • ["3d", "elixir_euler_nonperiodic_curved.jl"]
  • ["3d", "elixir_euler_shockcapturing.jl"]
  • ["latency"]

Julia versioninfo

Target

Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.2 LTS
  uname: Linux 5.4.0-70-generic #78-Ubuntu SMP Fri Mar 19 13:29:52 UTC 2021 x86_64 x86_64
  CPU: AMD Ryzen Threadripper 3990X 64-Core Processor: 
                  speed         user         nice          sys         idle          irq
       #1-128  2199 MHz  142957996 s       7228 s     609093 s  6880605121 s          0 s
       
  Memory: 251.6334342956543 GB (21132.83203125 MB free)
  Uptime: 5.488206e6 sec
  Load Avg:  1.0  0.98  0.69
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, znver2)

Baseline

Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.2 LTS
  uname: Linux 5.4.0-70-generic #78-Ubuntu SMP Fri Mar 19 13:29:52 UTC 2021 x86_64 x86_64
  CPU: AMD Ryzen Threadripper 3990X 64-Core Processor: 
                  speed         user         nice          sys         idle          irq
       #1-128  2174 MHz  142967646 s       7228 s     609257 s  6881843720 s          0 s
       
  Memory: 251.6334342956543 GB (21093.7578125 MB free)
  Uptime: 5.489182e6 sec
  Load Avg:  1.0  1.0  0.92
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, znver2)

2 Threads: Nice performance improvements overall, in particular for Euler EC; no significant runtime regressions

Job Properties

  • Time of benchmarks:
    • Target: 14 Jun 2021 - 19:53
    • Baseline: 14 Jun 2021 - 20:11
  • Package commits:
    • Target: 028976
    • Baseline: 658e6e
  • Julia commits:
    • Target: 6aaede
    • Baseline: 6aaede
  • Julia command flags:
    • Target: -Cnative,-J/mnt/hd1/opt/julia/1.6.1/lib/julia/sys.so,-g1,--check-bounds=no,--threads=2
    • Baseline: -Cnative,-J/mnt/hd1/opt/julia/1.6.1/lib/julia/sys.so,-g1,--check-bounds=no,--threads=2
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|----|------------|--------------|
| ["2d", "elixir_2d_euler_vortex_structured.jl", "p3_rhs!"] | 0.92 (5%) ✅ | 1.01 (1%) ❌ |
| ["2d", "elixir_2d_euler_vortex_tree.jl", "p3_rhs!"] | 0.95 (5%) ✅ | 1.01 (1%) ❌ |
| ["2d", "elixir_2d_euler_vortex_tree.jl", "p7_rhs!"] | 0.85 (5%) ✅ | 1.01 (1%) |
| ["2d", "elixir_2d_euler_vortex_unstructured.jl", "p3_rhs!"] | 0.98 (5%) | 1.01 (1%) ❌ |
| ["2d", "elixir_2d_euler_vortex_unstructured.jl", "p7_rhs!"] | 0.89 (5%) ✅ | 1.01 (1%) |
| ["2d", "elixir_advection_amr_nonperiodic.jl", "p3_analysis"] | 0.94 (5%) ✅ | 1.00 (1%) |
| ["2d", "elixir_euler_ec.jl", "p3_rhs!"] | 0.61 (5%) ✅ | 1.01 (1%) ❌ |
| ["2d", "elixir_euler_ec.jl", "p7_rhs!"] | 0.58 (5%) ✅ | 1.01 (1%) |
| ["2d", "elixir_euler_ec_curved.jl", "p3_rhs!"] | 0.57 (5%) ✅ | 1.01 (1%) ❌ |
| ["2d", "elixir_euler_ec_curved.jl", "p7_rhs!"] | 0.55 (5%) ✅ | 1.01 (1%) |
| ["2d", "elixir_euler_nonperiodic_curved.jl", "p3_rhs!"] | 0.95 (5%) | 1.02 (1%) ❌ |
| ["2d", "elixir_euler_nonperiodic_curved.jl", "p7_rhs!"] | 0.94 (5%) ✅ | 1.01 (1%) ❌ |
| ["2d", "elixir_euler_vortex_mortar.jl", "p3_rhs!"] | 0.94 (5%) ✅ | 1.01 (1%) |
| ["2d", "elixir_euler_vortex_mortar.jl", "p7_rhs!"] | 0.88 (5%) ✅ | 1.01 (1%) |
| ["2d", "elixir_euler_vortex_mortar_shockcapturing.jl", "p7_rhs!"] | 0.82 (5%) ✅ | 1.00 (1%) |
| ["3d", "elixir_euler_ec.jl", "p3_analysis"] | 1.01 (5%) | 1.03 (1%) ❌ |
| ["3d", "elixir_euler_ec.jl", "p3_rhs!"] | 0.65 (5%) ✅ | 1.01 (1%) ❌ |
| ["3d", "elixir_euler_ec.jl", "p7_analysis"] | 1.00 (5%) | 1.02 (1%) ❌ |
| ["3d", "elixir_euler_ec.jl", "p7_rhs!"] | 0.61 (5%) ✅ | 1.01 (1%) |
| ["3d", "elixir_euler_ec_curved.jl", "p3_rhs!"] | 0.57 (5%) ✅ | 1.01 (1%) ❌ |
| ["3d", "elixir_euler_ec_curved.jl", "p7_rhs!"] | 0.59 (5%) ✅ | 1.01 (1%) |
| ["3d", "elixir_euler_mortar.jl", "p3_analysis"] | 0.99 (5%) | 1.03 (1%) ❌ |
| ["3d", "elixir_euler_mortar.jl", "p7_analysis"] | 0.98 (5%) | 1.02 (1%) ❌ |
| ["3d", "elixir_euler_nonperiodic_curved.jl", "p3_rhs!"] | 0.95 (5%) | 1.02 (1%) ❌ |
| ["3d", "elixir_euler_nonperiodic_curved.jl", "p7_rhs!"] | 0.94 (5%) ✅ | 1.01 (1%) ❌ |
| ["3d", "elixir_euler_shockcapturing.jl", "p3_analysis"] | 1.01 (5%) | 1.02 (1%) ❌ |
| ["3d", "elixir_euler_shockcapturing.jl", "p3_rhs!"] | 0.66 (5%) ✅ | 1.01 (1%) |
| ["3d", "elixir_euler_shockcapturing.jl", "p7_analysis"] | 1.00 (5%) | 1.02 (1%) ❌ |
| ["3d", "elixir_euler_shockcapturing.jl", "p7_rhs!"] | 0.61 (5%) ✅ | 1.00 (1%) |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["2d", "elixir_2d_euler_vortex_structured.jl"]
  • ["2d", "elixir_2d_euler_vortex_tree.jl"]
  • ["2d", "elixir_2d_euler_vortex_unstructured.jl"]
  • ["2d", "elixir_advection_amr_nonperiodic.jl"]
  • ["2d", "elixir_advection_extended.jl"]
  • ["2d", "elixir_advection_extended_curved.jl"]
  • ["2d", "elixir_advection_nonperiodic_curved.jl"]
  • ["2d", "elixir_euler_ec.jl"]
  • ["2d", "elixir_euler_ec_curved.jl"]
  • ["2d", "elixir_euler_nonperiodic_curved.jl"]
  • ["2d", "elixir_euler_unstructured_quad_wall_bc.jl"]
  • ["2d", "elixir_euler_vortex_mortar.jl"]
  • ["2d", "elixir_euler_vortex_mortar_shockcapturing.jl"]
  • ["3d", "elixir_advection_extended.jl"]
  • ["3d", "elixir_advection_nonperiodic_curved.jl"]
  • ["3d", "elixir_euler_ec.jl"]
  • ["3d", "elixir_euler_ec_curved.jl"]
  • ["3d", "elixir_euler_mortar.jl"]
  • ["3d", "elixir_euler_nonperiodic_curved.jl"]
  • ["3d", "elixir_euler_shockcapturing.jl"]
  • ["latency"]

Julia versioninfo

Target

Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.2 LTS
  uname: Linux 5.4.0-70-generic #78-Ubuntu SMP Fri Mar 19 13:29:52 UTC 2021 x86_64 x86_64
  CPU: AMD Ryzen Threadripper 3990X 64-Core Processor: 
                  speed         user         nice          sys         idle          irq
       #1-128  2174 MHz  142979653 s       7228 s     609434 s  6883177485 s          0 s
       
  Memory: 251.6334342956543 GB (20722.4140625 MB free)
  Uptime: 5.490233e6 sec
  Load Avg:  1.43  1.31  1.13
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, znver2)

Baseline

Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.2 LTS
  uname: Linux 5.4.0-70-generic #78-Ubuntu SMP Fri Mar 19 13:29:52 UTC 2021 x86_64 x86_64
  CPU: AMD Ryzen Threadripper 3990X 64-Core Processor: 
                  speed         user         nice          sys         idle          irq
       #1-128  2176 MHz  142991726 s       7228 s     609618 s  6884515169 s          0 s
       
  Memory: 251.6334342956543 GB (20599.0 MB free)
  Uptime: 5.491288e6 sec
  Load Avg:  1.45  1.33  1.19
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, znver2)

@ranocha ranocha force-pushed the hr/blog_ec_performance branch from dec9ae8 to 61ca617 Compare June 14, 2021 09:50
@ranocha ranocha force-pushed the hr/blog_ec_performance branch from 61ca617 to 12be336 Compare June 14, 2021 10:51
@jlchan
Contributor

jlchan commented Jun 14, 2021

That blog post is very informative - thanks for sharing! The @muladd macro is new to me.

Two minor typos in the blog post: "Divisions are more expansive on modern hardhare than multiplications."

@ranocha
Member Author

ranocha commented Jun 14, 2021

Thanks 👍

@ranocha ranocha requested a review from sloede June 14, 2021 14:43
@ranocha ranocha marked this pull request as ready for review June 14, 2021 14:43
@codecov

codecov bot commented Jun 14, 2021

Codecov Report

Merging #643 (813561b) into main (48fb4b9) will decrease coverage by 0.05%.
The diff coverage is 96.37%.

❗ Current head 813561b differs from pull request most recent head f638268. Consider uploading reports for the commit f638268 to get more accurate results

@@            Coverage Diff             @@
##             main     #643      +/-   ##
==========================================
- Coverage   93.69%   93.63%   -0.06%     
==========================================
  Files         171      171              
  Lines       16600    16595       -5     
==========================================
- Hits        15553    15539      -14     
- Misses       1047     1056       +9     
| Flag | Coverage Δ |
|------|------------|
| unittests | 93.63% <96.37%> (-0.06%) ⬇️ |

| Impacted Files | Coverage Δ |
|----------------|------------|
| examples/structured_2d_dgsem/elixir_euler_ec.jl | 100.00% <ø> (ø) |
| examples/structured_3d_dgsem/elixir_euler_ec.jl | 100.00% <ø> (ø) |
| src/Trixi.jl | 83.33% <ø> (ø) |
| src/auxiliary/auxiliary.jl | 90.00% <ø> (ø) |
| src/auxiliary/containers.jl | 93.89% <ø> (ø) |
| src/auxiliary/mpi.jl | 75.75% <ø> (ø) |
| src/auxiliary/precompile.jl | 0.00% <ø> (ø) |
| src/auxiliary/special_elixirs.jl | 94.31% <ø> (ø) |
| src/basic_types.jl | 100.00% <ø> (ø) |
| src/callbacks_stage/positivity_zhang_shu.jl | 87.50% <ø> (ø) |

... and 118 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 48fb4b9...f638268.

Member

@sloede sloede left a comment

Great job in doing all these performance optimizations, including the two blog posts! I have some questions and remarks, some of which might be challenging ;-) In this case, we'll have to see how far we get here or if we should discuss it in person...

@ranocha
Member Author

ranocha commented Jun 18, 2021

One thing I forgot in my review: Instead of changing each file individually, would it be possible to also just wrap more than a single file in a begin ... end block? Or replace include(...) by includefast(...) that does it automatically for you? This would make it less verbose and easier to change in a single place should the need arise to do so in the future.

I thought you would prefer the more explicit solution having a comment in each file. Nevertheless, here you are: include_optimized

This version makes it considerably more difficult to work on Trixi due to a bug in Revise (timholy/Revise.jl#634). Hence, I strongly prefer the more verbose option to annotate each file until that bug is fixed. Is that okay for you, @sloede?
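
For context, one way such a helper can be built is the two-argument form `include(mapexpr, path)` (available since Julia 1.5), which applies a transformation to every top-level expression of the included file before evaluating it. The sketch below is not the `include_optimized` from this PR: `include_with_muladd` and `to_muladd` are hypothetical names, and the toy rewrite only handles the pattern `x * y + z`, standing in for the far more general `@muladd` macro.

```julia
# Toy rewrite (assumption: a stand-in for @muladd, NOT the real macro):
# turn `x * y + z` into `muladd(x, y, z)` and leave everything else alone.
function to_muladd(ex)
    ex isa Expr || return ex                      # symbols, literals, line nodes
    args = Any[to_muladd(arg) for arg in ex.args] # rewrite nested expressions first
    if ex.head === :call && length(args) == 3 && args[1] === :+
        lhs = args[2]
        if lhs isa Expr && lhs.head === :call &&
           length(lhs.args) == 3 && lhs.args[1] === :*
            return Expr(:call, :muladd, lhs.args[2], lhs.args[3], args[3])
        end
    end
    return Expr(ex.head, args...)
end

# Hypothetical helper in the spirit of the proposal: every expression of the
# included file is passed through `to_muladd` before it is evaluated.
include_with_muladd(path) = include(to_muladd, path)

# Usage: write a tiny source file and include it through the helper.
path = joinpath(mktempdir(), "kernel.jl")
write(path, "axpy(a, x, y) = a * x + y\n")
include_with_muladd(path)
println(axpy(2.0, 3.0, 1.0))
println(to_muladd(:(a * x + y)))
```

The appeal of this design is that the transformation lives in one place instead of in a wrapper block at the top of each source file.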

@ranocha ranocha added the performance (We are greedy) and triage (To be decided at the next Trixi meeting) labels Jun 23, 2021
@ranocha
Member Author

ranocha commented Jun 24, 2021

@sloede: Can we also put this on the agenda to be able to finish this PR next week?

@sloede
Member

sloede commented Jun 24, 2021

No need. I propose to proceed with your original version but to open an issue that tracks the related Revise issue. I really like the include_optimized trick you created, and I'd strongly prefer this to having the explicit code everywhere once Revise is fixed.

@ranocha
Member Author

ranocha commented Jun 28, 2021

New benchmark from Rocinante

1 Thread: Nice runtime improvements overall (up to ca. 2x), only minor memory regressions

Job Properties

  • Time of benchmarks:
    • Target: 28 Jun 2021 - 13:39
    • Baseline: 28 Jun 2021 - 13:59
  • Package commits:
    • Target: cae2e2
    • Baseline: a8d5f9
  • Julia commits:
    • Target: 6aaede
    • Baseline: 6aaede
  • Julia command flags:
    • Target: -Cnative,-J/mnt/hd1/opt/julia/1.6.1/lib/julia/sys.so,-g1,--check-bounds=no,--threads=1
    • Baseline: -Cnative,-J/mnt/hd1/opt/julia/1.6.1/lib/julia/sys.so,-g1,--check-bounds=no,--threads=1
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|----|------------|--------------|
| ["benchmark/elixir_2d_euler_vortex_structured.jl", "p3_rhs!"] | 0.93 (5%) ✅ | 1.05 (1%) ❌ |
| ["benchmark/elixir_2d_euler_vortex_structured.jl", "p7_rhs!"] | 0.89 (5%) ✅ | 1.04 (1%) ❌ |
| ["benchmark/elixir_2d_euler_vortex_tree.jl", "p3_rhs!"] | 0.93 (5%) ✅ | 1.05 (1%) ❌ |
| ["benchmark/elixir_2d_euler_vortex_tree.jl", "p7_rhs!"] | 0.91 (5%) ✅ | 1.04 (1%) ❌ |
| ["benchmark/elixir_2d_euler_vortex_unstructured.jl", "p3_rhs!"] | 0.94 (5%) ✅ | 1.04 (1%) ❌ |
| ["benchmark/elixir_2d_euler_vortex_unstructured.jl", "p7_rhs!"] | 0.90 (5%) ✅ | 1.03 (1%) ❌ |
| ["structured_2d_dgsem/elixir_euler_ec.jl", "p3_rhs!"] | 0.53 (5%) ✅ | 1.05 (1%) ❌ |
| ["structured_2d_dgsem/elixir_euler_ec.jl", "p7_rhs!"] | 0.52 (5%) ✅ | 1.04 (1%) ❌ |
| ["structured_2d_dgsem/elixir_euler_source_terms_nonperiodic.jl", "p3_rhs!"] | 0.95 (5%) | 1.05 (1%) ❌ |
| ["structured_2d_dgsem/elixir_euler_source_terms_nonperiodic.jl", "p7_rhs!"] | 0.92 (5%) ✅ | 1.04 (1%) ❌ |
| ["structured_2d_dgsem/elixir_mhd_ec.jl", "p3_rhs!"] | 0.90 (5%) ✅ | 1.00 (1%) |
| ["structured_2d_dgsem/elixir_mhd_ec.jl", "p7_rhs!"] | 0.94 (5%) ✅ | 1.00 (1%) |
| ["structured_3d_dgsem/elixir_euler_ec.jl", "p3_rhs!"] | 0.51 (5%) ✅ | 1.05 (1%) ❌ |
| ["structured_3d_dgsem/elixir_euler_ec.jl", "p7_rhs!"] | 0.53 (5%) ✅ | 1.04 (1%) ❌ |
| ["structured_3d_dgsem/elixir_euler_source_terms_nonperiodic.jl", "p3_rhs!"] | 0.96 (5%) | 1.05 (1%) ❌ |
| ["structured_3d_dgsem/elixir_euler_source_terms_nonperiodic.jl", "p7_rhs!"] | 0.93 (5%) ✅ | 1.04 (1%) ❌ |
| ["structured_3d_dgsem/elixir_mhd_ec.jl", "p3_rhs!"] | 0.91 (5%) ✅ | 1.00 (1%) |
| ["structured_3d_dgsem/elixir_mhd_ec.jl", "p7_rhs!"] | 0.89 (5%) ✅ | 1.00 (1%) |
| ["tree_2d_dgsem/elixir_advection_amr_nonperiodic.jl", "p3_analysis"] | 0.95 (5%) ✅ | 1.00 (1%) |
| ["tree_2d_dgsem/elixir_euler_ec.jl", "p3_rhs!"] | 0.61 (5%) ✅ | 1.05 (1%) ❌ |
| ["tree_2d_dgsem/elixir_euler_ec.jl", "p7_rhs!"] | 0.58 (5%) ✅ | 1.04 (1%) ❌ |
| ["tree_2d_dgsem/elixir_euler_vortex_mortar.jl", "p3_rhs!"] | 0.92 (5%) ✅ | 1.05 (1%) ❌ |
| ["tree_2d_dgsem/elixir_euler_vortex_mortar.jl", "p7_rhs!"] | 0.90 (5%) ✅ | 1.04 (1%) ❌ |
| ["tree_2d_dgsem/elixir_euler_vortex_mortar_shockcapturing.jl", "p3_rhs!"] | 0.86 (5%) ✅ | 1.03 (1%) ❌ |
| ["tree_2d_dgsem/elixir_euler_vortex_mortar_shockcapturing.jl", "p7_rhs!"] | 0.80 (5%) ✅ | 1.03 (1%) ❌ |
| ["tree_2d_dgsem/elixir_mhd_ec.jl", "p3_rhs!"] | 0.78 (5%) ✅ | 1.00 (1%) |
| ["tree_2d_dgsem/elixir_mhd_ec.jl", "p7_rhs!"] | 0.74 (5%) ✅ | 1.00 (1%) |
| ["tree_3d_dgsem/elixir_euler_ec.jl", "p3_analysis"] | 1.01 (5%) | 1.03 (1%) ❌ |
| ["tree_3d_dgsem/elixir_euler_ec.jl", "p3_rhs!"] | 0.66 (5%) ✅ | 1.04 (1%) ❌ |
| ["tree_3d_dgsem/elixir_euler_ec.jl", "p7_analysis"] | 1.00 (5%) | 1.02 (1%) ❌ |
| ["tree_3d_dgsem/elixir_euler_ec.jl", "p7_rhs!"] | 0.64 (5%) ✅ | 1.03 (1%) ❌ |
| ["tree_3d_dgsem/elixir_euler_mortar.jl", "p3_analysis"] | 0.98 (5%) | 1.03 (1%) ❌ |
| ["tree_3d_dgsem/elixir_euler_mortar.jl", "p3_rhs!"] | 0.95 (5%) | 1.04 (1%) ❌ |
| ["tree_3d_dgsem/elixir_euler_mortar.jl", "p7_analysis"] | 0.98 (5%) | 1.02 (1%) ❌ |
| ["tree_3d_dgsem/elixir_euler_mortar.jl", "p7_rhs!"] | 0.97 (5%) | 1.03 (1%) ❌ |
| ["tree_3d_dgsem/elixir_euler_shockcapturing.jl", "p3_analysis"] | 1.00 (5%) | 1.02 (1%) ❌ |
| ["tree_3d_dgsem/elixir_euler_shockcapturing.jl", "p3_rhs!"] | 0.67 (5%) ✅ | 1.03 (1%) ❌ |
| ["tree_3d_dgsem/elixir_euler_shockcapturing.jl", "p7_analysis"] | 0.99 (5%) | 1.02 (1%) ❌ |
| ["tree_3d_dgsem/elixir_euler_shockcapturing.jl", "p7_rhs!"] | 0.64 (5%) ✅ | 1.03 (1%) ❌ |
| ["tree_3d_dgsem/elixir_mhd_ec.jl", "p3_rhs!"] | 0.80 (5%) ✅ | 1.00 (1%) |
| ["tree_3d_dgsem/elixir_mhd_ec.jl", "p7_rhs!"] | 0.75 (5%) ✅ | 1.00 (1%) |
| ["unstructured_2d_dgsem/elixir_euler_wall_bc.jl", "p3_analysis"] | 0.93 (5%) ✅ | 1.00 (1%) |
| ["unstructured_2d_dgsem/elixir_euler_wall_bc.jl", "p3_rhs!"] | 0.95 (5%) ✅ | 1.04 (1%) ❌ |
| ["unstructured_2d_dgsem/elixir_euler_wall_bc.jl", "p7_rhs!"] | 0.90 (5%) ✅ | 1.03 (1%) ❌ |
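
To relate the time ratios above to the "up to ca. 2x" summary: assuming the ratio is target time divided by baseline time, a ratio r corresponds to a speedup factor of 1/r and a runtime reduction of (1 - r) · 100 %. A small sketch of that arithmetic follows; the helper names are hypothetical and not part of any package.

```julia
# Hypothetical helpers for reading the table; assumption: ratio = target / baseline.
speedup(ratio) = 1 / ratio
runtime_reduction_percent(ratio) = 100 * (1 - ratio)

# Two time ratios copied from the table above
rows = [("structured_3d_dgsem/elixir_euler_ec.jl, p3_rhs!", 0.51),
        ("tree_2d_dgsem/elixir_euler_ec.jl, p3_rhs!", 0.61)]

for (name, ratio) in rows
    println(name, ": speedup ≈ ", round(speedup(ratio), digits=2),
            "x, runtime reduced by ≈ ",
            round(runtime_reduction_percent(ratio), digits=1), " %")
end
```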

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["benchmark/elixir_2d_euler_vortex_structured.jl"]
  • ["benchmark/elixir_2d_euler_vortex_tree.jl"]
  • ["benchmark/elixir_2d_euler_vortex_unstructured.jl"]
  • ["latency"]
  • ["p4est_2d_dgsem/elixir_advection_extended.jl"]
  • ["p4est_3d_dgsem/elixir_advection_basic.jl"]
  • ["structured_2d_dgsem/elixir_advection_extended.jl"]
  • ["structured_2d_dgsem/elixir_advection_nonperiodic.jl"]
  • ["structured_2d_dgsem/elixir_euler_ec.jl"]
  • ["structured_2d_dgsem/elixir_euler_source_terms_nonperiodic.jl"]
  • ["structured_2d_dgsem/elixir_mhd_ec.jl"]
  • ["structured_3d_dgsem/elixir_advection_nonperiodic.jl"]
  • ["structured_3d_dgsem/elixir_euler_ec.jl"]
  • ["structured_3d_dgsem/elixir_euler_source_terms_nonperiodic.jl"]
  • ["structured_3d_dgsem/elixir_mhd_ec.jl"]
  • ["tree_2d_dgsem/elixir_advection_amr_nonperiodic.jl"]
  • ["tree_2d_dgsem/elixir_advection_extended.jl"]
  • ["tree_2d_dgsem/elixir_euler_ec.jl"]
  • ["tree_2d_dgsem/elixir_euler_vortex_mortar.jl"]
  • ["tree_2d_dgsem/elixir_euler_vortex_mortar_shockcapturing.jl"]
  • ["tree_2d_dgsem/elixir_mhd_ec.jl"]
  • ["tree_3d_dgsem/elixir_advection_extended.jl"]
  • ["tree_3d_dgsem/elixir_euler_ec.jl"]
  • ["tree_3d_dgsem/elixir_euler_mortar.jl"]
  • ["tree_3d_dgsem/elixir_euler_shockcapturing.jl"]
  • ["tree_3d_dgsem/elixir_mhd_ec.jl"]
  • ["unstructured_2d_dgsem/elixir_euler_wall_bc.jl"]

Julia versioninfo

Target

Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.2 LTS
  uname: Linux 5.4.0-70-generic #78-Ubuntu SMP Fri Mar 19 13:29:52 UTC 2021 x86_64 x86_64
  CPU: AMD Ryzen Threadripper 3990X 64-Core Processor: 
                  speed         user         nice          sys         idle          irq
       #1-128  4030 MHz  143325542 s       9068 s     641775 s  8402251998 s          0 s
       
  Memory: 251.6334342956543 GB (1520.875 MB free)
  Uptime: 6.67736e6 sec
  Load Avg:  2.0  2.05  2.16
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, znver2)

Baseline

Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.2 LTS
  uname: Linux 5.4.0-70-generic #78-Ubuntu SMP Fri Mar 19 13:29:52 UTC 2021 x86_64 x86_64
  CPU: AMD Ryzen Threadripper 3990X 64-Core Processor: 
                  speed         user         nice          sys         idle          irq
       #1-128  2176 MHz  143343277 s       9068 s     642286 s  8403796597 s          0 s
       
  Memory: 251.6334342956543 GB (9770.75 MB free)
  Uptime: 6.678581e6 sec
  Load Avg:  1.0  1.13  1.54
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, znver2)

@ranocha ranocha removed the triage (To be decided at the next Trixi meeting) label Jun 29, 2021
@ranocha
Member Author

ranocha commented Jun 29, 2021

@sloede: This PR should be ready for the final review

Member

@sloede sloede left a comment

It's essentially only two questions: Greek letters vs. consistency and some additional comments. Otherwise it LGTM!

@ranocha ranocha requested a review from sloede June 30, 2021 04:12
Member

@sloede sloede left a comment

LGTM! Thanks and kudos for the great performance improvements! And thanks for the patience!

@ranocha
Member Author

ranocha commented Jun 30, 2021

Well, thank you for your patience with me and your helpful review! This really improved the quality of this PR 👍

@ranocha ranocha merged commit 83bd625 into trixi-framework:main Jun 30, 2021