Improve EC methods (focusing on the compressible Euler equations) #643

ranocha · 2021-06-14T06:06:16Z

I wrote another blog post explaining the reasons for these changes. tl;dr: Trixi was hit quite a bit by the difference between LLVM and GCC. I could fix that by introducing @muladd. With some additional performance optimizations, I could improve the performance of the RHS computations at the initial datum

- by ca. 33% for examples/2d/elixir_euler_ec.jl
- by ca. 38% for examples/2d/elixir_euler_ec_curved.jl
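
For readers who have not met it yet, here is a minimal sketch of the idea behind @muladd. The macro (from MuladdMacro.jl) rewrites expressions of the form `a * x + y` into calls to `muladd`, which allows the compiler to emit a fused multiply-add instruction where that is profitable. The sketch uses only `muladd` from Base so that it runs without extra packages; the function names `axpy_plain` and `axpy_muladd` are made up for illustration and do not appear in Trixi.

```julia
# Plain version: written as one multiplication followed by one addition.
axpy_plain(a, x, y) = a * x + y

# Explicit version: `muladd` permits (but does not force) a fused multiply-add.
# This is roughly what `@muladd a * x + y` would expand to.
axpy_muladd(a, x, y) = muladd(a, x, y)

a, x, y = 2.0, 3.0, 1.0
@show axpy_plain(a, x, y)
@show axpy_muladd(a, x, y)
```

For inputs like these, both versions agree exactly. In general, the fused variant rounds only once, so results can differ in the last bits, which is presumably why some test tolerances had to be adapted in this PR.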

Results from Rocinante:

1 Thread: Nice performance improvements overall, in particular for Euler EC; no significant runtime regressions

Job Properties

Time of benchmarks:
- Target: 14 Jun 2021 - 19:19
- Baseline: 14 Jun 2021 - 19:36
Package commits:
- Target: 028976
- Baseline: 658e6e
Julia commits:
- Target: 6aaede
- Baseline: 6aaede
Julia command flags:
- Target: -Cnative,-J/mnt/hd1/opt/julia/1.6.1/lib/julia/sys.so,-g1,--check-bounds=no,--threads=1
- Baseline: -Cnative,-J/mnt/hd1/opt/julia/1.6.1/lib/julia/sys.so,-g1,--check-bounds=no,--threads=1
Environment variables:
- Target: None
- Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).
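
The rule described above can be sketched as follows, assuming a symmetric tolerance band around 1.0 (5% for time, 1% for memory in the tables below). The function names `classify` and `speedup` are hypothetical helpers for illustration and not part of the benchmark tooling.

```julia
# Classify a target/baseline ratio given a relative tolerance.
# Assumption: a result counts as significant only if it leaves the band 1 ± tolerance.
function classify(ratio::Real, tolerance::Real)
    if ratio - tolerance > 1
        return :regression
    elseif ratio + tolerance < 1
        return :improvement
    else
        return :invariant
    end
end

# Convert a time ratio into a speedup factor (baseline time / target time).
speedup(ratio::Real) = 1 / ratio

@show classify(0.60, 0.05)  # time ratio of an Euler EC benchmark
@show classify(1.05, 0.01)  # memory ratio of the same benchmark
@show classify(0.97, 0.05)  # within the tolerance band
@show speedup(0.5)
```

Because the displayed ratios are rounded to two digits, a printed value such as 0.95 (5%) can appear with or without a mark.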

ID	time ratio	memory ratio
`["2d", "elixir_2d_euler_vortex_structured.jl", "p3_rhs!"]`	0.93 (5%) ✅	1.05 (1%) ❌
`["2d", "elixir_2d_euler_vortex_structured.jl", "p7_rhs!"]`	0.90 (5%) ✅	1.04 (1%) ❌
`["2d", "elixir_2d_euler_vortex_tree.jl", "p3_rhs!"]`	0.92 (5%) ✅	1.05 (1%) ❌
`["2d", "elixir_2d_euler_vortex_tree.jl", "p7_rhs!"]`	0.90 (5%) ✅	1.04 (1%) ❌
`["2d", "elixir_2d_euler_vortex_unstructured.jl", "p3_rhs!"]`	0.94 (5%) ✅	1.04 (1%) ❌
`["2d", "elixir_2d_euler_vortex_unstructured.jl", "p7_rhs!"]`	0.89 (5%) ✅	1.03 (1%) ❌
`["2d", "elixir_advection_amr_nonperiodic.jl", "p3_analysis"]`	0.94 (5%) ✅	1.00 (1%)
`["2d", "elixir_euler_ec.jl", "p3_rhs!"]`	0.60 (5%) ✅	1.05 (1%) ❌
`["2d", "elixir_euler_ec.jl", "p7_rhs!"]`	0.58 (5%) ✅	1.04 (1%) ❌
`["2d", "elixir_euler_ec_curved.jl", "p3_rhs!"]`	0.53 (5%) ✅	1.05 (1%) ❌
`["2d", "elixir_euler_ec_curved.jl", "p7_rhs!"]`	0.52 (5%) ✅	1.04 (1%) ❌
`["2d", "elixir_euler_nonperiodic_curved.jl", "p3_rhs!"]`	0.95 (5%)	1.05 (1%) ❌
`["2d", "elixir_euler_nonperiodic_curved.jl", "p7_rhs!"]`	0.93 (5%) ✅	1.04 (1%) ❌
`["2d", "elixir_euler_unstructured_quad_wall_bc.jl", "p3_analysis"]`	0.93 (5%) ✅	1.00 (1%)
`["2d", "elixir_euler_unstructured_quad_wall_bc.jl", "p3_rhs!"]`	0.96 (5%)	1.04 (1%) ❌
`["2d", "elixir_euler_unstructured_quad_wall_bc.jl", "p7_rhs!"]`	0.90 (5%) ✅	1.03 (1%) ❌
`["2d", "elixir_euler_vortex_mortar.jl", "p3_rhs!"]`	0.93 (5%) ✅	1.05 (1%) ❌
`["2d", "elixir_euler_vortex_mortar.jl", "p7_rhs!"]`	0.91 (5%) ✅	1.04 (1%) ❌
`["2d", "elixir_euler_vortex_mortar_shockcapturing.jl", "p3_rhs!"]`	0.86 (5%) ✅	1.03 (1%) ❌
`["2d", "elixir_euler_vortex_mortar_shockcapturing.jl", "p7_rhs!"]`	0.80 (5%) ✅	1.03 (1%) ❌
`["3d", "elixir_euler_ec.jl", "p3_analysis"]`	1.00 (5%)	1.03 (1%) ❌
`["3d", "elixir_euler_ec.jl", "p3_rhs!"]`	0.65 (5%) ✅	1.04 (1%) ❌
`["3d", "elixir_euler_ec.jl", "p7_analysis"]`	1.00 (5%)	1.02 (1%) ❌
`["3d", "elixir_euler_ec.jl", "p7_rhs!"]`	0.64 (5%) ✅	1.03 (1%) ❌
`["3d", "elixir_euler_ec_curved.jl", "p3_rhs!"]`	0.51 (5%) ✅	1.05 (1%) ❌
`["3d", "elixir_euler_ec_curved.jl", "p7_rhs!"]`	0.54 (5%) ✅	1.04 (1%) ❌
`["3d", "elixir_euler_mortar.jl", "p3_analysis"]`	0.98 (5%)	1.03 (1%) ❌
`["3d", "elixir_euler_mortar.jl", "p3_rhs!"]`	0.95 (5%)	1.04 (1%) ❌
`["3d", "elixir_euler_mortar.jl", "p7_analysis"]`	0.99 (5%)	1.02 (1%) ❌
`["3d", "elixir_euler_mortar.jl", "p7_rhs!"]`	0.97 (5%)	1.03 (1%) ❌
`["3d", "elixir_euler_nonperiodic_curved.jl", "p3_rhs!"]`	0.96 (5%)	1.05 (1%) ❌
`["3d", "elixir_euler_nonperiodic_curved.jl", "p7_rhs!"]`	0.94 (5%) ✅	1.04 (1%) ❌
`["3d", "elixir_euler_shockcapturing.jl", "p3_analysis"]`	1.00 (5%)	1.02 (1%) ❌
`["3d", "elixir_euler_shockcapturing.jl", "p3_rhs!"]`	0.66 (5%) ✅	1.03 (1%) ❌
`["3d", "elixir_euler_shockcapturing.jl", "p7_analysis"]`	1.00 (5%)	1.02 (1%) ❌
`["3d", "elixir_euler_shockcapturing.jl", "p7_rhs!"]`	0.64 (5%) ✅	1.03 (1%) ❌

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

["2d", "elixir_2d_euler_vortex_structured.jl"]
["2d", "elixir_2d_euler_vortex_tree.jl"]
["2d", "elixir_2d_euler_vortex_unstructured.jl"]
["2d", "elixir_advection_amr_nonperiodic.jl"]
["2d", "elixir_advection_extended.jl"]
["2d", "elixir_advection_extended_curved.jl"]
["2d", "elixir_advection_nonperiodic_curved.jl"]
["2d", "elixir_euler_ec.jl"]
["2d", "elixir_euler_ec_curved.jl"]
["2d", "elixir_euler_nonperiodic_curved.jl"]
["2d", "elixir_euler_unstructured_quad_wall_bc.jl"]
["2d", "elixir_euler_vortex_mortar.jl"]
["2d", "elixir_euler_vortex_mortar_shockcapturing.jl"]
["3d", "elixir_advection_extended.jl"]
["3d", "elixir_advection_nonperiodic_curved.jl"]
["3d", "elixir_euler_ec.jl"]
["3d", "elixir_euler_ec_curved.jl"]
["3d", "elixir_euler_mortar.jl"]
["3d", "elixir_euler_nonperiodic_curved.jl"]
["3d", "elixir_euler_shockcapturing.jl"]
["latency"]

Julia versioninfo

Target

Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.2 LTS
  uname: Linux 5.4.0-70-generic #78-Ubuntu SMP Fri Mar 19 13:29:52 UTC 2021 x86_64 x86_64
  CPU: AMD Ryzen Threadripper 3990X 64-Core Processor: 
                  speed         user         nice          sys         idle          irq
       #1-128  2199 MHz  142957996 s       7228 s     609093 s  6880605121 s          0 s
       
  Memory: 251.6334342956543 GB (21132.83203125 MB free)
  Uptime: 5.488206e6 sec
  Load Avg:  1.0  0.98  0.69
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, znver2)

Baseline

Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.2 LTS
  uname: Linux 5.4.0-70-generic #78-Ubuntu SMP Fri Mar 19 13:29:52 UTC 2021 x86_64 x86_64
  CPU: AMD Ryzen Threadripper 3990X 64-Core Processor: 
                  speed         user         nice          sys         idle          irq
       #1-128  2174 MHz  142967646 s       7228 s     609257 s  6881843720 s          0 s
       
  Memory: 251.6334342956543 GB (21093.7578125 MB free)
  Uptime: 5.489182e6 sec
  Load Avg:  1.0  1.0  0.92
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, znver2)

2 Threads: Nice performance improvements overall, in particular for Euler EC; no significant runtime regressions

Job Properties

Time of benchmarks:
- Target: 14 Jun 2021 - 19:53
- Baseline: 14 Jun 2021 - 20:11
Package commits:
- Target: 028976
- Baseline: 658e6e
Julia commits:
- Target: 6aaede
- Baseline: 6aaede
Julia command flags:
- Target: -Cnative,-J/mnt/hd1/opt/julia/1.6.1/lib/julia/sys.so,-g1,--check-bounds=no,--threads=2
- Baseline: -Cnative,-J/mnt/hd1/opt/julia/1.6.1/lib/julia/sys.so,-g1,--check-bounds=no,--threads=2
Environment variables:
- Target: None
- Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

ID	time ratio	memory ratio
`["2d", "elixir_2d_euler_vortex_structured.jl", "p3_rhs!"]`	0.92 (5%) ✅	1.01 (1%) ❌
`["2d", "elixir_2d_euler_vortex_tree.jl", "p3_rhs!"]`	0.95 (5%) ✅	1.01 (1%) ❌
`["2d", "elixir_2d_euler_vortex_tree.jl", "p7_rhs!"]`	0.85 (5%) ✅	1.01 (1%)
`["2d", "elixir_2d_euler_vortex_unstructured.jl", "p3_rhs!"]`	0.98 (5%)	1.01 (1%) ❌
`["2d", "elixir_2d_euler_vortex_unstructured.jl", "p7_rhs!"]`	0.89 (5%) ✅	1.01 (1%)
`["2d", "elixir_advection_amr_nonperiodic.jl", "p3_analysis"]`	0.94 (5%) ✅	1.00 (1%)
`["2d", "elixir_euler_ec.jl", "p3_rhs!"]`	0.61 (5%) ✅	1.01 (1%) ❌
`["2d", "elixir_euler_ec.jl", "p7_rhs!"]`	0.58 (5%) ✅	1.01 (1%)
`["2d", "elixir_euler_ec_curved.jl", "p3_rhs!"]`	0.57 (5%) ✅	1.01 (1%) ❌
`["2d", "elixir_euler_ec_curved.jl", "p7_rhs!"]`	0.55 (5%) ✅	1.01 (1%)
`["2d", "elixir_euler_nonperiodic_curved.jl", "p3_rhs!"]`	0.95 (5%)	1.02 (1%) ❌
`["2d", "elixir_euler_nonperiodic_curved.jl", "p7_rhs!"]`	0.94 (5%) ✅	1.01 (1%) ❌
`["2d", "elixir_euler_vortex_mortar.jl", "p3_rhs!"]`	0.94 (5%) ✅	1.01 (1%)
`["2d", "elixir_euler_vortex_mortar.jl", "p7_rhs!"]`	0.88 (5%) ✅	1.01 (1%)
`["2d", "elixir_euler_vortex_mortar_shockcapturing.jl", "p7_rhs!"]`	0.82 (5%) ✅	1.00 (1%)
`["3d", "elixir_euler_ec.jl", "p3_analysis"]`	1.01 (5%)	1.03 (1%) ❌
`["3d", "elixir_euler_ec.jl", "p3_rhs!"]`	0.65 (5%) ✅	1.01 (1%) ❌
`["3d", "elixir_euler_ec.jl", "p7_analysis"]`	1.00 (5%)	1.02 (1%) ❌
`["3d", "elixir_euler_ec.jl", "p7_rhs!"]`	0.61 (5%) ✅	1.01 (1%)
`["3d", "elixir_euler_ec_curved.jl", "p3_rhs!"]`	0.57 (5%) ✅	1.01 (1%) ❌
`["3d", "elixir_euler_ec_curved.jl", "p7_rhs!"]`	0.59 (5%) ✅	1.01 (1%)
`["3d", "elixir_euler_mortar.jl", "p3_analysis"]`	0.99 (5%)	1.03 (1%) ❌
`["3d", "elixir_euler_mortar.jl", "p7_analysis"]`	0.98 (5%)	1.02 (1%) ❌
`["3d", "elixir_euler_nonperiodic_curved.jl", "p3_rhs!"]`	0.95 (5%)	1.02 (1%) ❌
`["3d", "elixir_euler_nonperiodic_curved.jl", "p7_rhs!"]`	0.94 (5%) ✅	1.01 (1%) ❌
`["3d", "elixir_euler_shockcapturing.jl", "p3_analysis"]`	1.01 (5%)	1.02 (1%) ❌
`["3d", "elixir_euler_shockcapturing.jl", "p3_rhs!"]`	0.66 (5%) ✅	1.01 (1%)
`["3d", "elixir_euler_shockcapturing.jl", "p7_analysis"]`	1.00 (5%)	1.02 (1%) ❌
`["3d", "elixir_euler_shockcapturing.jl", "p7_rhs!"]`	0.61 (5%) ✅	1.00 (1%)

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

["2d", "elixir_2d_euler_vortex_structured.jl"]
["2d", "elixir_2d_euler_vortex_tree.jl"]
["2d", "elixir_2d_euler_vortex_unstructured.jl"]
["2d", "elixir_advection_amr_nonperiodic.jl"]
["2d", "elixir_advection_extended.jl"]
["2d", "elixir_advection_extended_curved.jl"]
["2d", "elixir_advection_nonperiodic_curved.jl"]
["2d", "elixir_euler_ec.jl"]
["2d", "elixir_euler_ec_curved.jl"]
["2d", "elixir_euler_nonperiodic_curved.jl"]
["2d", "elixir_euler_unstructured_quad_wall_bc.jl"]
["2d", "elixir_euler_vortex_mortar.jl"]
["2d", "elixir_euler_vortex_mortar_shockcapturing.jl"]
["3d", "elixir_advection_extended.jl"]
["3d", "elixir_advection_nonperiodic_curved.jl"]
["3d", "elixir_euler_ec.jl"]
["3d", "elixir_euler_ec_curved.jl"]
["3d", "elixir_euler_mortar.jl"]
["3d", "elixir_euler_nonperiodic_curved.jl"]
["3d", "elixir_euler_shockcapturing.jl"]
["latency"]

Julia versioninfo

Target

Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.2 LTS
  uname: Linux 5.4.0-70-generic #78-Ubuntu SMP Fri Mar 19 13:29:52 UTC 2021 x86_64 x86_64
  CPU: AMD Ryzen Threadripper 3990X 64-Core Processor: 
                  speed         user         nice          sys         idle          irq
       #1-128  2174 MHz  142979653 s       7228 s     609434 s  6883177485 s          0 s
       
  Memory: 251.6334342956543 GB (20722.4140625 MB free)
  Uptime: 5.490233e6 sec
  Load Avg:  1.43  1.31  1.13
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, znver2)

Baseline

Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.2 LTS
  uname: Linux 5.4.0-70-generic #78-Ubuntu SMP Fri Mar 19 13:29:52 UTC 2021 x86_64 x86_64
  CPU: AMD Ryzen Threadripper 3990X 64-Core Processor: 
                  speed         user         nice          sys         idle          irq
       #1-128  2176 MHz  142991726 s       7228 s     609618 s  6884515169 s          0 s
       
  Memory: 251.6334342956543 GB (20599.0 MB free)
  Uptime: 5.491288e6 sec
  Load Avg:  1.45  1.33  1.19
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, znver2)

jlchan · 2021-06-14T12:01:00Z

That blog post is very informative - thanks for sharing! The @muladd macro is new to me.

Two minor typos in the blog post: "Divisions are more expansive on modern hardhare than multiplications."

ranocha · 2021-06-14T14:38:26Z

Thanks 👍

codecov · 2021-06-14T16:28:59Z

Codecov Report

Merging #643 (813561b) into main (48fb4b9) will decrease coverage by 0.05%.
The diff coverage is 96.37%.

❗ Current head 813561b differs from pull request most recent head f638268. Consider uploading reports for the commit f638268 to get more accurate results

@@            Coverage Diff             @@
##             main     #643      +/-   ##
==========================================
- Coverage   93.69%   93.63%   -0.06%     
==========================================
  Files         171      171              
  Lines       16600    16595       -5     
==========================================
- Hits        15553    15539      -14     
- Misses       1047     1056       +9

Flag	Coverage Δ
unittests	`93.63% <96.37%> (-0.06%)`	⬇️

Impacted Files	Coverage Δ
examples/structured_2d_dgsem/elixir_euler_ec.jl	`100.00% <ø> (ø)`
examples/structured_3d_dgsem/elixir_euler_ec.jl	`100.00% <ø> (ø)`
src/Trixi.jl	`83.33% <ø> (ø)`
src/auxiliary/auxiliary.jl	`90.00% <ø> (ø)`
src/auxiliary/containers.jl	`93.89% <ø> (ø)`
src/auxiliary/mpi.jl	`75.75% <ø> (ø)`
src/auxiliary/precompile.jl	`0.00% <ø> (ø)`
src/auxiliary/special_elixirs.jl	`94.31% <ø> (ø)`
src/basic_types.jl	`100.00% <ø> (ø)`
src/callbacks_stage/positivity_zhang_shu.jl	`87.50% <ø> (ø)`
... and 118 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 48fb4b9...f638268.

sloede

Great job on all these performance optimizations, including the two blog posts! I have some questions and remarks, some of which might be challenging ;-) In this case, we'll have to see how far we get here or if we should discuss it in person...

examples/2d/elixir_euler_unstructured_quad_ec.jl

src/auxiliary/auxiliary.jl

src/auxiliary/math.jl

src/equations/compressible_euler_2d.jl

src/equations/ideal_glm_mhd_3d.jl

src/solvers/dg_tree/dg_2d.jl

src/solvers/dg_unstructured_quad/dg_2d.jl

ranocha · 2021-06-18T11:12:10Z

One thing I forgot in my review: Instead of changing each file individually, would it be possible to also just wrap more than a single file in a begin ... end block? Or replace include(...) by includefast(...) that does it automatically for you? This would make it less verbose and easier to change in a single place should the need arise to do so in the future.

I thought you would prefer the more explicit solution having a comment in each file. Nevertheless, here you are: include_optimized

This version makes it considerably more difficult to work on Trixi due to a bug in Revise (timholy/Revise.jl#634). Hence, I strongly prefer the more verbose option to annotate each file until that bug is fixed. Is that okay for you, @sloede?

ranocha · 2021-06-24T04:35:28Z

@sloede: Can we also put this on the agenda to be able to finish this PR next week?

sloede · 2021-06-24T04:44:08Z

No need. I propose to proceed with your original version but to open an issue that tracks the related Revise issue. I really like the include_optimized trick you created, and I'd strongly prefer this to having the explicit code everywhere once Revise is fixed.
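
A minimal sketch of how such an include_optimized could work, assuming it builds on the two-argument form `include(mapexpr, path)` that the Revise issue refers to. The transformation `toy_muladd` is a hypothetical stand-in for `@muladd` from MuladdMacro.jl, `include_transformed` is a made-up name, and the file content is invented for illustration; the version in this PR may differ.

```julia
# Hypothetical stand-in for `@muladd`: rewrite `x * y + z` into `muladd(x, y, z)`.
function toy_muladd(ex)
    ex isa Expr || return ex            # symbols, literals, line numbers stay as they are
    args = map(toy_muladd, ex.args)     # transform nested expressions first
    if ex.head === :call && length(args) == 3 && args[1] === :+ &&
       args[2] isa Expr && args[2].head === :call &&
       length(args[2].args) == 3 && args[2].args[1] === :*
        return Expr(:call, :muladd, args[2].args[2], args[2].args[3], args[3])
    end
    return Expr(ex.head, args...)
end

# Hypothetical helper: include a file and apply the transformation to every
# top-level expression, so the file itself needs no per-file annotation.
include_transformed(mod, path) = Base.include(toy_muladd, mod, path)

# Made-up example file containing a plain multiply-add.
path, io = mktemp()
write(io, "axpy(a, x, y) = a * x + y\n")
close(io)

include_transformed(@__MODULE__, path)
@show axpy(2.0, 3.0, 1.0)
```

The trade-off discussed above is visible here: the transformation lives in one place, but tools that re-evaluate files need to know about the `mapexpr` argument, which is where the Revise bug comes in.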

ranocha · 2021-06-28T13:03:55Z

New benchmark from Rocinante

1 Thread: Nice runtime improvements overall (up to ca. 2x), only minor memory regressions

Job Properties

Time of benchmarks:
- Target: 28 Jun 2021 - 13:39
- Baseline: 28 Jun 2021 - 13:59
Package commits:
- Target: cae2e2
- Baseline: a8d5f9
Julia commits:
- Target: 6aaede
- Baseline: 6aaede
Julia command flags:
- Target: -Cnative,-J/mnt/hd1/opt/julia/1.6.1/lib/julia/sys.so,-g1,--check-bounds=no,--threads=1
- Baseline: -Cnative,-J/mnt/hd1/opt/julia/1.6.1/lib/julia/sys.so,-g1,--check-bounds=no,--threads=1
Environment variables:
- Target: None
- Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

ID	time ratio	memory ratio
`["benchmark/elixir_2d_euler_vortex_structured.jl", "p3_rhs!"]`	0.93 (5%) ✅	1.05 (1%) ❌
`["benchmark/elixir_2d_euler_vortex_structured.jl", "p7_rhs!"]`	0.89 (5%) ✅	1.04 (1%) ❌
`["benchmark/elixir_2d_euler_vortex_tree.jl", "p3_rhs!"]`	0.93 (5%) ✅	1.05 (1%) ❌
`["benchmark/elixir_2d_euler_vortex_tree.jl", "p7_rhs!"]`	0.91 (5%) ✅	1.04 (1%) ❌
`["benchmark/elixir_2d_euler_vortex_unstructured.jl", "p3_rhs!"]`	0.94 (5%) ✅	1.04 (1%) ❌
`["benchmark/elixir_2d_euler_vortex_unstructured.jl", "p7_rhs!"]`	0.90 (5%) ✅	1.03 (1%) ❌
`["structured_2d_dgsem/elixir_euler_ec.jl", "p3_rhs!"]`	0.53 (5%) ✅	1.05 (1%) ❌
`["structured_2d_dgsem/elixir_euler_ec.jl", "p7_rhs!"]`	0.52 (5%) ✅	1.04 (1%) ❌
`["structured_2d_dgsem/elixir_euler_source_terms_nonperiodic.jl", "p3_rhs!"]`	0.95 (5%)	1.05 (1%) ❌
`["structured_2d_dgsem/elixir_euler_source_terms_nonperiodic.jl", "p7_rhs!"]`	0.92 (5%) ✅	1.04 (1%) ❌
`["structured_2d_dgsem/elixir_mhd_ec.jl", "p3_rhs!"]`	0.90 (5%) ✅	1.00 (1%)
`["structured_2d_dgsem/elixir_mhd_ec.jl", "p7_rhs!"]`	0.94 (5%) ✅	1.00 (1%)
`["structured_3d_dgsem/elixir_euler_ec.jl", "p3_rhs!"]`	0.51 (5%) ✅	1.05 (1%) ❌
`["structured_3d_dgsem/elixir_euler_ec.jl", "p7_rhs!"]`	0.53 (5%) ✅	1.04 (1%) ❌
`["structured_3d_dgsem/elixir_euler_source_terms_nonperiodic.jl", "p3_rhs!"]`	0.96 (5%)	1.05 (1%) ❌
`["structured_3d_dgsem/elixir_euler_source_terms_nonperiodic.jl", "p7_rhs!"]`	0.93 (5%) ✅	1.04 (1%) ❌
`["structured_3d_dgsem/elixir_mhd_ec.jl", "p3_rhs!"]`	0.91 (5%) ✅	1.00 (1%)
`["structured_3d_dgsem/elixir_mhd_ec.jl", "p7_rhs!"]`	0.89 (5%) ✅	1.00 (1%)
`["tree_2d_dgsem/elixir_advection_amr_nonperiodic.jl", "p3_analysis"]`	0.95 (5%) ✅	1.00 (1%)
`["tree_2d_dgsem/elixir_euler_ec.jl", "p3_rhs!"]`	0.61 (5%) ✅	1.05 (1%) ❌
`["tree_2d_dgsem/elixir_euler_ec.jl", "p7_rhs!"]`	0.58 (5%) ✅	1.04 (1%) ❌
`["tree_2d_dgsem/elixir_euler_vortex_mortar.jl", "p3_rhs!"]`	0.92 (5%) ✅	1.05 (1%) ❌
`["tree_2d_dgsem/elixir_euler_vortex_mortar.jl", "p7_rhs!"]`	0.90 (5%) ✅	1.04 (1%) ❌
`["tree_2d_dgsem/elixir_euler_vortex_mortar_shockcapturing.jl", "p3_rhs!"]`	0.86 (5%) ✅	1.03 (1%) ❌
`["tree_2d_dgsem/elixir_euler_vortex_mortar_shockcapturing.jl", "p7_rhs!"]`	0.80 (5%) ✅	1.03 (1%) ❌
`["tree_2d_dgsem/elixir_mhd_ec.jl", "p3_rhs!"]`	0.78 (5%) ✅	1.00 (1%)
`["tree_2d_dgsem/elixir_mhd_ec.jl", "p7_rhs!"]`	0.74 (5%) ✅	1.00 (1%)
`["tree_3d_dgsem/elixir_euler_ec.jl", "p3_analysis"]`	1.01 (5%)	1.03 (1%) ❌
`["tree_3d_dgsem/elixir_euler_ec.jl", "p3_rhs!"]`	0.66 (5%) ✅	1.04 (1%) ❌
`["tree_3d_dgsem/elixir_euler_ec.jl", "p7_analysis"]`	1.00 (5%)	1.02 (1%) ❌
`["tree_3d_dgsem/elixir_euler_ec.jl", "p7_rhs!"]`	0.64 (5%) ✅	1.03 (1%) ❌
`["tree_3d_dgsem/elixir_euler_mortar.jl", "p3_analysis"]`	0.98 (5%)	1.03 (1%) ❌
`["tree_3d_dgsem/elixir_euler_mortar.jl", "p3_rhs!"]`	0.95 (5%)	1.04 (1%) ❌
`["tree_3d_dgsem/elixir_euler_mortar.jl", "p7_analysis"]`	0.98 (5%)	1.02 (1%) ❌
`["tree_3d_dgsem/elixir_euler_mortar.jl", "p7_rhs!"]`	0.97 (5%)	1.03 (1%) ❌
`["tree_3d_dgsem/elixir_euler_shockcapturing.jl", "p3_analysis"]`	1.00 (5%)	1.02 (1%) ❌
`["tree_3d_dgsem/elixir_euler_shockcapturing.jl", "p3_rhs!"]`	0.67 (5%) ✅	1.03 (1%) ❌
`["tree_3d_dgsem/elixir_euler_shockcapturing.jl", "p7_analysis"]`	0.99 (5%)	1.02 (1%) ❌
`["tree_3d_dgsem/elixir_euler_shockcapturing.jl", "p7_rhs!"]`	0.64 (5%) ✅	1.03 (1%) ❌
`["tree_3d_dgsem/elixir_mhd_ec.jl", "p3_rhs!"]`	0.80 (5%) ✅	1.00 (1%)
`["tree_3d_dgsem/elixir_mhd_ec.jl", "p7_rhs!"]`	0.75 (5%) ✅	1.00 (1%)
`["unstructured_2d_dgsem/elixir_euler_wall_bc.jl", "p3_analysis"]`	0.93 (5%) ✅	1.00 (1%)
`["unstructured_2d_dgsem/elixir_euler_wall_bc.jl", "p3_rhs!"]`	0.95 (5%) ✅	1.04 (1%) ❌
`["unstructured_2d_dgsem/elixir_euler_wall_bc.jl", "p7_rhs!"]`	0.90 (5%) ✅	1.03 (1%) ❌

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

["benchmark/elixir_2d_euler_vortex_structured.jl"]
["benchmark/elixir_2d_euler_vortex_tree.jl"]
["benchmark/elixir_2d_euler_vortex_unstructured.jl"]
["latency"]
["p4est_2d_dgsem/elixir_advection_extended.jl"]
["p4est_3d_dgsem/elixir_advection_basic.jl"]
["structured_2d_dgsem/elixir_advection_extended.jl"]
["structured_2d_dgsem/elixir_advection_nonperiodic.jl"]
["structured_2d_dgsem/elixir_euler_ec.jl"]
["structured_2d_dgsem/elixir_euler_source_terms_nonperiodic.jl"]
["structured_2d_dgsem/elixir_mhd_ec.jl"]
["structured_3d_dgsem/elixir_advection_nonperiodic.jl"]
["structured_3d_dgsem/elixir_euler_ec.jl"]
["structured_3d_dgsem/elixir_euler_source_terms_nonperiodic.jl"]
["structured_3d_dgsem/elixir_mhd_ec.jl"]
["tree_2d_dgsem/elixir_advection_amr_nonperiodic.jl"]
["tree_2d_dgsem/elixir_advection_extended.jl"]
["tree_2d_dgsem/elixir_euler_ec.jl"]
["tree_2d_dgsem/elixir_euler_vortex_mortar.jl"]
["tree_2d_dgsem/elixir_euler_vortex_mortar_shockcapturing.jl"]
["tree_2d_dgsem/elixir_mhd_ec.jl"]
["tree_3d_dgsem/elixir_advection_extended.jl"]
["tree_3d_dgsem/elixir_euler_ec.jl"]
["tree_3d_dgsem/elixir_euler_mortar.jl"]
["tree_3d_dgsem/elixir_euler_shockcapturing.jl"]
["tree_3d_dgsem/elixir_mhd_ec.jl"]
["unstructured_2d_dgsem/elixir_euler_wall_bc.jl"]

Julia versioninfo

Target

Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.2 LTS
  uname: Linux 5.4.0-70-generic #78-Ubuntu SMP Fri Mar 19 13:29:52 UTC 2021 x86_64 x86_64
  CPU: AMD Ryzen Threadripper 3990X 64-Core Processor: 
                  speed         user         nice          sys         idle          irq
       #1-128  4030 MHz  143325542 s       9068 s     641775 s  8402251998 s          0 s
       
  Memory: 251.6334342956543 GB (1520.875 MB free)
  Uptime: 6.67736e6 sec
  Load Avg:  2.0  2.05  2.16
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, znver2)

Baseline

Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.2 LTS
  uname: Linux 5.4.0-70-generic #78-Ubuntu SMP Fri Mar 19 13:29:52 UTC 2021 x86_64 x86_64
  CPU: AMD Ryzen Threadripper 3990X 64-Core Processor: 
                  speed         user         nice          sys         idle          irq
       #1-128  2176 MHz  143343277 s       9068 s     642286 s  8403796597 s          0 s
       
  Memory: 251.6334342956543 GB (9770.75 MB free)
  Uptime: 6.678581e6 sec
  Load Avg:  1.0  1.13  1.54
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, znver2)

ranocha · 2021-06-29T16:12:25Z

@sloede: This PR should be ready for the final review

sloede

It's essentially only two questions: Greek letters vs. consistency and some additional comments. Otherwise it LGTM!

src/equations/compressible_euler_1d.jl

src/equations/compressible_euler_2d.jl

src/equations/compressible_euler_1d.jl

src/equations/compressible_euler_2d.jl

src/equations/compressible_euler_3d.jl

src/equations/ideal_glm_mhd_1d.jl

src/equations/ideal_glm_mhd_2d.jl

src/equations/ideal_glm_mhd_3d.jl

src/solvers/dgsem_tree/dg.jl

sloede

LGTM! Thanks and kudos for the great performance improvements! And thanks for the patience!

ranocha · 2021-06-30T04:17:07Z

Well, thank you for your patience with me and your helpful review! This really improved the quality of this PR 👍

ranocha added 12 commits June 14, 2021 08:28

use at-muladd for ln_mean

7b4b735

use at-inline for ln_mean

67a628e

skip computation of the diagonal terms in the split_form_kernel! without non-conservative terms

4648d00

use FMA-enabled add_to_node_vars! in split_form_kernel!

d115bd1

at-muladd and new add_to_node_vars! in more volume terms

427b692

more at-muladd

ad607cc

improve ln_mean further by avoiding one division

cc4e651

use at-evalpoly in ln_mean

442dd4b

introduce inv_ln_mean

ac39515

introduce inv_γm1 for compressible Euler equations

c54c8ac

update link to blog post

3052dc5

adapt some tolerances to the new setup

6a46c84

ranocha force-pushed the hr/blog_ec_performance branch from dec9ae8 to 61ca617 Compare June 14, 2021 09:50

ranocha added 3 commits June 14, 2021 12:48

update MHD

434e692

flux_ranocha with normal_direction

4b1cc92

adapt test tolerances for Mac OS

12be336

ranocha force-pushed the hr/blog_ec_performance branch from 61ca617 to 12be336 Compare June 14, 2021 10:51

add Gregor to reference for flux_hindenlang as discussed with him [skip ci]

f277fcd

ranocha added 2 commits June 14, 2021 16:40

use the same EC flux for Euler EC tests on all meshes

74f0003

mention blog post in docs

656db43

ranocha requested a review from sloede June 14, 2021 14:43

ranocha marked this pull request as ready for review June 14, 2021 14:43

adapt parallel tolerance to 74f0003

ffc212f

ranocha added 3 commits June 14, 2021 18:50

further optimization of inv_rho_p_mean

028976a

Merge branch 'main' into hr/blog_ec_performance

f8ff778

Merge branch 'main' into hr/blog_ec_performance

9d5f331

sloede requested changes Jun 17, 2021

Merge branch 'main' into hr/blog_ec_performance

8c8dac6

ranocha added the performance (We are greedy) and triage (To be decided at the next Trixi meeting) labels Jun 23, 2021

ranocha mentioned this pull request Jun 24, 2021

Switch to include_optimized once Revise handles the 2-argument form of include #664

Open

ranocha added 2 commits June 24, 2021 06:58

include_optimized -> include plus macro call in files

4a7c8f7

Merge branch 'main' into hr/blog_ec_performance

d84388c

ranocha mentioned this pull request Jun 25, 2021

Weak form modal DG methods on simplicial (and quad/hex meshes) #647

Merged

4 tasks

ranocha added 6 commits June 27, 2021 12:51

inv_γm1 -> inv_gamma_minus_1

6e8d10f

Merge branch 'main' into hr/blog_ec_performance

24dc161

Merge branch 'main' into hr/blog_ec_performance

8f22329

Merge branch 'main' into hr/blog_ec_performance

6283691

Merge branch 'main' into hr/blog_ec_performance

5670197

Merge branch 'main' into hr/blog_ec_performance [skip ci]

cae2e29

ranocha added 3 commits June 28, 2021 15:06

Merge branch 'main' into hr/blog_ec_performance

1961bca

Merge branch 'main' into hr/blog_ec_performance [skip ci]

1d1920d

inv_gamma_minus_1 -> inv_gamma_minus_one

813561b

ranocha removed the triage (To be decided at the next Trixi meeting) label Jun 29, 2021

sloede requested changes Jun 29, 2021

ranocha added 3 commits June 30, 2021 05:59

comment on multiply_add_to_node_vars!

a9babe2

Merge branch 'main' into hr/blog_ec_performance

e7aa60a

long live ASCII

f638268

ranocha requested a review from sloede June 30, 2021 04:12

sloede approved these changes Jun 30, 2021

ranocha merged commit 83bd625 into trixi-framework:main Jun 30, 2021
