
correct timing stack reordering in am_now_working_on and add separate timing category for or_to_all #952

Merged: 2 commits merged into NanoComp:master from the timing_fixes branch on Jul 9, 2019

Conversation

@oskooi (Collaborator) commented on Jul 7, 2019

Two improvements to the timing statistics feature:

  1. corrects a bug in am_now_working_on: when a new entry is pushed at the bottom (i.e., the 0th element) of the timing stack was_working_on, the existing entries are now shifted by iterating from the top of the stack downward rather than from the bottom upward, which had been overwriting every entry with the 0th element.
  2. adds a separate timing category (MpiTime) for two separate calls to or_to_all; one of these (in phase_material) forces a synchronization during time stepping, and its cost was previously being misattributed, which mixed up the per-process results.

With these changes, the timing statistics for two examples involving a tall, skinny, 2d cell (one with a FluxRegion and one with a Lorentzian material) are now as expected: when run with 3 processors/chunks, the "waiting" time of the two chunks which do not contain the additional expensive pixels is reported in the communicating: bin rather than in time stepping:, as had been reported previously.

tall, skinny cell with FluxRegion

import meep as mp

sx = 30
sy = 5
fcen = 1.0
df = 0.1
nfreq = 500

src = mp.Source(mp.GaussianSource(fcen,fwidth=df),
                component=mp.Ez,
                center=mp.Vector3())

sim = mp.Simulation(cell_size=mp.Vector3(sx,sy),
                    sources=[src],
                    resolution=20,
                    verbose=True,
                    k_point=mp.Vector3(0.53,0.14,0),
                    split_chunks_evenly=True)

flux = sim.add_flux(fcen,
                    df,
                    nfreq,
                    mp.FluxRegion(center=mp.Vector3(10),size=mp.Vector3(8,sy),direction=mp.X))
sim.run(until=20)

output for 3 processors/chunks

Elapsed run time = 43.4122 s

Field time usage:
        connecting chunks: 0.00273964 s +/- 0.000174768 s
            time stepping: 0.177547 s +/- 0.00483149 s
            communicating: 28.8031 s +/- 24.8935 s
     Fourier transforming: 14.3563 s +/- 24.8657 s
          everything else: 0.0578607 s +/- 0.0643532 s


Field time usage for all processes:
        connecting chunks: 0.00261171, 0.00293877, 0.00266845
            time stepping: 0.174122, 0.183073, 0.175445
          copying borders: 0, 0, 0
            communicating: 43.1798, 43.1709, 0.0585769
        outputting fields: 0, 0, 0
     Fourier transforming: 6.2939e-05, 6.1177e-05, 43.0687
                      MPB: 0, 0, 0
        getting farfields: 0, 0, 0
          everything else: 0.0172884, 0.0242326, 0.132061

tall, skinny cell with Lorentzian material

import meep as mp
from meep.materials import Al

sx = 30
sy = 5
fcen = 1.5
df = 0.1

src = [mp.Source(mp.GaussianSource(fcen,fwidth=df),
                 component=mp.Ez,
                 center=mp.Vector3())]

geometry = [mp.Block(material=Al,
                     center=mp.Vector3(10),
                     size=mp.Vector3(8,mp.inf,mp.inf))]

sim = mp.Simulation(cell_size=mp.Vector3(sx,sy),
                    sources=src,
                    geometry=geometry,
                    resolution=53,
                    verbose=True,
                    k_point=mp.Vector3(),
                    split_chunks_evenly=True)

sim.run(until=50)

output for 3 processors/chunks

Elapsed run time = 21.1063 s

Field time usage:
        connecting chunks: 0.0139099 s +/- 0.000110555 s
            time stepping: 13.6499 s +/- 4.36228 s
            communicating: 6.68182 s +/- 4.37074 s
     Fourier transforming: 0.000973668 s +/- 0.000562319 s
          everything else: 0.163483 s +/- 0.00799958 s


Field time usage for all processes:
        connecting chunks: 0.0140033, 0.0137878, 0.0139384
            time stepping: 11.1444, 11.1182, 18.687
          copying borders: 0, 0, 0
            communicating: 9.19343, 9.21707, 1.63494
        outputting fields: 0, 0, 0
     Fourier transforming: 0.000537213, 0.000775563, 0.00160823
                      MPB: 0, 0, 0
        getting farfields: 0, 0, 0
          everything else: 0.157655, 0.160192, 0.172604

-  for (int i = 0; i + 1 < MEEP_TIMING_STACK_SZ; ++i)
-    was_working_on[i + 1] = was_working_on[i];
+  for (int i = MEEP_TIMING_STACK_SZ - 1; i > 0; --i)
+    was_working_on[i] = was_working_on[i - 1];
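
To make the difference concrete, here is a minimal standalone demo (not meep code; STACK_SZ and the sample values are invented for illustration) contrasting the two loop orders when shifting a stack up by one before writing a new entry into slot 0:

#include <cstdio>

#define STACK_SZ 5

int main() {
  int buggy[STACK_SZ] = {10, 20, 30, 40, 50};
  int fixed[STACK_SZ] = {10, 20, 30, 40, 50};

  // old order: the value written into slot i+1 is the value that was just
  // copied into slot i, so every slot ends up holding the original slot 0
  for (int i = 0; i + 1 < STACK_SZ; ++i)
    buggy[i + 1] = buggy[i];

  // fixed order: walk from the top down so each slot receives the
  // still-untouched element below it, i.e. a proper shift by one
  for (int i = STACK_SZ - 1; i > 0; --i)
    fixed[i] = fixed[i - 1];

  for (int i = 0; i < STACK_SZ; ++i) printf("%d ", buggy[i]); // 10 10 10 10 10
  printf("\n");
  for (int i = 0; i < STACK_SZ; ++i) printf("%d ", fixed[i]); // 10 10 20 30 40
  printf("\n");
  return 0;
}

With the fixed order, slot 0 can then be overwritten by the new timing category without clobbering the rest of the stack.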
Collaborator:

good catch … seems to have been my fault from 2009 (68e20bb)

src/step.cpp (outdated diff)
am_now_working_on(MpiTime);
bool changed_mpi = or_to_all(changed);
finished_working();
if (changed_mpi) {
Collaborator:

It seems like all of this should be inside the if (is_phasing()) { .... } check. That is, nothing should be executed in the common case of people who aren't using this feature.
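
For context, the restructuring being suggested would look roughly like the sketch below; this is an assumption about the surrounding code in phase_material (the elided parts are placeholders), not the actual contents of src/step.cpp:

if (is_phasing()) {
  bool changed = false;
  // ... advance the phased materials on the local chunks,
  //     setting changed if anything was modified ...
  am_now_working_on(MpiTime);  // attribute the collective to the MPI bin
  bool changed_mpi = or_to_all(changed);
  finished_working();
  if (changed_mpi) {
    // ... update whatever depends on the changed materials ...
  }
}
// nothing here runs for users who never enable phasing

This way the or_to_all reduction (and its synchronization cost) is skipped entirely in the common case where phasing is not used.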

Collaborator (author):

Fixed by ea21771.

@stevengj stevengj merged commit 388e26b into NanoComp:master Jul 9, 2019
@oskooi oskooi deleted the timing_fixes branch July 9, 2019 18:17
bencbartlett pushed a commit to bencbartlett/meep that referenced this pull request Sep 9, 2021
… timing category for or_to_all (NanoComp#952)

* correct timing stack reshuffle in am_now_working_on and add separate timing MPI category for or_to_all

* move time-stepping block inside is_phasing if statement