
correct timing stack reordering in am_now_working_on and add separate timing category for or_to_all #952

Merged: 2 commits merged into NanoComp:master from the timing_fixes branch on Jul 9, 2019

Conversation

@oskooi (Collaborator) commented on Jul 7, 2019

Two improvements to the timing statistics feature:

  1. corrects a bug in am_now_working_on: when a new entry is pushed at the bottom (i.e., the 0th element) of the timing stack was_working_on, the existing entries are now shifted by iterating from the top of the stack downward rather than from the bottom upward, which had been overwriting every entry with the 0th element.
  2. adds a separate timing category (MpiTime) for two separate calls to or_to_all; one of these (in phase_material) forces a synchronization during time stepping, and its cost was previously being misattributed, which mixed up the per-process results.

With these changes, the timing statistics for two examples involving a tall, skinny, 2d cell (one with a FluxRegion and one with a Lorentzian material) are now as expected: when run with 3 processors/chunks, the "waiting" time of the two chunks which do not contain the additional expensive pixels is reported in the communicating: bin rather than in time stepping:, as had been reported previously.

tall, skinny cell with FluxRegion

import meep as mp

sx = 30
sy = 5
fcen = 1.0
df = 0.1
nfreq = 500

src = mp.Source(mp.GaussianSource(fcen,fwidth=df),
                component=mp.Ez,
                center=mp.Vector3())

sim = mp.Simulation(cell_size=mp.Vector3(sx,sy),
                    sources=[src],
                    resolution=20,
                    verbose=True,
                    k_point=mp.Vector3(0.53,0.14,0),
                    split_chunks_evenly=True)

flux = sim.add_flux(fcen,
                    df,
                    nfreq,
                    mp.FluxRegion(center=mp.Vector3(10),size=mp.Vector3(8,sy),direction=mp.X))
sim.run(until=20)

output for 3 processors/chunks

Elapsed run time = 43.4122 s

Field time usage:
        connecting chunks: 0.00273964 s +/- 0.000174768 s
            time stepping: 0.177547 s +/- 0.00483149 s
            communicating: 28.8031 s +/- 24.8935 s
     Fourier transforming: 14.3563 s +/- 24.8657 s
          everything else: 0.0578607 s +/- 0.0643532 s


Field time usage for all processes:
        connecting chunks: 0.00261171, 0.00293877, 0.00266845
            time stepping: 0.174122, 0.183073, 0.175445
          copying borders: 0, 0, 0
            communicating: 43.1798, 43.1709, 0.0585769
        outputting fields: 0, 0, 0
     Fourier transforming: 6.2939e-05, 6.1177e-05, 43.0687
                      MPB: 0, 0, 0
        getting farfields: 0, 0, 0
          everything else: 0.0172884, 0.0242326, 0.132061

tall, skinny cell with Lorentzian material

import meep as mp
from meep.materials import Al

sx = 30
sy = 5
fcen = 1.5
df = 0.1

src = [mp.Source(mp.GaussianSource(fcen,fwidth=df),
                 component=mp.Ez,
                 center=mp.Vector3())]

geometry = [mp.Block(material=Al,
                     center=mp.Vector3(10),
                     size=mp.Vector3(8,mp.inf,mp.inf))]

sim = mp.Simulation(cell_size=mp.Vector3(sx,sy),
                    sources=src,
                    geometry=geometry,
                    resolution=53,
                    verbose=True,
                    k_point=mp.Vector3(),
                    split_chunks_evenly=True)

sim.run(until=50)

output for 3 processors/chunks

Elapsed run time = 21.1063 s

Field time usage:
        connecting chunks: 0.0139099 s +/- 0.000110555 s
            time stepping: 13.6499 s +/- 4.36228 s
            communicating: 6.68182 s +/- 4.37074 s
     Fourier transforming: 0.000973668 s +/- 0.000562319 s
          everything else: 0.163483 s +/- 0.00799958 s


Field time usage for all processes:
        connecting chunks: 0.0140033, 0.0137878, 0.0139384
            time stepping: 11.1444, 11.1182, 18.687
          copying borders: 0, 0, 0
            communicating: 9.19343, 9.21707, 1.63494
        outputting fields: 0, 0, 0
     Fourier transforming: 0.000537213, 0.000775563, 0.00160823
                      MPB: 0, 0, 0
        getting farfields: 0, 0, 0
          everything else: 0.157655, 0.160192, 0.172604

-  for (int i = 0; i + 1 < MEEP_TIMING_STACK_SZ; ++i)
-    was_working_on[i + 1] = was_working_on[i];
+  for (int i = MEEP_TIMING_STACK_SZ - 1; i > 0; --i)
+    was_working_on[i] = was_working_on[i - 1];
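
To make the difference concrete, here is a minimal standalone demo (not meep code; STACK_SZ and the sample values are invented for illustration) contrasting the two loop orders when shifting a stack up by one before writing a new entry into slot 0:

#include <cstdio>

#define STACK_SZ 5

int main() {
  int buggy[STACK_SZ] = {10, 20, 30, 40, 50};
  int fixed[STACK_SZ] = {10, 20, 30, 40, 50};

  // old order: the value written into slot i+1 is the value that was just
  // copied into slot i, so every slot ends up holding the original slot 0
  for (int i = 0; i + 1 < STACK_SZ; ++i)
    buggy[i + 1] = buggy[i];

  // fixed order: walk from the top down so each slot receives the
  // still-untouched element below it, i.e. a proper shift by one
  for (int i = STACK_SZ - 1; i > 0; --i)
    fixed[i] = fixed[i - 1];

  for (int i = 0; i < STACK_SZ; ++i) printf("%d ", buggy[i]); // 10 10 10 10 10
  printf("\n");
  for (int i = 0; i < STACK_SZ; ++i) printf("%d ", fixed[i]); // 10 10 20 30 40
  printf("\n");
  return 0;
}

With the fixed order, slot 0 can then be overwritten by the new timing category without clobbering the rest of the stack.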
Collaborator:

good catch … seems to have been my fault from 2009 (68e20bb)

src/step.cpp (outdated diff)
am_now_working_on(MpiTime);
bool changed_mpi = or_to_all(changed);
finished_working();
if (changed_mpi) {
Collaborator:

It seems like all of this should be inside the if (is_phasing()) { .... } check. That is, nothing should be executed in the common case of people who aren't using this feature.
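
For context, the restructuring being suggested would look roughly like the sketch below; this is an assumption about the surrounding code in phase_material (the elided parts are placeholders), not the actual contents of src/step.cpp:

if (is_phasing()) {
  bool changed = false;
  // ... advance the phased materials on the local chunks,
  //     setting changed if anything was modified ...
  am_now_working_on(MpiTime);  // attribute the collective to the MPI bin
  bool changed_mpi = or_to_all(changed);
  finished_working();
  if (changed_mpi) {
    // ... update whatever depends on the changed materials ...
  }
}
// nothing here runs for users who never enable phasing

This way the or_to_all reduction (and its synchronization cost) is skipped entirely in the common case where phasing is not used.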

Collaborator (author):

Fixed by ea21771.

@stevengj stevengj merged commit 388e26b into NanoComp:master Jul 9, 2019
@oskooi oskooi deleted the timing_fixes branch July 9, 2019 18:17
bencbartlett pushed a commit to bencbartlett/meep that referenced this pull request Sep 9, 2021
… timing category for or_to_all (NanoComp#952)

* correct timing stack reshuffle in am_now_working_on and add separate timing MPI category for or_to_all

* move time-stepping block inside is_phasing if statement