-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New automated Checkpoint-Restart Tests #324
base: main
Are you sure you want to change the base?
Conversation
…-restart functionallity.
7b982aa
to
be93341
Compare
be93341
to
136c40a
Compare
…estart functionality (until PR enzo-project#313 is merged)
136c40a
to
d3d6a2e
Compare
Co-authored-by: Greg Bryan <[email protected]>
466a928
to
8cab0f5
Compare
Recommendations:
|
@jobordner: I've started to address your recommendations. I had quick question about this recommendation:
I have now renamed this file so that it's called This file exists because:
Would you prefer that I do something like:
|
(I assume you're referring to parameter include files (foo.incl) and not parameter test files (test-foo.in) here? Defining Stopping parameters is required for a simulation to stop).
IMHO duplicating code (or parameter files) should be avoided, so relocating the Testing and Stopping parameter groups from the parameter include file to the parameter test files would be preferred, despite complicating reviewers' work. Adding them via e.g. ppm-testing.incl and ppm-stopping.incl files would simplify both relocating and reviewing. |
…ld produce an error if the mask_list_ attribute was not empty. I didn't realize that it usually held a single nullptr)
1. The new approach that writes checkpoints with the ``"check"`` method, that promises a lot more flexibility. | ||
2. The legacy approach that uses Charm++'s machinery | ||
|
||
The intention is to entirely transition to the new approach. But, at this time the new approach has limitations (see `Issue #316 <https://github.com/enzo-project/enzo-e/issues/316>`_). For that reason, both approaches are automatically tested (by the pytest machinery). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention is not to transition to the new approach. Each has its advantages and disadvantages, so both should be kept. Charm++ CR can do lots of things Cello CR cannot.
|
||
.. code-block:: bash | ||
|
||
python3 tools/ckpt_restart_test.py \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get the following when I run this:
Test failed: NOT cleaning up
Traceback (most recent call last):
File "/home/bordner/Cello/enzo-e.mabruzzo.new-ckpt-restart-tests/tools/ckpt_restart_test.py", line 165, in <module>
main(parser.parse_args())
File "/home/bordner/Cello/enzo-e.mabruzzo.new-ckpt-restart-tests/tools/ckpt_restart_test.py", line 71, in main
run_ckpt_restart_test(nominal_input = args.input,
File "/home/bordner/Cello/enzo-e.mabruzzo.new-ckpt-restart-tests/test/answer_tests/test_utils/ckpt_restart_testing.py", line 464, in run_ckpt_restart_test
restart_h5obj_map = ckpt_block_file_map(
^^^^^^^^^^^^^^^^^^^^
File "/home/bordner/Cello/enzo-e.mabruzzo.new-ckpt-restart-tests/test/answer_tests/test_utils/ckpt_restart_testing.py", line 22, in ckpt_block_file_map
with open(filelist_path, 'r') as f_filelist:
^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'my_test/restart_run/Check-02/check.file_list'
This branch is 11 commits ahead of, 168 commits behind |
This PR introduces new automated Checkpoint-Restart tests that will be executed by the continuous integration as part of the pytest framework.
Overview
Prior to the migration from Scons to Cmake, we previously had some Checkpoint-Restart tests. Unfortunately, those tests got lost in the transition. At the same time, James has been hard at work developing/refining the new Scalable Checkpoint functionality.
It was initially my intention to just write tests for the new-style checkpoints. However, issue #316 has made it clear that the new-style checkpoints are not ready to completely replace the old-style checkpoints quite yet. For that reason, I have introduced tests of both types of Checkpoints.
I largely rewrote most of the code for the Checkpoint-Restart functionality (hopefully it's easier to digest than the older version of the code that is now removed).
test/answer_tests/test_utils/ckpt_restart_testing.py
test/answer_tests/test_ckpt_restart.py
.tools/ckpt_restart_test.py
, that can be used to help diagnose problems (it makes use of the exact same core-logic as the unit-tests).Essentially these tests execute the following logic:
The website-documentation has been updated for these checkpoint tests, and provides additional detail. If you still have questions, let me know and I can update it.
Notes for Reviewers:
test/answer_tests/test_ckpt_restart.py
and I will submit another PR after Texascale bug fixes #313 is merged (or update this one if it hasn't merged yet).EDIT: Now that #318 is merged, this PR no longer has any dependencies. The "Files Changed" tab now just includes changes from this PR (rather than from both PRs)