Fbhuiyan2 -- adding Sophia to default configs #386
base: main
Hi @fbhuiyan2, sorry it's taken so long to address this PR. One question: when you say you've tested in the by-gpu queue, can you clarify what you tested?
Yes, sure. I tested by running the Python job from your Balsam workshop. I have also been running LAMMPS and VASP calculations on Sophia through Balsam. I have not run any LAMMPS calculations in the 'by-gpu' queue, but I have run my VASP app there, and the VASP jobs ran fine. Node packing also works. I initially assumed that each GPU in 'by-gpu' would count as a 'node' for node-packing purposes, but that turned out to be wrong: nodes are still actual nodes. To keep things simple, node_packing_count = 1 should be used for 'by-gpu'. With a higher packing count, such as node_packing_count = 4 with n_gpus = 2, the following error can occur unless you request and receive 8 GPUs on the same node:
Here, I requested 4 GPUs in the Balsam queue with node packing = 4 and n_gpus = 2. Balsam tried to pack 4 calculations onto the 4 GPUs, but only 2 jobs could fit; the other 2 failed with this error.
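The arithmetic behind this failure can be sketched roughly as below. This is only an illustration of the numbers in the comment above, not Balsam's actual scheduling logic; the function names `gpus_needed` and `jobs_that_fit` are hypothetical and are not part of Balsam.

```python
def gpus_needed(gpus_per_job: int, node_packing_count: int) -> int:
    """GPUs one node must provide so that every packed job gets its GPUs."""
    if gpus_per_job <= 0 or node_packing_count <= 0:
        raise ValueError("gpus_per_job and node_packing_count must be positive")
    return gpus_per_job * node_packing_count


def jobs_that_fit(gpus_allocated: int, gpus_per_job: int, node_packing_count: int) -> int:
    """Jobs that can run at once on one node, limited by packing count and GPUs."""
    if gpus_allocated < 0:
        raise ValueError("gpus_allocated must be non-negative")
    if gpus_per_job <= 0 or node_packing_count <= 0:
        raise ValueError("gpus_per_job and node_packing_count must be positive")
    # Each job needs gpus_per_job GPUs; the packing count caps concurrency.
    return min(node_packing_count, gpus_allocated // gpus_per_job)


# The case described above: 4 GPUs allocated, 2 GPUs per job, packing count 4.
print(gpus_needed(2, 4))       # GPUs required on the node for full packing
print(jobs_that_fit(4, 2, 4))  # jobs that actually fit on 4 GPUs
```

With 4 GPUs allocated, only 2 of the 4 packed jobs have GPUs available, which matches the two failures reported; with node_packing_count = 1 the mismatch cannot arise.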
Added apprun and compute node for Sophia. Added Sophia as a default config with an appropriate settings.yml and job_sample.sh script. Tested the configuration on Sophia. With these changes, Balsam shows Sophia as an option when opening new sites. Tested running jobs on the 'by-gpu' and 'by-node' queues; both worked as expected. Further testing may be needed to confirm that node packing works as expected. Hyperthreading is not enabled here, but can be added later.