Description
What happened?
For full context of this issue, refer to the summary in #396.
Custodian may end up running on a master node while the VASP processes are launched on sister nodes. This is often done, for instance, when requesting a single large Slurm allocation and running many concurrent VASP processes within it. Currently, Custodian cannot handle this setup: the Custodian process on the master node seemingly does not have permission to kill the VASP process on the other node(s) in the allocation, so it defaults to a killall command that kills everything (including perfectly fine jobs). However, Custodian does have permission to kill the parent process that launches the VASP executable (typically an srun or mpirun call), which is in fact what the killall indiscriminately kills.
#396 solves this for VASP, but essentially the same problem exists for the other codes. The fix in #396 is quite easy to implement for other codes once it is merged.
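To illustrate the idea of killing only the parent launcher instead of falling back to killall, here is a minimal sketch in Python. It is not Custodian's code and not the change made in #396. The helper names launch and terminate_launcher are hypothetical, and a sleeping Python process stands in for the srun or mpirun call.

```python
import subprocess
import sys


def launch(cmd):
    """Start a launcher command and return its Popen handle.

    In the scenario above, cmd would be the srun/mpirun call that starts
    the executable; here it is whatever the caller passes in.
    """
    return subprocess.Popen(cmd)


def terminate_launcher(proc, timeout=10.0):
    """Terminate only the given launcher process (hypothetical helper).

    Sends a terminate request first and escalates to kill if the process
    has not exited within `timeout` seconds. Other launchers started by
    the same parent are left untouched.
    """
    if proc.poll() is not None:
        # Already finished; nothing to do.
        return proc.returncode
    proc.terminate()
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
    return proc.returncode


if __name__ == "__main__":
    # Stand-in for two concurrent jobs in one allocation (assumption:
    # a sleeping Python process replaces the real srun/mpirun call).
    stand_in = [sys.executable, "-c", "import time; time.sleep(60)"]
    job_a = launch(stand_in)
    job_b = launch(stand_in)

    terminate_launcher(job_a)          # stop only the failing job
    print(job_b.poll() is None)        # the other job is still running
    terminate_launcher(job_b)          # clean up
```

Keeping a handle to each launcher process is what makes the targeted kill possible. A name-based killall cannot tell concurrent jobs apart.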
Version
2025.8.13
Which OS?
- MacOS
- Windows
- Linux