Packing sequential jobs

An HPC cluster is a shared resource, and users should submit jobs in a way that makes full use of its capacity. When a set of single-CPU jobs is submitted, over time they end up scattered across many compute nodes. As a result, a user who needs a whole compute node cannot get one, because a single CPU core is busy on each node; a large number of small jobs also burdens and slows the scheduler. To prevent this, you can pack your sequential jobs so that they occupy whole nodes: simply execute multiple runs in the same job script. To do this, modify your submission script as follows (the example is for a 16-core node).

#!/bin/bash
#PBS -N <job_name>
#PBS -l select=1:ncpus=16:mem=60gb
#PBS ...
...
# Launch all 16 programs in the background so they run simultaneously
time ./my_program01 [input/output/parameters] &
time ./my_program02 [input/output/parameters] &
...
time ./my_program16 [input/output/parameters] &
# Block until every background run has finished
wait
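
Because all runs share the job's standard output, their output and the time reports will interleave. An optional refinement, sketched below under the assumption that per-run log files (names are illustrative) are acceptable, redirects each run, including its time report, to its own file:

{ time ./my_program01 [input/output/parameters] ; } > run01.log 2>&1 &
{ time ./my_program02 [input/output/parameters] ; } > run02.log 2>&1 &
...
wait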

Note the following important points about the example above:

  • The number of programs you run has to be equal to the number of "ncpus" requested in the resource line of the script;
  • The trailing "&" is crucial; without it the programs run one after another instead of simultaneously;
  • The "wait" command is equally crucial. Without it the script reaches its end as soon as the programs are launched, the job finishes, and all still-running programs are killed.
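
If the runs differ only by an index, writing sixteen near-identical lines is unnecessary; a shell loop packs the node just as well. A minimal sketch, assuming the programs are named my_program01 through my_program16 as in the example above:

# Launch runs 01..16 in the background, then wait for all of them
for i in $(seq -w 1 16); do
    time ./my_program$i [input/output/parameters] &
done
wait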