How to run parallel jobs with srun
srun
: Used to submit/initiate a job or job step
Typically, srun is invoked from a SLURM job script, but it can also be run directly from the command line, in which case srun will first create a resource allocation in which to run the parallel job (the salloc is implicit)
srun -N 1 -c 16 -p HPC_4_Days --time=1:00:00 --pty /bin/bash
This command requests 16 cores (-c) on one node (-N) for one hour (--time) in the HPC_4_Days partition (-p) and starts an interactive shell (--pty /bin/bash). Please note that this is subject to node availability; if no nodes are available your request will be queued waiting for resources.
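Once the allocation is granted, srun drops you into a shell on the allocated node, where you can inspect the allocation through SLURM's environment variables (the job ID and core count shown here are illustrative):
$ echo $SLURM_JOB_ID
51042
$ echo $SLURM_CPUS_ON_NODE
16
$ exit
Exiting the shell releases the allocation.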
srun may also be executed inside a batch script:
#!/bin/bash
# Request 3 nodes in the HPC_4_Days partition
#SBATCH -N 3
#SBATCH -p HPC_4_Days
echo Starting job $SLURM_JOB_ID
echo SLURM assigned me these nodes
# -l prepends the task number to each output line
srun -l hostname
Running this batch job will result in output similar to the following:
Starting job 51057
SLURM assigned me these nodes
0: wn054.b.incd.pt
1: wn055.b.incd.pt
2: wn057.b.incd.pt
The 3 allocated nodes are released after srun finishes.
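As an illustration, if the script above is saved as job.sh (a hypothetical filename), it can be submitted with sbatch and monitored with squeue; by default the job output is written to slurm-<jobid>.out in the submission directory:
$ sbatch job.sh
Submitted batch job 51057
$ squeue -j 51057
$ cat slurm-51057.out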
By default srun will use the pmi2 MPI type, but you may consult the full list of available MPI types:
$ srun --mpi=list
srun: MPI types are...
srun: pmi2
srun: openmpi
srun: none
To use a different MPI type, pass the --mpi option, e.g. srun --mpi=openmpi
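As a sketch, a minimal batch script launching a compiled MPI program (my_mpi_app is a hypothetical binary; load the appropriate MPI module for your system first) on 2 nodes with 4 tasks per node might look like:
#!/bin/bash
#SBATCH -N 2
#SBATCH --ntasks-per-node=4
#SBATCH -p HPC_4_Days
srun --mpi=pmi2 ./my_mpi_app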
For more detailed information, please see man srun.