Job pipeline using Slurm dependencies

Sometimes we need to launch a list of jobs that execute in sequence, one after another. In those cases we use the sbatch option --dependency. Only simple examples are presented here; check the sbatch manual page for more details.

Simple example

Suppose we need to submit the script my_first_job.sh and then my_second_job.sh, which should run after the first one:

[user@cirrus01 ~]$ sbatch my_first_job.sh
Submitted batch job 1843928

[user@cirrus01 ~]$ sbatch --dependency=after:1843928 my_second_job.sh
Submitted batch job 1843921

[user@cirrus01 ~]$ squeue
JOBID   PARTITION              NAME USER ST TIME NODES NODELIST(REASON) 
1843928       hpc   my_first_job.sh user  R 0:11     1 hpc046
1843921       hpc  my_second_job.sh user PD 0:00     1 hpc047

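In this case the second job will run even if the first job fails for some reason. Note that a dependency of type after is satisfied as soon as the first job has started (or been cancelled), so the pending job does not wait for the first one to finish. To make it wait until the first job has terminated, whatever its exit status, use afterany instead of after.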
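When the two submissions are scripted, copying the job ID by hand is error-prone. The following is a minimal sketch, assuming the sbatch option --parsable (which prints only the job ID, optionally followed by ";cluster") is available on the cluster. The function name submit_pair is illustrative and not part of Slurm.

```shell
# Submit two scripts so that the second depends on the first.
# Usage: submit_pair <dependency_type> <first_script> <second_script>
# submit_pair is a hypothetical helper name, not a Slurm command.
submit_pair() {
    local dep_type="$1" first="$2" second="$3"
    local first_id second_id

    # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID
    first_id=$(sbatch --parsable "$first") || return 1
    first_id=${first_id%%;*}

    # The second job carries a dependency on the first job's ID
    second_id=$(sbatch --parsable --dependency="${dep_type}:${first_id}" "$second") || return 1
    second_id=${second_id%%;*}

    echo "$first_id $second_id"
}

# Example (requires a Slurm cluster):
#   submit_pair after my_first_job.sh my_second_job.sh
```

The function prints both job IDs on one line, so they can be passed on to squeue or to further submissions.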

Typical example

In a real case we may need to ensure that the first job terminated successfully. For example, the first job may produce an output file needed as input for the second job:

[user@cirrus01 ~]$ sbatch my_first_job.sh
Submitted batch job 1843922

[user@cirrus01 ~]$ sbatch --dependency=afterok:1843922 my_second_job.sh
Submitted batch job 1843923

The afterok dependency type states that the second job starts only if the previous job terminates with no errors, that is, it runs to completion with an exit code of zero.
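The same idea extends to longer pipelines, where each stage should start only if the previous one succeeded. The sketch below is one possible way to do it, again assuming sbatch --parsable is available. The function name submit_chain is hypothetical.

```shell
# Submit the given scripts as a chain: each job starts only if the
# previous one finished with exit code zero (afterok).
# Usage: submit_chain <script> [<script> ...]
# submit_chain is a hypothetical helper name, not a Slurm command.
submit_chain() {
    local prev_id="" job_id script
    for script in "$@"; do
        if [ -z "$prev_id" ]; then
            # First stage has no dependency
            job_id=$(sbatch --parsable "$script") || return 1
        else
            job_id=$(sbatch --parsable --dependency="afterok:${prev_id}" "$script") || return 1
        fi
        # --parsable may append ";cluster"; keep only the job ID
        prev_id=${job_id%%;*}
        echo "$script -> $prev_id"
    done
}

# Example (requires a Slurm cluster):
#   submit_chain my_first_job.sh my_second_job.sh
```

If one stage fails, the later stages can never satisfy their afterok dependency. Depending on the cluster configuration they may stay pending until they are cancelled, so it is worth checking the queue after a failure.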

Complex cases

Check the sbatch manual page for more details:

 [user@cirrus01 ~]$ man sbatch

Search for the explanation of the -d, --dependency=<dependency_list> option.
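As a pointer to what the manual page describes: a dependency list may name several job IDs for one type (type:id1:id2), several types separated by "," (all must be satisfied) or by "?" (any one is enough). The sketch below only builds such strings; the function name dependency_spec and the scripts merge.sh and cleanup.sh are hypothetical.

```shell
# Build "type:id1:id2:..." from a dependency type and a list of job IDs.
# Usage: dependency_spec <type> <job_id> [<job_id> ...]
# dependency_spec is a hypothetical helper name, not a Slurm command.
dependency_spec() {
    local spec="$1" id
    shift
    for id in "$@"; do
        spec="${spec}:${id}"
    done
    echo "$spec"
}

# Examples (require a Slurm cluster; the job IDs and scripts are placeholders):
#   # run merge.sh only after jobs 101, 102 and 103 all succeeded
#   sbatch --dependency="$(dependency_spec afterok 101 102 103)" merge.sh
#   # run cleanup.sh after 104 succeeded AND 105 terminated in any state
#   sbatch --dependency="$(dependency_spec afterok 104),$(dependency_spec afterany 105)" cleanup.sh
```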
