How to Run a Job with a GPU
Let's run the gravitational N-body simulation from the CUDA toolkit samples on a GPU. This example suits a standard INCD user eligible to use the hpc and gpu partitions.
The fct partition and its resources are meant for users with an FCT grant; although GPUs are requested in the same way, these users have specific instructions to follow, found at FCT Calls.
The GPUs are only available on the CIRRUS-A infrastructure in Lisbon.
Log in to the user interface cirrus.ncg.ingrid.pt
$ ssh -l user cirrus.ncg.ingrid.pt
[user@cirrus01 ~]$ _
Prepare your working directory
Prepare your environment in a dedicated directory to protect against inter-job interference, and create a submission batch script:
[user@cirrus01 ~]$ mkdir myworkdir
[user@cirrus01 ~]$ cd myworkdir
[user@cirrus01 ~]$ cat nbody.sh
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu
#SBATCH --mem=8192MB
#SBATCH --ntasks=1

COMMON=/usr/local/cuda/samples/common
SAMPLE=/usr/local/cuda/samples/5_Simulations/nbody

[ -d ../common ] || cp -r $COMMON ..
[ -d nbody ] || cp -r $SAMPLE .

module load cuda

cd nbody
make clean
make

if [ -e nbody ]; then
    chmod u+x nbody
    ./nbody -benchmark -numbodies=2560000
fi
In this example we copy the N-body CUDA toolkit sample simulation to the working directory, load the CUDA environment, build the simulation and run it. Please note that FCT call grant users should request the fct partition instead of gpu.
Requesting the partition
Standard INCD users
Standard INCD users at CIRRUS-A have access to the gpu partition, which provides NVIDIA Tesla T4 GPUs. To access these GPUs, request the gpu partition with the directive:

#SBATCH --partition=gpu
FCT call users
The fct partition provides two types of NVIDIA Tesla GPUs: T4 and V100S. FCT grant users should request the fct partition with the directive:

#SBATCH --partition=fct
Requesting the GPU
We request the allocation of one NVIDIA Tesla T4 GPU through the option:

#SBATCH --gres=gpu:t4:1
To request the allocation of one NVIDIA Tesla V100S GPU, available only on the fct partition and reserved for FCT call grant users, use the directive:

#SBATCH --gres=gpu:v100s:1
We can also omit the GPU type:

#SBATCH --gres=gpu
this way we ask for a GPU of any type. The gpu partition provides four NVIDIA Tesla T4 GPUs; the GPU type can be omitted there since the partition has only one type of GPU.
The fct partition provides two NVIDIA Tesla T4 GPUs and two NVIDIA Tesla V100S GPUs, but these are reserved for FCT call grant users.
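If you want to confirm which GPU types a partition exposes before submitting, the standard Slurm sinfo command can list the generic resources (GRES) per partition. This is a sketch assuming plain sinfo is available on the login node; the exact GRES strings shown will depend on the cluster configuration:

```shell
# List partitions with their generic resources, node counts and node lists.
# %P = partition, %G = GRES (e.g. gpu:t4:4), %D = number of nodes, %N = node list.
sinfo -p gpu,fct -o "%P %G %D %N"
```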
As a general rule, and depending on the application, the two types of GPU available on the cluster are very similar, but the Tesla V100S performs the same work in about half the time of the Tesla T4. Nevertheless, if you request a Tesla V100S you may have to wait for resource availability until a Tesla V100S is free. If you only need any free GPU allocated for your job, then the #SBATCH --gres=gpu form is the best choice.
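To summarise, these are the three request forms discussed above; use exactly one of them in your job script:

```shell
#SBATCH --gres=gpu           # any free GPU type on the partition
#SBATCH --gres=gpu:t4:1      # one NVIDIA Tesla T4
#SBATCH --gres=gpu:v100s:1   # one NVIDIA Tesla V100S (fct partition only)
```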
Ensure enough memory for your simulation; follow the tips on the Determining Memory Requirements page.
In our example 8GB is sufficient to run the simulation:

#SBATCH --mem=8192MB
You should also plan the number of tasks to use; this depends on the application. For our simulation we need only one task, or CPU:

#SBATCH --ntasks=1
We could omit this directive; the default is 1 task.
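Putting the directives together, the resource-request header of the nbody.sh script shown earlier reads:

```shell
#!/bin/bash
#SBATCH --partition=gpu    # fct for FCT call grant users
#SBATCH --gres=gpu         # any free GPU on the partition
#SBATCH --mem=8192MB       # 8GB is enough for this simulation
#SBATCH --ntasks=1         # optional; 1 is the default
```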
Submit the simulation
[user@cirrus01 ~]$ sbatch nbody.sh
Submitted batch job 1176
Monitor your job
You can use the gqueue command line tool:
[user@cirrus01 ~]$ gqueue
JOBID  PARTITION  NAME      USER   ST  STATE    TIME        NODES  CPUS  TRES_PER_NODE  NODELIST
1176   gpu        nbody.sh  user5  R   RUNNING  0-00:02:33  1      1     gpu:t4         hpc058
or use the gacct command; the job is finished when the State field shows COMPLETED.
[user@cirrus01 ~]$ gacct
       JobID    JobName  Partition    Account  AllocCPUS      ReqGRES    AllocGRES      State ExitCode
------------ ---------- ---------- ---------- ---------- ------------ ------------ ---------- --------
1170          nbody.sh        fct        hpc          2  gpu:v100s:1        gpu:1  COMPLETED      0:0
1171          nbody.sh        fct        hpc          2     gpu:t4:1        gpu:1  COMPLETED      0:0
1175          teste.sh        fct        hpc          1                            COMPLETED      0:0
1176          nbody.sh        gpu        hpc          1        gpu:1        gpu:1  COMPLETED      0:0
If the state is different from COMPLETED or RUNNING, check your simulation or request help through the email address firstname.lastname@example.org, providing the JOBID, the submission script, the relevant Slurm output files (e.g. slurm-1176.out), and any other remarks you think may be helpful.
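If the gqueue and gacct wrappers are not available in your environment, the underlying standard Slurm commands provide similar information (job ID 1176 is used here as an example):

```shell
# Jobs of the current user still queued or running
squeue -u $USER

# Accounting record of a finished job; State shows COMPLETED on success
sacct -j 1176 --format=JobID,JobName,Partition,AllocCPUS,State,ExitCode
```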
Check the results at job completion
[user@cirrus01 ~]$ ls -l
-rw-r-----+ 1 user hpc  268 Oct 22 13:56 gpu.sh
drwxr-x---+ 3 user hpc 4096 Oct 20 18:09 nbody
-rw-r-----+ 1 user hpc  611 Oct 22 13:41 slurm-1176.out
[user@cirrus01 ~]$ cat slurm-1176.out
...
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "Turing" with compute capability 7.5

> Compute 7.5 CUDA device: [Tesla T4]
number of bodies = 2560000
2560000 bodies, total time for 10 iterations: 308586.156 ms
= 212.375 billion interactions per second
= 4247.501 single-precision GFLOP/s at 20 flops per interaction