# GPU user guide



# How to Run a Job with a GPU

Let's run the gravitational N-body simulation found in the CUDA toolkit samples on a GPU. This example is suited for a standard *INCD* user eligible to use the ***hpc*** and ***gpu*** partitions.

><span style="color:teal">The ***fct*** partition and included resources is meant for users with a *FCT* grant and although the request of GPUs is made on the same way, they have specific instructions to follow found at [FCT Calls](https://wiki.incd.pt/books/user-support-information---rnca-fct-calls)

  ><span style="color:teal">The GPU's are only available at CIRRUS-A infrastruture on Lisbon.
  
### Log in to the user interface cirrus.ncg.ingrid.pt

	$ ssh -l user cirrus.ncg.ingrid.pt
    [user@cirrus01 ~]$ _

### Prepare your working directory

Prepare your environment in a dedicated directory, to protect your job from interference with other jobs, and create a submission batch script:

><span style="color:teal">This example only works with CUDA 10.2.</span>

	[user@cirrus01 ~]$ mkdir myworkdir
    [user@cirrus01 ~]$ cd myworkdir
    [user@cirrus01 ~]$ cat nbody.sh
    #!/bin/bash

	#SBATCH --partition=gpu
    #SBATCH --gres=gpu
    #SBATCH --mem=8192M

	COMMON=/usr/local/cuda/samples/common
	SAMPLE=/usr/local/cuda/samples/5_Simulations/nbody

	[ -d ../common ] || cp -r $COMMON ..
	[ -d nbody     ] || cp -r $SAMPLE .
    
    module load cuda
	cd nbody
	make clean
	make

	if [ -e nbody ]; then
		chmod u+x nbody
		./nbody -benchmark -numbodies=2560000
	fi

In this example we copy the N-body CUDA toolkit sample to the working directory, load the CUDA environment, build the simulation and run it.

#### Requesting the partition

Standard *INCD* users at *CIRRUS-A* have access to the ***gpu*** partition, providing NVIDIA Tesla-T4 GPUs. In order to access these GPUs, request the ***gpu*** partition with the directive:

	#SBATCH --partition=gpu

The ***fct*** partition provides several types of NVIDIA GPUs: T4 and V100S (please check the current available resources page). As a general rule, and depending on the application, the GPU types available on the cluster behave similarly, but the Tesla-V100S performs the same work in about half the time of the Tesla-T4. Nevertheless, if you request a Tesla-V100S you may have to wait for one to become available while a free Tesla-T4 is already ready to go.
  
If you only want a free GPU of any type allocated to your job, then the **#SBATCH --gres=gpu** form is the best choice.
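
As an illustration, a job that needs a Tesla-V100S could request it as in the sketch below (the QOS name is a placeholder, not a real value; see the *Use QOS to run GPU jobs* section below and the FCT Calls page for the exact directives):

```
#SBATCH --partition=gpu
#SBATCH --qos=<your_qos>        # placeholder: the QOS attributed to your grant
#SBATCH --gres=gpu:v100s        # one Tesla-V100S
```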

#### Requesting the Tesla-T4 GPU

We request the allocation of one NVIDIA Tesla-T4 GPU through the option:

	#SBATCH --gres=gpu:t4

Standard *INCD* users can access only NVIDIA Tesla-T4 GPUs, so we can simplify the request:

	#SBATCH --gres=gpu

This way we ask for a GPU of any type; the same is valid on partitions with more than one GPU type when we do not care which type is allocated to our job.
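
The `--gres` directive also accepts an optional count, so its general form is `gpu[:type][:count]`. For example, the sketch below would request two GPUs of any type, or two Tesla-T4 GPUs, assuming a node with that many free GPUs is available:

```
#SBATCH --gres=gpu:2        # two GPUs of any type
#SBATCH --gres=gpu:t4:2     # two Tesla-T4 GPUs
```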

#### Requesting memory

Ensure enough memory for your simulation; follow the tips on the *Determining Memory Requirements (page_to_be)* page.

In our example 8GB is sufficient to run the simulation:

	#SBATCH --mem=8192M
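
After a first run you can compare the requested memory with the memory the job actually used, for example with ***sacct*** (a sketch using the job ID from this example; *MaxRSS* is reported per job step):

```
[user@cirrus01 ~]$ sacct -j 1176 --format=JobID,ReqMem,MaxRSS,State
```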

### Submit the simulation

	[user@cirrus01 ~]$ sbatch nbody.sh
	Submitted batch job 1176

### Monitor your job

You can use the ***gqueue*** command line tool:

	[user@cirrus01 ~]$ gqueue 
    JOBID PARTITION NAME      USER     ST STATE   TIME       NODES CPUS TRES_PER_NODE NODELIST 
    1176  gpu       nbody.sh  user5    R  RUNNING 0-00:02:33 1     1    gpu:t4        hpc058

or use the ***gacct*** command; the job is finished when the *State* field shows *COMPLETED*.

	[user@cirrus01 ~]$ gacct 
       JobID    JobName  Partition    Account  AllocCPUS      ReqGRES    AllocGRES      State ExitCode 
	------------ ---------- ---------- ---------- ---------- ------------ ------------ ---------- -------- 
	1170       nbody.sh        fct        hpc          2  gpu:v100s:1        gpu:1  COMPLETED      0:0 
	1171       nbody.sh        fct        hpc          2     gpu:t4:1        gpu:1  COMPLETED      0:0 
	1175       teste.sh        fct        hpc          1                            COMPLETED      0:0 
	1176       nbody.sh        gpu        hpc          1        gpu:1        gpu:1  COMPLETED      0:0

><span style="color:teal">if the state is different from *COMPLETED* or *RUNNING* then check your simulation or request help throught the email address **helpdesk@incd.pt** providing the *JOBID*, the submission script, the relevant slurm output files, e.g. *slurm-1176.out*, or other remarks you think it may be helpfull</spam>

### Check the results at job completion

	[user@cirrus01 ~]$ ls -l
    -rw-r-----+ 1 user hpc  268 Oct 22 13:56 nbody.sh
	drwxr-x---+ 3 user hpc 4096 Oct 20 18:09 nbody
	-rw-r-----+ 1 user hpc  611 Oct 22 13:41 slurm-1176.out


	[user@cirrus01 ~]$ cat slurm-1176.out
    ...
    > Windowed mode
	> Simulation data stored in video memory
	> Single precision floating point simulation
	> 1 Devices used for simulation
	GPU Device 0: "Turing" with compute capability 7.5
	
	> Compute 7.5 CUDA device: [Tesla T4]
	number of bodies = 2560000
	2560000 bodies, total time for 10 iterations: 308586.156 ms
	= 212.375 billion interactions per second
	= 4247.501 single-precision GFLOP/s at 20 flops per interaction
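
The reported GFLOP/s figure follows directly from the interaction rate: 212.375 billion interactions per second times 20 flops per interaction gives approximately 4247.5 single-precision GFLOP/s.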

# Use QOS to run GPU jobs

* This page is dedicated to users who have a QOS attributed and want to run jobs on GPUs.

### GPU JOB submission using QOS

* In this example we use the attributed QOS **gpu097822021** and submit a job requesting an NVIDIA V100S GPU.

```
#!/bin/bash
#SBATCH --job-name=prod01
#SBATCH --partition=gpu
#SBATCH --qos=gpu097822021
#SBATCH --gres=gpu:v100s
#SBATCH --output=%x.o%j
#SBATCH --error=%x.o%j

### Prepare the environment
module purge
module load gcc83/openmpi/4.1.1 cuda-11.2

echo "Running on node $(hostname)"
```
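
Submission is done with ***sbatch*** as usual. If you are not sure which QOS values are attributed to your account, you can query the Slurm accounting database, as in the sketch below (the script name is arbitrary and the output format depends on the site configuration):

```
$ sbatch prod01.sh
$ sacctmgr show assoc where user=$USER format=User,Account,Partition,QOS
```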

# Deep Learning Example

The INCD-Lisbon facility provides a few GPUs; check the ***[Compute Node Specs](https://wiki.incd.pt/books/compute-node-specs-and-information/page/list-of-servers)*** page.

#### Log in to the submit node

Log in to the cluster submission node; check the ***[How to Access](https://wiki.incd.pt/books/how-to-access)*** page for more information:

	
    $ ssh -l <username> cirrus8.a.incd.pt
    [username@cirrus08 ~]$ _

#### Alternatives to run the Deep Learning example

We have alternative ways to run the *Deep Learning* example, or any other python based script:

 1. use the python virtual environment already prepared on the system and launch a batch job;
<!-- 2. we can also use the python virtual environment already prepared on the system and run the same example on it;
 3. finally we can use a **udocker** container also available on the system.
-->
The next section shows how to run the example using this method.


#### 1) Run a Deep Learning job using a prepared CVMFS python virtual environment

Instead of preparing a user python virtual environment we can use the environment already available on the system, named **python/3.10.13**; check it with the command:

	[username@cirrus08 ~]$ module avail
    ---------------- /cvmfs/sw.el8/modules/hpc/main ------------------
    ...
    intel/oneapi/2023    python/3.8          udocker/alphafold/2.3.2
    julia/1.6.7          python/3.10.13 (D)
    ...

>We will find other **python** versions, namely **3.7** and **3.8**; these versions do not contain the **tensorflow** module due to **python** version incompatibility.

Create the submit script **dl.sh** with the following content:

    [username@cirrus08 dl]$ vi dl.sh
    #!/bin/bash
	#SBATCH -p gpu
    #SBATCH --gres=gpu
    #SBATCH --mem=64G
   	
    module load python/3.10.13
    python run.py
    
    [username@cirrus08 dl]$ ls -l
    -rwxr-----+ 1 username usergroup   124 Feb 26 16:44 dl.sh
	-rw-r-----+ 1 username usergroup  1417 Feb 26 16:46 run.py
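
Before submitting, you can quickly confirm that the loaded module provides **tensorflow** (a quick interactive check on the submit node; the GPUs themselves are only visible inside the job):

```
[username@cirrus08 dl]$ module load python/3.10.13
[username@cirrus08 dl]$ python -c "import tensorflow as tf; print(tf.__version__)"
```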
##### Submit the Job

    [username@cirrus08 dl]$ sbatch dl.sh
    Submitted batch job 15135448

    [username@cirrus08 dl]$ gqueue
    JOBID    PARTITION NAME       USER        ST TIME       NODES CPUS TRES_PER_NODE  NODELIST
    15135448 gpu       dl.sh      username    PD 0:00       1     1    gres/gpu

##### Check Job results

On completion check results on standard output and error files:

	[username@cirrus08 dl]$ ls -l
    -rwxr-----+ 1 username usergroup   124 Feb 26 16:44 dl.sh
	-rw-r-----+ 1 username usergroup  1417 Feb 26 16:46 run.py
    -rw-r-----+ 1 username usergroup 18000 Feb 26 18:51 slurm-15135448.out

and proceed as in the previous example.
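
For example, the last lines of the standard output file (named after the job ID) usually show whether the run finished cleanly:

```
[username@cirrus08 dl]$ tail slurm-15135448.out
```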

<!--
#### 2) Run a Deep Learning job using a user python virtual environment (

##### Prepare a python virtual environment

We will create a **python** virtual environment and include needed components, users do not have permission to install modules on the operating system.

	[username@cirrus08 ~]$ python3 -m venv ~/pvenv
    [username@cirrus08 ~]$ . ~/pvenv/bin/activate
    [username@cirrus08 ~]$ pip3 install --upgrade pip
	[username@cirrus08 ~]$ pip3 install --upgrade setuptools
    [username@cirrus08 ~]$ pip3 install tensorflow
    [username@cirrus08 ~]$ pip3 install keras

This opperation is performed only once, the **python** virtual environment will be reused all over your *jobs*.

##### Check the python virtual environment

You may check if the **python** virtual environment is working as expected, for example:

    [username@cirrus08 ~]$ . ~/pvenv/bin/activate
    [username@cirrus08 ~]$ python --version
    Python 3.6.8
    [username@cirrus08 ~]$ pip3 list
    Package              Version   
	-------------------- ----------
	...
    Keras                   2.6.0
    Keras-Preprocessing     1.1.2
    ...
    setuptools              59.6.0
    ...
    tensorboard             2.6.0
    tensorboard-data-server 0.6.1
    tensorboard-plugin-wit  1.8.1
    tensorflow              2.6.2
	tensorflow-estimator    2.6.0

##### Prepare your code

Choose a working directory for your code, for the purpose of this example we will run a deep learning python script named **[run.py](https://wiki.incd.pt/attachments/79)**, create also a submit script:

	[username@cirrus08 ~]$ mkdir dl
    [username@cirrus08 ~]$ cd dl
    [username@cirrus08 dl]$ cp /cvmfs/sw.el8/share/deep_learning/run.py .
    
    [username@cirrus08 dl]$ vi dl.sh
    #!/bin/bash
	#SBATCH -p gpu
    #SBATCH --gres=gpu
    #SBATCH --mem=64G
    . ~/pvenv/bin/activate
    module load cuda-10.2
    python run.py
    
    [username@cirrus08 dl]$ ls -l
    -rwxr-----+ 1 username usergroup  124 Feb 26 16:44 dl.sh
	-rw-r-----+ 1 username usergroup 1417 Feb 26 16:46 run.py

##### Submit the Job

	[username@cirrus08 dl]$ qbatch dl.sh
    Submitted batch job 15135448
	
    [username@cirrus08 dl]$ $ squeue 
	   JOBID PARTITION     NAME     USER ST  TIME  NODES NODELIST(REASON) 
	15135448       gpu    dl.sh username  R  0:01      1 hpc062 

##### Check Job results

On completion check results on standard output and error files:

	[username@cirrus08 dl]$ ls -l
    -rwxr-----+ 1 username usergroup   124 Feb 26 16:44 dl.sh
	-rw-r-----+ 1 username usergroup  1417 Feb 26 16:46 run.py
    -rw-r-----+ 1 username usergroup 18000 Feb 26 18:51 slurm-15135448.out
-->

# How to select a GPU

### Select any GPU
* In this example we choose one GPU of any type with at least 8192 MB of memory.

```
#!/bin/bash

#SBATCH --partition=gpu
#SBATCH --gres=gpu
#SBATCH --mem=8192M

COMMON=/usr/local/cuda/samples/common
SAMPLE=/usr/local/cuda/samples/5_Simulations/nbody

[ -d ../common ] || cp -r $COMMON ..
[ -d nbody     ] || cp -r $SAMPLE .

module load cuda
cd nbody
make clean
make

if [ -e nbody ]; then
	chmod u+x nbody
	./nbody -benchmark -numbodies=2560000
fi
```
### Select a specific GPU: V100S


```
#!/bin/bash

#SBATCH --partition=gpu
#SBATCH --gres=gpu:v100s

COMMON=/usr/local/cuda/samples/common
SAMPLE=/usr/local/cuda/samples/5_Simulations/nbody

[ -d ../common ] || cp -r $COMMON ..
[ -d nbody     ] || cp -r $SAMPLE .

module load cuda
cd nbody
make clean
make

if [ -e nbody ]; then
	chmod u+x nbody
	./nbody -benchmark -numbodies=2560000
fi
```
### GPU list 

You can find the full GPU list per cluster [here](https://wiki.incd.pt/books/compute-node-specs-and-information).
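
You can also query the GPUs known to Slurm directly from the command line (a sketch; the `%G` field prints the generic resources configured on each node):

```
$ sinfo -o "%P %N %G"
```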