# Deep Learning Example

The INCD-Lisbon facility provides a few GPUs; check the ***[Compute Node Specs](https://wiki.incd.pt/books/compute-node-specs-and-information/page/list-of-servers)*** page for details.

#### Login on the submit node

Log in to the cluster submission node; check the ***[How to Access](https://wiki.incd.pt/books/how-to-access)*** page for more information:

	
    $ ssh -l <username> cirrus8.a.incd.pt
    [username@cirrus08 ~]$ _

#### Alternatives to run the Deep Learning example

The *Deep Learning* example, or any other Python-based script, can be run in the following way:

 1. use the python virtual environment already prepared on the system and launch a batch job;
<!-- 2. prepare a user python virtual environment on the home directory and launch a batch job;
 3. finally we can use a **udocker** container also available on the system.
-->
The next section shows how to run the example with this method.


#### 1) Run a Deep Learning job using a prepared CVMFS python virtual environment

Instead of preparing a user python virtual environment, we can use the environment already available on the system, named **python/3.10.13**. Check that it is available with the command:

    [username@cirrus08 ~]$ module avail
    ---------------- /cvmfs/sw.el8/modules/hpc/main ------------------
    ...
    intel/oneapi/2023    python/3.8          udocker/alphafold/2.3.2
    julia/1.6.7          python/3.10.13 (D)
    ...

> Other **python** versions are also available, namely **3.7** and **3.8**; these versions do not contain the **tensorflow** module due to **python** version incompatibility.
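
Before submitting a job it can help to confirm that the loaded **python** module really provides the packages the script needs. The following is a minimal sketch and not part of the original example: the helper names `module_available` and `visible_gpus` are hypothetical, and it relies only on the standard library plus, when TensorFlow is installed, the call `tf.config.list_physical_devices`.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


def visible_gpus() -> list:
    """Return the names of the GPUs TensorFlow can see, or [] without TensorFlow."""
    if not module_available("tensorflow"):
        return []
    try:
        # Imported lazily: some python modules on the system do not ship it.
        import tensorflow as tf
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    for name in ("tensorflow", "keras"):
        status = "found" if module_available(name) else "missing"
        print(f"{name}: {status}")
    print("visible GPUs:", visible_gpus())
```

Run it after loading the python module. The GPU list is likely to be empty on the submit node, so the GPU check is mainly meaningful when the script runs inside a job on the **gpu** partition.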

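In a working directory (here **dl**) containing the example script **[run.py](https://wiki.incd.pt/attachments/79)**, create the submit script **dl.sh** as follows: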

    [username@cirrus08 dl]$ vi dl.sh
    #!/bin/bash
    #SBATCH -p gpu
    #SBATCH --gres=gpu
    #SBATCH --mem=64G

    module load python/3.10.13
    python run.py

    [username@cirrus08 dl]$ ls -l
    -rwxr-----+ 1 username usergroup   124 Feb 26 16:44 dl.sh
    -rw-r-----+ 1 username usergroup  1417 Feb 26 16:46 run.py

##### Submit the Job

    [username@cirrus08 dl]$ sbatch dl.sh
    Submitted batch job 15135448

    [username@cirrus08 dl]$ squeue
    JOBID    PARTITION NAME       USER        ST TIME       NODES CPUS TRES_PER_NODE  NODELIST
    15135448 gpu       dl.sh      username    PD 0:00       1     1    gres/gpu
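
When jobs are submitted from a script, the job ID printed by `sbatch` is needed to locate the output file later. The sketch below shows one way to extract it, assuming the message format shown above and Slurm's default output file name `slurm-<jobid>.out`; the function names `parse_job_id` and `default_output_file` are hypothetical helpers, not part of Slurm.

```python
import re
from typing import Optional


def parse_job_id(sbatch_output: str) -> Optional[int]:
    """Extract the job ID from the text printed by sbatch, or None if absent."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    return int(match.group(1)) if match else None


def default_output_file(job_id: int) -> str:
    """Name of the output file Slurm uses when no --output option is given."""
    return f"slurm-{job_id}.out"


# Example with the message shown in this section.
job_id = parse_job_id("Submitted batch job 15135448\n")
if job_id is not None:
    print(default_output_file(job_id))
```

In a real workflow the input string would be the captured standard output of `sbatch dl.sh`, for instance obtained with `subprocess.run(..., capture_output=True, text=True)`.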

##### Check Job results

On completion, check the results in the standard output and error files:

    [username@cirrus08 dl]$ ls -l
    -rwxr-----+ 1 username usergroup   124 Feb 26 16:44 dl.sh
    -rw-r-----+ 1 username usergroup  1417 Feb 26 16:46 run.py
    -rw-r-----+ 1 username usergroup 18000 Feb 26 18:51 slurm-15135448.out

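By default Slurm writes both standard output and standard error to **slurm-15135448.out**; inspect this file for the script output and any error messages.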
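
Job output files can be long, so a quick scan for failure messages may save time. The following is only a sketch under stated assumptions: the marker strings in `MARKERS` are a guess at what a failing Python script prints, the helper names are hypothetical, and the sample text is made up rather than taken from a real job.

```python
from pathlib import Path
from typing import List, Tuple

# Substrings treated as signs of trouble; an assumption, adjust as needed.
MARKERS = ("Traceback", "Error")


def find_problem_lines(text: str, markers=MARKERS) -> List[Tuple[int, str]]:
    """Return (line number, line) pairs for lines containing any marker."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if any(marker in line for marker in markers)
    ]


def check_output_file(path: str) -> List[Tuple[int, str]]:
    """Read a job output file and report its suspicious lines."""
    return find_problem_lines(Path(path).read_text(errors="replace"))


# Made-up sample text standing in for the contents of a job output file.
sample = "step 1 done\nTraceback (most recent call last):\nValueError: bad shape\n"
for number, line in find_problem_lines(sample):
    print(f"{number}: {line}")
```

For the job in this section the call would be `check_output_file("slurm-15135448.out")`; an empty list only means that none of the assumed markers appeared, not that the run was correct.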

<!--
#### 2) Run a Deep Learning job using a user python virtual environment

##### Prepare a python virtual environment

We will create a **python** virtual environment and install the needed components in it, since users do not have permission to install modules on the operating system.

	[username@cirrus08 ~]$ python3 -m venv ~/pvenv
    [username@cirrus08 ~]$ . ~/pvenv/bin/activate
    [username@cirrus08 ~]$ pip3 install --upgrade pip
	[username@cirrus08 ~]$ pip3 install --upgrade setuptools
    [username@cirrus08 ~]$ pip3 install tensorflow
    [username@cirrus08 ~]$ pip3 install keras

This operation is performed only once; the **python** virtual environment will be reused by all your *jobs*.

##### Check the python virtual environment

You may check if the **python** virtual environment is working as expected, for example:

    [username@cirrus08 ~]$ . ~/pvenv/bin/activate
    [username@cirrus08 ~]$ python --version
    Python 3.6.8
    [username@cirrus08 ~]$ pip3 list
    Package              Version   
	-------------------- ----------
	...
    Keras                   2.6.0
    Keras-Preprocessing     1.1.2
    ...
    setuptools              59.6.0
    ...
    tensorboard             2.6.0
    tensorboard-data-server 0.6.1
    tensorboard-plugin-wit  1.8.1
    tensorflow              2.6.2
	tensorflow-estimator    2.6.0

##### Prepare your code

Choose a working directory for your code. For the purpose of this example we will run a deep learning python script named **[run.py](https://wiki.incd.pt/attachments/79)**; also create a submit script:

	[username@cirrus08 ~]$ mkdir dl
    [username@cirrus08 ~]$ cd dl
    [username@cirrus08 dl]$ cp /cvmfs/sw.el8/share/deep_learning/run.py .
    
    [username@cirrus08 dl]$ vi dl.sh
    #!/bin/bash
	#SBATCH -p gpu
    #SBATCH --gres=gpu
    #SBATCH --mem=64G
    . ~/pvenv/bin/activate
    module load cuda-10.2
    python run.py
    
    [username@cirrus08 dl]$ ls -l
    -rwxr-----+ 1 username usergroup  124 Feb 26 16:44 dl.sh
	-rw-r-----+ 1 username usergroup 1417 Feb 26 16:46 run.py

##### Submit the Job

    [username@cirrus08 dl]$ sbatch dl.sh
    Submitted batch job 15135448

    [username@cirrus08 dl]$ squeue
       JOBID PARTITION     NAME     USER ST  TIME  NODES NODELIST(REASON)
    15135448       gpu    dl.sh username  R  0:01      1 hpc062

##### Check Job results

On completion check results on standard output and error files:

	[username@cirrus08 dl]$ ls -l
    -rwxr-----+ 1 username usergroup   124 Feb 26 16:44 dl.sh
	-rw-r-----+ 1 username usergroup  1417 Feb 26 16:46 run.py
    -rw-r-----+ 1 username usergroup 18000 Feb 26 18:51 slurm-15135448.out
-->