# AlphaFold

1. [Introduction](#introduction)
   1. [Environment](#environment)
   2. [Data Base Location](#location)
   3. [run_udocker.py](#run_udocker.py)
2. [How to Run](#run)
   1. [Example on Partition "gpu"](#p19113-gpu)
   2. [Example on Partition "fct"](#p19113-fct)
   3. [Example on Partition "hpc"](#p19113-hpc)
   4. [sbatch Options](#options)
3. [Benchmarks](#benchmarks)
4. [References](#references)


### 1. Introduction <a name="introduction"></a>

The **INCD** team prepared a local installation of **AlphaFold** using a container based on **[UDOCKER](https://github.com/indigo-dc/udocker)** (instead of **[DOCKER](https://www.docker.com/)**); the installation also includes the *Genetic Database*.

The local installation provides **AlphaFold** version **2.1.1** in a container based on the **Ubuntu 18.04** distribution with **cuda-11.0** and **cudnn-8**.

The main target resource of **AlphaFold** is the **GPU**, but the application can also run on the **CPU** alone, although performance is substantially worse; see the [Benchmarks](#benchmarks) section below.

#### 1.1 Environment <a name="environment"></a>

The environment is activated with the command

    $ module load udocker/alphafold/2.1.1

This automatically activates a virtual environment ready to start the **AlphaFold** container through the Python script **run_udocker.py**.
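
As a quick sanity check, one can confirm that the wrapper script is on the `PATH` and ask it for its options (a minimal sketch; we assume **run_udocker.py** exposes the usual `--help` flag of the standard AlphaFold launcher):

    [user@cirrus ~]$ module load udocker/alphafold/2.1.1
    [user@cirrus ~]$ which run_udocker.py     # confirm the wrapper is on the PATH
    [user@cirrus ~]$ run_udocker.py --help    # list the accepted options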

#### 1.2 Data Base Location <a name="location"></a>

The **Genetic Database** is installed under the directory

    /users3/data/alphafold

in read-only mode; upgrades may be requested through the *helpdesk@incd.pt* address.
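
For orientation, this directory follows the usual AlphaFold database layout, roughly as in the indicative listing below (the exact subdirectories depend on the database set that was downloaded):

    [user@cirrus ~]$ ls /users3/data/alphafold
    bfd  mgnify  params  pdb70  pdb_mmcif  pdb_seqres  uniclust30  uniprot  uniref90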

#### 1.3 run_udocker.py Script <a name="run_udocker.py"></a>

The **run_udocker.py** script was adapted from the **run_docker.py** script normally used to launch **AlphaFold** with the **docker** container technology.

**run_udocker.py** accepts the same options as **run_docker.py**, with a few minor changes that we hope will facilitate user interaction. The user may change the script behaviour through environment variables or command line options; only the changes are listed below:

Optional environment variables:
| Variable Name | Default Value | Comment |
|---            |---            |---      |
| DOWNLOAD_DIR  | none          | Genetic database location (absolute path) |
| OUTPUT_DIR    | none          | Output results directory (absolute path)  |

Command line options:
| Command Option | Mandatory | Default Value          | Comment                   |
|---             |---        |---                     |---                        |
| --data_dir     | no        | **/local/alphafold** or<br>**/users3/data/alphafold** | Genetic database location; takes precedence over DOWNLOAD_DIR when both are set |
| --output_dir   | no        | <working_dir>/output   | Absolute path to the results directory; takes precedence over OUTPUT_DIR when both are set |

> The option **--data_dir** is required by the standard AlphaFold **run_docker.py** script; here the location of the **genetic database** is selected automatically, but the user may change this path through the environment variable **DOWNLOAD_DIR** or the command line option **--data_dir**. When possible, we provide a local copy of the database directory on the worker nodes in order to improve job performance.

> The standard AlphaFold output results directory is **/tmp/alphafold** by default; please note that we changed this location to the local working directory. The user can select a different path through the environment variable **OUTPUT_DIR** or the command line option **--output_dir**.
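
For example, the two invocations below are equivalent ways to redirect the results to a custom directory (a sketch; the path is hypothetical and must be absolute):

    [user@cirrus ~]$ export OUTPUT_DIR=/users/jdoe/results
    [user@cirrus ~]$ run_udocker.py --fasta_paths=P19113.fasta --max_template_date=2020-05-14

    [user@cirrus ~]$ run_udocker.py --fasta_paths=P19113.fasta --max_template_date=2020-05-14 \
                         --output_dir=/users/jdoe/results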

### 2. How to Run <a name="run"></a>

We only need a protein and a submission script. When analyzing multiple proteins in parallel, it is advisable to submit each one from a different directory in order to avoid interference between runs.

#### 2.1 Example on Partition "gpu" <a name="p19113-gpu"></a>

Let's analyze the [P19113](https://www.uniprot.org/uniprot/P19113) protein, for example.

Create a working directory and get the protein:

    [user@cirrus ~]$ mkdir run_P19113
    [user@cirrus ~]$ cd run_P19113
    [user@cirrus run_P19113]$ wget -q https://www.uniprot.org/uniprot/P19113.fasta


Use your favorite editor to create the submission script **submit.sh**:

    [user@cirrus run_P19113]$ emacs submit.sh
    #!/bin/bash
    # -------------------------------------------------------------------------------
    #SBATCH --job-name=P19113
    #SBATCH --partition=gpu
    #SBATCH --mem=50G
    #SBATCH --ntasks=4
    #SBATCH --gres=gpu
    # -------------------------------------------------------------------------------
    module purge
    module load udocker/alphafold/2.1.1
    run_udocker.py --fasta_paths=P19113.fasta --max_template_date=2020-05-14

Finally, submit your job, check if it is running and wait for it:

    [user@cirrus run_P19113]$ sbatch submit.sh
    [user@cirrus run_P19113]$ squeue

When the job finishes, the local directory **./output** will contain the analysis results.
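
The layout inside **./output** follows the standard AlphaFold convention, roughly as sketched below (file names are indicative for version 2.1 and may differ slightly):

    [user@cirrus run_P19113]$ ls output/P19113
    features.pkl  msas  ranked_0.pdb ... ranked_4.pdb  ranking_debug.json
    relaxed_model_1.pdb ... timings.json  unrelaxed_model_1.pdb ...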

#### 2.2 Example on Partition "fct" <a name="p19113-fct"></a>

    [user@cirrus run_P19113]$ emacs submit.sh
    #!/bin/bash
    # -------------------------------------------------------------------------------
    #SBATCH --job-name=P19113
    #SBATCH --partition=fct
    #SBATCH --qos=<qos>
    #SBATCH --account=<account>     # optional in most cases
    #SBATCH --mem=50G
    #SBATCH --ntasks=4
    #SBATCH --gres=gpu
    # -------------------------------------------------------------------------------
    module purge
    module load udocker/alphafold/2.1.1
    run_udocker.py --fasta_paths=P19113.fasta --max_template_date=2020-05-14


#### 2.3 Example on Partition "hpc" <a name="p19113-hpc"></a>

    [user@cirrus run_P19113]$ emacs submit.sh
    #!/bin/bash
    # -------------------------------------------------------------------------------
    #SBATCH --job-name=P19113
    #SBATCH --partition=hpc
    #SBATCH --mem=50G
    #SBATCH --ntasks=4
    # -------------------------------------------------------------------------------
    module purge
    module load udocker/alphafold/2.1.1
    run_udocker.py --fasta_paths=P19113.fasta --max_template_date=2020-05-14


#### 2.4 sbatch Options <a name="options"></a>

##### --partition=XX

The best job performance is achieved on the **gpu** or **fct** partitions; the latter is restricted to users with a valid **QOS**.

**AlphaFold** also runs on the **hpc** partition, but in this case it uses only the slower **CPU**, since no **GPU** is available; the total run time is roughly eight times greater than for jobs executed on the **gpu** or **fct** partitions.

##### --mem=50G

The default job memory allocation per CPU depends on the partition used and may be insufficient. We recommend requesting **50GB** of memory; the benchmarks suggest this value should be enough in all cases.
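
To check how much memory a completed job actually used, the Slurm accounting database can be queried (replace `<jobid>` with the ID printed by `sbatch`):

    [user@cirrus ~]$ sacct -j <jobid> --format=JobID,MaxRSS,Elapsed,State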

##### --ntasks=4

Apparently this is the maximum number of tasks needed by the application; we did not get any noticeable improvement when raising this parameter.

##### --gres=gpu

The **gpu** and **fct** partitions provide up to eight **GPUs**. The application was built to compute on the **GPU**, but there is no point in requesting more than one: we did not notice any improvement in the total run time. We also noticed that the total compute time is similar for both types of available **GPUs**.

**AlphaFold** also runs on the **CPU** alone, but the total run time increases substantially, as seen in the benchmark results below.
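
As a quick sanity check that a job actually received a GPU, a line such as the following can be added to the submission script (assuming `nvidia-smi` is available on the allocated node):

    nvidia-smi --query-gpu=name --format=csv,noheader    # prints the allocated GPU model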

### 3. Benchmarks <a name="benchmarks"></a>

We ran some benchmarks with the protein *P19113* in order to help users organize their work.

The results below suggest that the best choice is four **CPU** tasks, one **GPU**, and letting the system select the local copy of the *genetic database* on the worker nodes.

Since a **GPU** run takes roughly two and a half hours, users may run up to thirty-five protein analyses in a single submitted job, as long as they are executed in sequence; a sketch of such a batch script is given after the table below.

| Partition | CPU | #CPU | GPU | #GPU | #Jobs | DOWNLOAD_DIR | Elapsed Time (D-HH:MM:SS) |
|---        |---  |---   |---  |---   |---    |---           |---                        |
| gpu/fct | EPYC_7552 | 4  | Tesla_T4    | 1 | 1 | /local/alphafold       | 02:22:19   |
| gpu/fct | EPYC_7552 | 4  | Tesla_V100S | 1 | 1 | /local/alphafold       | 02:38:21   |
| gpu/fct | EPYC_7552 | 4  | Tesla_T4    | 2 | 1 | /local/alphafold       | 02:22:25   |
| gpu/fct | EPYC_7552 | 4  | Tesla_T4    | 1 | 1 | /users3/data/alphafold | 15:59:50   |
| gpu/fct | EPYC_7552 | 4  | Tesla_V100S | 1 | 1 | /users3/data/alphafold | 11:40:04   |
| gpu/fct | EPYC_7552 | 4  | Tesla_T4    | 2 | 1 | /users3/data/alphafold | 14:58:52   |
| gpu/fct | EPYC_7552 | 4  | none        | 0 | 1 | /local/alphafold       | 16:17:32   |
| gpu/fct | EPYC_7552 | 4  | none        | 0 | 1 | /users3/data/alphafold | 18:22:07   |
| gpu/fct | EPYC_7552 | 4  | none        | 0 | 4 | /local/alphafold       | 17:53:25   |
| hpc     | EPYC_7501 | 4  | none        | 0 | 3 | /local/alphafold       | 21:44:35   |
| hpc     | EPYC_7501 | 32 | none        | 0 | 1 | /local/alphafold       | 16:35:59   |
| hpc     | EPYC_7501 | 4  | none        | 0 | 1 | /users3/data/alphafold | 1-02:28:33 |
| hpc     | EPYC_7501 | 16 | none        | 0 | 1 | /users3/data/alphafold | 1-03:42:23 |
| hpc     | EPYC_7501 | 32 | none        | 0 | 1 | /users3/data/alphafold | 1-03:15:19 |
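
As mentioned above, several proteins can be analyzed sequentially within a single job. A sketch of such a batch script is shown below; the second FASTA file name is purely illustrative, and each run is redirected to its own output directory through the **OUTPUT_DIR** variable:

    [user@cirrus run_batch]$ emacs submit.sh
    #!/bin/bash
    # -------------------------------------------------------------------------------
    #SBATCH --job-name=alphafold-batch
    #SBATCH --partition=gpu
    #SBATCH --mem=50G
    #SBATCH --ntasks=4
    #SBATCH --gres=gpu
    # -------------------------------------------------------------------------------
    module purge
    module load udocker/alphafold/2.1.1
    # analyze each protein in sequence, each run writing to its own output directory
    for fasta in P19113.fasta P12345.fasta; do
        export OUTPUT_DIR=$PWD/output_${fasta%.fasta}
        run_udocker.py --fasta_paths=$fasta --max_template_date=2020-05-14
    done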
  
  
  
### 4. References <a name="references"></a>

* [https://github.com/deepmind/alphafold](https://github.com/deepmind/alphafold)
* [https://github.com/indigo-dc/udocker](https://github.com/indigo-dc/udocker)
* [https://www.docker.com](https://www.docker.com)
* [https://www.uniprot.org/uniprot](https://www.uniprot.org/uniprot)
* [https://www.uniprot.org/uniprot/P19113](https://www.uniprot.org/uniprot/P19113)