
AlphaFold

  1. Introduction
    1. Environment
    2. Database Location
    3. run_udocker.py
  2. How to Run
    1. Example on Partition "gpu"
    2. Example on Partition "fct"
    3. Example on Partition "hpc"
    4. sbatch Options
  3. Benchmarks
  4. References

1. Introduction

The INCD team prepared a local installation of AlphaFold using a container based on udocker (instead of Docker); the installation also includes the Genetic Database.

The local installation provides AlphaFold version 2.1.1 in a container based on the Ubuntu 18.04 distribution with CUDA 11.0 and cuDNN 8.

The main resource target of AlphaFold is the GPU, but the application can also run on the CPU alone, although performance is substantially worse; see the Benchmarks section below.

1.1 Environment

The environment is activated with the command

$ module load udocker/alphafold/2.1.1

This automatically activates a virtual environment ready to start the AlphaFold container through the Python script run_udocker.py.
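
As a quick check, assuming the usual --help flag of run_docker.py is preserved by the adapted script, you can confirm it became available after loading the module:

$ module load udocker/alphafold/2.1.1
$ which run_udocker.py     # should resolve inside the activated virtual environment
$ run_udocker.py --help    # lists the accepted options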

1.2 Database Location

The Genetic Database is installed below the filesystem directory

/users3/data/alphafold

in read-only mode; upgrades may be requested through the helpdesk@incd.pt address.
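
For orientation, the database directory can be listed directly; the subdirectory names below follow the usual AlphaFold 2.1 genetic database layout and are shown only as an illustration of what to expect, not as the exact INCD contents:

$ ls /users3/data/alphafold
bfd  mgnify  params  pdb70  pdb_mmcif  pdb_seqres  small_bfd  uniclust30  uniprot  uniref90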

1.3 run_udocker.py Script

The run_udocker.py script was adapted from the run_docker.py script normally used to run AlphaFold with the Docker container technology.

run_udocker.py accepts the same options as the run_docker.py script, with a few minor changes that we hope will make user interaction easier. The user may change the script behavior through environment variables or command line options; only the changes are described below:

Optional environment variables:

Variable Name    Default Value    Comment
DOWNLOAD_DIR     none             Genetic database location
OUTPUT_DIR       none             Output results directory

Command line options:

Command Option    Mandatory    Default Value              Comment
--data_dir        no           /users3/data/alphafold     Genetic database location, takes precedence over DOWNLOAD_DIR when both are selected
--output_dir      no           <working_dir>/output       Output results directory, takes precedence over OUTPUT_DIR when both are selected

The --data_dir option is required by the standard AlphaFold run_docker.py script; here we chose to select the location of the genetic database automatically, but the user may change this path through the environment variable DOWNLOAD_DIR or the command line option --data_dir.

The standard AlphaFold output results directory is /tmp/alphafold by default; please note that we changed this location to the local working directory (./output). The user can select a different path through the environment variable OUTPUT_DIR or the command line option --output_dir.
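
As an illustration of the two mechanisms (the FASTA name and output path below are hypothetical examples), a job script can set either the environment variables or the command line options; when both are present the command line options win:

# option 1: environment variables picked up by run_udocker.py
export DOWNLOAD_DIR=/users3/data/alphafold
export OUTPUT_DIR=$PWD/results
run_udocker.py --fasta_paths=protein.fasta --max_template_date=2020-05-14

# option 2: command line options, which take precedence over the variables above
run_udocker.py --fasta_paths=protein.fasta --max_template_date=2020-05-14 \
               --data_dir=/users3/data/alphafold \
               --output_dir=$PWD/results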

2. How to Run

We only need a protein sequence and a submission script. If we analyze multiple proteins in parallel, it is advisable to submit them from different directories in order to avoid interference between runs, as sketched below.
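
A minimal sketch of that layout, assuming one FASTA file per protein in the current directory and a submit.sh adapted from the examples below to take the FASTA file name as its first argument ($1) instead of a hard-coded name:

# one directory per protein, so each run keeps its own ./output tree
for fasta in P19113.fasta P19114.fasta; do      # P19114 is only a hypothetical second protein
    dir="run_${fasta%.fasta}"
    mkdir -p "$dir"
    cp "$fasta" submit.sh "$dir"/
    (cd "$dir" && sbatch submit.sh "$fasta")
done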

2.1 Example on Partition "gpu"

Let's analyze the https://www.uniprot.org/uniprot/P19113 protein, for example.

Create a working directory and get the protein:

[user@cirrus ~]$ mkdir run_P19113
[user@cirrus ~]$ cd run_P19113
[user@cirrus run_P19113]$ wget -q https://www.uniprot.org/uniprot/P19113.fasta

Use your favorite editor to create the submission script submit.sh:

[user@cirrus run_P19113]$ emacs submit.sh
#!/bin/bash
# -------------------------------------------------------------------------------
#SBATCH --job-name=P19113
#SBATCH --partition=gpu
#SBATCH --mem=30G
#SBATCH --ntasks=4
#SBATCH --gres=gpu
# -------------------------------------------------------------------------------
module purge
module load udocker/alphafold/2.1.1
run_udocker.py --fasta_paths=P19113.fasta --max_template_date=2020-05-14

Finally, submit the job, check that it is running, and wait for it to finish:

[user@cirrus run_P19113]$ sbatch submit.sh
[user@cirrus run_P19113]$ squeue

When the job finishes, the local directory ./output will contain the analysis results.
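
As a rough indication of what to expect there (file names vary with the AlphaFold preset, so this listing is only illustrative), the results land under ./output/P19113, with ranked_0.pdb being the highest-confidence predicted structure:

[user@cirrus run_P19113]$ ls output/P19113
features.pkl  msas  ranked_0.pdb ... ranked_4.pdb  ranking_debug.json
relaxed_model_1.pdb ...  result_model_1.pkl ...  timings.json  unrelaxed_model_1.pdb ...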

2.2 Example on Partition "fct"

[user@cirrus run_P19113]$ emacs submit.sh
#!/bin/bash
# -------------------------------------------------------------------------------
#SBATCH --job-name=P19113
#SBATCH --partition=fct
#SBATCH --qos=<qos>
#SBATCH --account=<account>		# optional in most cases
#SBATCH --mem=30G
#SBATCH --ntasks=4
#SBATCH --gres=gpu
# -------------------------------------------------------------------------------
module purge
module load udocker/alphafold/2.1.1
run_udocker.py --fasta_paths=P19113.fasta --max_template_date=2020-05-14
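
The <qos> and <account> values depend on your FCT project allocation; if you are unsure which ones apply to you, the Slurm accounting database can usually list them (a generic Slurm command, shown here only as a hint):

$ sacctmgr show associations user=$USER format=Account,QOS%40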

2.3 Example on Partition "hpc"

[user@cirrus run_P19113]$ emacs submit.sh
#!/bin/bash
# -------------------------------------------------------------------------------
#SBATCH --job-name=P19113
#SBATCH --partition=hpc
#SBATCH --mem=30G
#SBATCH --ntasks=4
# -------------------------------------------------------------------------------
module purge
module load udocker/alphafold/2.1.1
run_udocker.py --fasta_paths=P19113.fasta --max_template_date=2020-05-14
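
Note that the upstream run_docker.py exposes a --use_gpu flag (default True); since run_udocker.py accepts the same options, passing --use_gpu=False on a CPU-only partition should also be possible, although this is an assumption about the INCD adaptation rather than a documented option:

run_udocker.py --fasta_paths=P19113.fasta --max_template_date=2020-05-14 --use_gpu=False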

2.4 sbatch Options

Suggested sbatch resource options for an AlphaFold run:

--mem=30G      # 30G to 50G depending on the protein; compare the MaxRSS column in the Benchmarks section
--ntasks=4
--gres=gpu     # omit on CPU-only partitions such as "hpc"

3. Benchmarks

Job_Id     Partition    CPU              GPU            MaxRSS    Elapsed Time
3271213    fct          4x EPYC_7552     1x Tesla_T4    25GB      08:42:40
3272153    fct          2x EPYC_7552     1x Tesla_T4    25GB      16:39:36
3272157    fct          2x EPYC_7552     2x Tesla_T4    -         -
3271216    fct          4x EPYC_7552     -              25GB      21:38:46
3267293    fct          16x EPYC_7552    -              34GB      15:10:41
3272165    fct          32x EPYC_7552    -              -         -
3284940    hpc          16x EPYC_7501    -              -         -
3284943    hpc          32x EPYC_7501    -              -         -

4. References