AlphaFold

Introduction
How to Run
1. Example on Partition "gpu"
2. sbatch Options
Benchmarks
References

1. Introduction

The INCD team prepared a local installation of AlphaFold using a container based on udocker (instead of Docker); the installation includes the Genetic Database.

The local installation provides AlphaFold version 2.1.1 in a container based on the Ubuntu 18.04 distribution with cuda-11.0 and cudnn-8.

The main target resource of AlphaFold is the GPU, but the application can also run on the CPU alone, although the performance is substantially worse; see the Benchmarks section below.

1.1 Environment

The environment is activated with the command

$ module load udocker/alphafold/2.1.1

This automatically activates a virtual environment ready to start the AlphaFold container through the Python script run_udocker.py.

1.2 Data Base Location

The Genetic Database is installed below the filesystem directory

/users3/data/alphafold

in read-only mode; upgrades may be requested through the helpdesk@incd.pt address.

1.3 run_udocker.py Script

The run_udocker.py script was adapted from the run_docker.py script normally used by AlphaFold with the docker container technology.

The run_udocker.py script accepts the same options as the run_docker.py script, with a few minor changes that we hope will make it easier to use. The user may change the script's behaviour through environment variables or command line options; only the changes are listed below:

Optional environment variables:

Variable Name	Default Value	Comment
DOWNLOAD_DIR	none	Genetic database location
OUTPUT_DIR	none	Output results directory

Command line options:

Command Option	Mandatory	Default Value	Comment
--data_dir	no	/users3/data/alphafold	Genetic database location, takes precedence over DOWNLOAD_DIR when both are selected
--output_dir	no	<working_dir>/output	Output results directory, takes precedence over OUTPUT_DIR when both are selected

The option --data_dir is required by the standard AlphaFold run_docker.py script. We chose to select the location of the genetic database automatically, but the user may change this path through the environment variable DOWNLOAD_DIR or the command line option --data_dir.

The standard AlphaFold output results directory is /tmp/alphafold by default; please note that we changed this location to <working_dir>/output, under the local working directory. The user can select a different path through the environment variable OUTPUT_DIR or the command line option --output_dir.
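The precedence rule described above (command line option first, then environment variable, then the built-in default) can be sketched in a few lines of shell. This is only an illustration under stated assumptions: the function name resolve_dir is hypothetical and is not taken from run_udocker.py, whose internal logic is not shown in this page.

```shell
# Hypothetical helper, NOT taken from run_udocker.py: it only mirrors the
# documented precedence (command line option > environment variable > default).
resolve_dir() {
    cli_value="$1"   # value given on the command line, may be empty
    env_value="$2"   # value of the environment variable, may be empty
    default="$3"     # built-in default used when neither is set
    if [ -n "$cli_value" ]; then
        printf '%s\n' "$cli_value"
    elif [ -n "$env_value" ]; then
        printf '%s\n' "$env_value"
    else
        printf '%s\n' "$default"
    fi
}

# Example call: no --output_dir given, so OUTPUT_DIR (if set) or the default wins.
#   resolve_dir "" "${OUTPUT_DIR:-}" "$PWD/output"
```

In practice this means that exporting OUTPUT_DIR before calling run_udocker.py changes the results directory for every run in that shell, while passing --output_dir on a single command overrides it for that run only.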

2. How to Run

We only need a protein and a submission script. If we analyze multiple proteins in parallel, it is advisable to submit them from different directories in order to avoid interference between runs.
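As a rough sketch of the one-directory-per-protein advice, the helper below prepares a separate working directory for each protein. The function name prepare_run and the run_<ID> directory naming are assumptions made for illustration; the sbatch settings are simply copied from the example in section 2.1 and may need adjusting.

```shell
# Hypothetical helper: create run_<ID>/ holding <ID>.fasta and its own submit.sh,
# so that each job writes its ./output directory in a separate place.
# Assumes <ID>.fasta already exists in the current directory.
prepare_run() {
    id="$1"
    mkdir -p "run_${id}"
    cp "${id}.fasta" "run_${id}/"
    # The sbatch settings below are the ones used in the section 2.1 example.
    cat > "run_${id}/submit.sh" <<EOF
#!/bin/bash
#SBATCH --job-name=${id}
#SBATCH --partition=gpu
#SBATCH --mem=30G
#SBATCH --ntasks=4
#SBATCH --gres=gpu
module purge
module load udocker/alphafold/2.1.1
run_udocker.py --fasta_paths=${id}.fasta --max_template_date=2020-05-14
EOF
}

# Usage sketch (ID_A and ID_B are placeholders for real protein identifiers):
#   for id in ID_A ID_B; do
#       prepare_run "$id"
#       (cd "run_${id}" && sbatch submit.sh)
#   done
```

Submitting from inside each run_<ID> directory keeps the default <working_dir>/output locations apart, which is the point of the advice above.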

2.1 Simple Example on Partition "gpu"

Let's analyze the https://www.uniprot.org/uniprot/P19113 protein, for example.

Create a working directory and get the protein:

[user@cirrus ~]$ mkdir run_P19113
[user@cirrus ~]$ cd run_P19113
[user@cirrus run_P19113]$ wget -q https://www.uniprot.org/uniprot/P19113.fasta

Use your favorite editor to create the submission script submit.sh:

[user@cirrus run_P19113]$ emacs submit.sh
#!/bin/bash
# -------------------------------------------------------------------------------
#SBATCH --job-name=P19113
#SBATCH --partition=gpu
#SBATCH --mem=30G
#SBATCH --ntasks=4
#SBATCH --gres=gpu
# -------------------------------------------------------------------------------
module purge
module load udocker/alphafold/2.1.1
run_udocker.py --fasta_paths=P19113.fasta --max_template_date=2020-05-14

Finally, submit your job, check that it is running and wait for it to finish:

[user@cirrus run_P19113]$ sbatch submit.sh
[user@cirrus run_P19113]$ squeue

When the job finishes, the local directory ./output will contain the analysis results.
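A quick way to confirm that a run produced results is to look for the top-ranked model file. The sketch below assumes the upstream AlphaFold output layout, <output_dir>/<target_name>/ranked_0.pdb, where the target name is the FASTA file name without its extension; the function name check_results is hypothetical, and the layout should be verified against the local installation.

```shell
# Hypothetical helper: report whether a finished run left a top-ranked model.
# Assumes the upstream AlphaFold layout <output_dir>/<target>/ranked_0.pdb.
check_results() {
    out_dir="$1"   # results directory, e.g. ./output
    target="$2"    # FASTA file name without extension, e.g. P19113
    if [ -f "${out_dir}/${target}/ranked_0.pdb" ]; then
        echo "found: ${out_dir}/${target}/ranked_0.pdb"
    else
        echo "no ranked model yet in ${out_dir}/${target}"
        return 1
    fi
}

# Usage sketch, once squeue no longer lists the job:
#   check_results ./output P19113
```

If the model is missing, the Slurm log written in the submission directory (slurm-<jobid>.out by default) is the first place to look for errors.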