AlphaFold
1. Introduction
The INCD team prepared a local installation of AlphaFold using a container based on udocker (instead of Docker); the installation includes the Genetic Database.
The local installation provides AlphaFold version 2.1.1 in a container based on the Ubuntu 18.04 distribution with cuda-11.0 and cudnn-8.
AlphaFold is designed to run on a GPU, but the application can also run on the CPU alone, although performance is substantially worse; see the Run Examples section below.
1.1 Environment
The environment is activated with the command
$ module load udocker/alphafold/2.1.1
This automatically activates a virtual environment ready to start the AlphaFold container through the Python script run_udocker.py.
1.2 Data Base Location
The Genetic Database is installed under the filesystem directory
/users3/data/alphafold
in read-only mode; upgrades may be requested through the helpdesk@incd.pt address.
1.3 run_udocker.py Script
The run_udocker.py script was adapted from the run_docker.py script that AlphaFold normally uses with the Docker container technology.
The run_udocker.py script accepts the same options as run_docker.py, with a few minor changes that we hope will make it easier to use. The user may change the script behaviour through environment variables or command line options; only the changes are listed below:
Optional environment variables:
Variable Name | Default Value | Comment |
---|---|---|
DOWNLOAD_DIR | none | Genetic database location |
OUTPUT_DIR | none | Output results directory |
Command line options:
Command Option | Mandatory | Default Value | Comment |
---|---|---|---|
--data_dir | no | /users3/data/alphafold | Genetic database location, takes precedence over DOWNLOAD_DIR when both are selected |
--output_dir | no | <working_dir>/output | Output results directory, takes precedence over OUTPUT_DIR when both are selected |
The option --data_dir is required by the standard AlphaFold run_docker.py script. We chose to select the location of the genetic database automatically, but the user may change this path through the environment variable DOWNLOAD_DIR or the command line option --data_dir.
The standard AlphaFold output results directory is /tmp/alphafold by default. Please note that we changed this location to the local working directory; the user can select a different path through the environment variable OUTPUT_DIR or the command line option --output_dir.
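The precedence described above (command line option first, then environment variable, then default) can be sketched in shell as follows. This is only an illustration of the documented rule, not the actual logic inside run_udocker.py; the function names resolve_data_dir and resolve_output_dir are hypothetical.

```shell
#!/bin/bash
# Sketch of the documented precedence: command line option, then
# environment variable, then built-in default.
# resolve_data_dir and resolve_output_dir are hypothetical helpers,
# they are NOT part of run_udocker.py.

resolve_data_dir() {
    # $1: value given to --data_dir (empty when the option is absent)
    local cli_value="$1"
    if [ -n "$cli_value" ]; then
        echo "$cli_value"
    elif [ -n "${DOWNLOAD_DIR:-}" ]; then
        echo "$DOWNLOAD_DIR"
    else
        echo "/users3/data/alphafold"
    fi
}

resolve_output_dir() {
    # $1: value given to --output_dir (empty when the option is absent)
    local cli_value="$1"
    if [ -n "$cli_value" ]; then
        echo "$cli_value"
    elif [ -n "${OUTPUT_DIR:-}" ]; then
        echo "$OUTPUT_DIR"
    else
        echo "$PWD/output"
    fi
}
```

In practice this means that setting OUTPUT_DIR in the submission script changes where results are written, while passing --output_dir on the run_udocker.py command line overrides that setting.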
2. How to Run
We only need a protein and a submission script. If we analyze multiple proteins in parallel, it is advisable to submit them from different directories in order to avoid interference between runs.
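One way to keep parallel runs apart is to give each protein its own working directory, as in the minimal sketch below. It assumes the FASTA files are named <ID>.fasta; the helper name prepare_runs and the run_<ID> directory naming are only illustrative.

```shell
#!/bin/bash
# Sketch: prepare one working directory per protein so that parallel
# runs do not share the same ./output directory.
# prepare_runs is a hypothetical helper; the run_<ID> naming is arbitrary.

prepare_runs() {
    # Arguments: one or more FASTA files, e.g. P19113.fasta
    local fasta id dir
    for fasta in "$@"; do
        id="$(basename "$fasta" .fasta)"
        dir="run_${id}"
        mkdir -p "$dir"
        cp "$fasta" "$dir/"
        # Print the directory so the caller can loop over the results
        echo "$dir"
    done
}

# Usage (commented out): after placing a submit.sh adapted to each
# protein in its directory, submit every run from inside that directory.
# for dir in run_*; do
#     (cd "$dir" && sbatch submit.sh)
# done
```

Because each job starts in its own directory, the default <working_dir>/output location is different for every protein.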
2.1 Example on Partition "gpu"
Let's analyze the protein https://www.uniprot.org/uniprot/P19113, for example.
Create a working directory and get the protein:
[user@cirrus ~]$ mkdir run_P19113
[user@cirrus ~]$ cd run_P19113
[user@cirrus run_P19113]$ wget -q https://www.uniprot.org/uniprot/P19113.fasta
Use your favorite editor to create the submission script submit.sh:
[user@cirrus run_P19113]$ emacs submit.sh
#!/bin/bash
# -------------------------------------------------------------------------------
#SBATCH --job-name=P19113
#SBATCH --partition=gpu
#SBATCH --mem=30G
#SBATCH --ntasks=4
#SBATCH --gres=gpu
# -------------------------------------------------------------------------------
module purge
module load udocker/alphafold/2.1.1
run_udocker.py --fasta_paths=P19113.fasta --max_template_date=2020-05-14
Finally, submit your job, check that it is running, and wait for it to finish:
[user@cirrus run_P19113]$ sbatch submit.sh
[user@cirrus run_P19113]$ squeue
When the job finishes, the local directory ./output will contain the analysis results.
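As a quick check after the job ends, the ranked models can be listed with a small helper such as the one below. This is a sketch that assumes the container keeps the upstream AlphaFold output layout, <output_dir>/<fasta_name>/ranked_N.pdb; the function name list_ranked_models is hypothetical.

```shell
#!/bin/bash
# Sketch: list the ranked models of a finished run.
# Assumes the upstream AlphaFold layout <output_dir>/<fasta_name>/ranked_N.pdb;
# list_ranked_models is a hypothetical helper.

list_ranked_models() {
    # $1: FASTA base name (e.g. P19113)
    # $2: output directory (defaults to ./output)
    local name="$1"
    local outdir="${2:-./output}"
    local found=1 model
    for model in "$outdir/$name"/ranked_*.pdb; do
        # Without matches the glob stays literal, so test for a real file
        [ -f "$model" ] || continue
        echo "$model"
        found=0
    done
    # Return 0 when at least one model was found, 1 otherwise
    return "$found"
}

# Usage (commented out):
# list_ranked_models P19113 || echo "no ranked models yet"
```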