
AlphaFold

AlphaFold v2 and AlphaFold v3 differ in how they work. See the respective sections below for details.

AlphaFold3

These features are recent. Any issues or requests are welcome via the support form.

When running on Alvis, the CPU-only MSA part must be split into a separate job from the rest. For details, see the AlphaFold3 performance documentation.

Datasets

The datasets are available at /mimer/NOBACKUP/Datasets/AlphafoldDatasets/v3.0.1/; see our dataset documentation. Do note that you will need your own copy of the model weights, as the terms of use do not allow us to distribute them. See the AlphaFold3 documentation.

Module

At the time of writing there is no stable module; however, you can find a version in the test tree:

use_test_tree
module purge
module load AlphaFold3

Container

You can find an AlphaFold3 container at /apps/containers/AlphaFold/AlphaFold-3.0.1.sif.

Example

Following the official example, we create fold_input.json and two jobscripts: jobscript-msa.sh for the MSA part and jobscript-inference.sh for the actual inference.
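
For reference, a minimal fold_input.json might look like the sketch below. The sequence is a placeholder (not a real 2PV7 sequence); the field names follow the input format described in the official AlphaFold3 documentation, with two chain ids for a homodimer.

```json
{
  "name": "2PV7",
  "modelSeeds": [1],
  "sequences": [
    {
      "protein": {
        "id": ["A", "B"],
        "sequence": "MKTAYIAKQR"
      }
    }
  ],
  "dialect": "alphafold3",
  "version": 1
}
```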

#!/usr/bin/env bash
#SBATCH -A C3SE-STAFF  # <-- replace with your project here
#SBATCH -C NOGPU -c 4
#SBATCH -t 360
#SBATCH -J MSA-2pv7

module purge

apptainer run /apps/containers/AlphaFold/AlphaFold-3.0.1.sif \
    --db_dir=/mimer/NOBACKUP/Datasets/AlphafoldDatasets/v3.0.1/ \
    --run_data_pipeline \
    --norun_inference \
    --output_dir "msa" \
    --json_path fold_input.json \
    "$@"

#!/usr/bin/env bash
#SBATCH -A C3SE-STAFF  # <-- replace with your project here
#SBATCH --gpus-per-node=A100:1
#SBATCH -t 120
#SBATCH -J fold-2pv7

module purge

# You need to supply your own model weights, see
# https://github.com/google-deepmind/alphafold3/tree/v3.0.1?tab=readme-ov-file#obtaining-model-parameters
AF3_MODEL_DIR="TODO/your/path/here/"

apptainer run /apps/containers/AlphaFold/AlphaFold-3.0.1.sif \
    --db_dir="/mimer/NOBACKUP/Datasets/AlphafoldDatasets/v3.0.1/" \
    --model_dir="${AF3_MODEL_DIR}" \
    --norun_data_pipeline \
    --run_inference \
    --output_dir="fold" \
    --json_path msa/2pv7/2pv7_data.json \
    "$@"

Then we can submit these like this:

$ sbatch jobscript-msa.sh
Submitted batch job 5000001
$ sbatch --dependency=aftercorr:5000001 jobscript-inference.sh
Submitted batch job 5000002
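
If you prefer to script the submission, `sbatch --parsable` prints only the job id, so the dependency can be filled in automatically; a small sketch:

```shell
# Submit the MSA job, capture its job id, then chain the inference job on it.
msa_jobid="$(sbatch --parsable jobscript-msa.sh)"
sbatch --dependency=aftercorr:"${msa_jobid}" jobscript-inference.sh
```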

Cite

From https://github.com/google-deepmind/alphafold3/:

Any publication that discloses findings arising from using [the AlphaFold3] source code, the model parameters or outputs produced by those should cite the Accurate structure prediction of biomolecular interactions with AlphaFold 3 paper.

AlphaFold

We provide an adapted version of AlphaFold to better make use of the available compute resources on our systems. This means that the steps to run AlphaFold on Alvis may be different from other places. An example is available at https://github.com/c3se/alvis-intro/tree/main/examples/AlphaFold

Figure: AlphaFold inference pipeline (source: https://doi.org/10.1038/s41586-021-03819-2)

In short, running the AlphaFold inference pipeline on Alvis requires three jobs:

  1. Dataset look-up (MSA) on a CPU-only node (-C NOGPU)
  2. Predictions on a GPU (can be parallelized to separate jobs)
  3. A short job ranking the model outputs and doing the final relaxation

Cite

Any publication that discloses findings arising from using this source code or the model parameters should cite the AlphaFold paper and, if applicable, the AlphaFold-Multimer paper.

Patch

The additional patch is adapted from work by Thomas Hoffmann at EMBL Heidelberg. The version used on C3SE systems can be found at /apps/c3se-easyconfigs/AlphaFold-<version>_C3SEpipeline.patch

The new steps are roughly:

  1. Stop the pipeline after writing features.pkl, so that this part can run on CPU-only nodes.
    1. Refuse to create features.pkl if JAX finds GPUs.
  2. Make the pipeline resumable: restore features.pkl, result*.pkl and model*.pdb if they already exist.
  3. A new parameter only_model_pred to predict only the specified model/prediction, which enables running as a job array or with a workflow manager.
  4. Parallelized relaxation: if ALPHAFOLD_RELAX_PARALLEL is set, it gives the number of processes to use (experimental).
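
The resume logic in step 2 can be illustrated with a small stand-alone sketch; run_stage and the marker handling here are illustrative only, not part of the actual patch:

```shell
# Illustration only: skip a stage when its output file already exists,
# mimicking how the patched pipeline restores features.pkl on a rerun.
workdir="$(mktemp -d)"

run_stage() {
    local marker="${workdir}/$1"
    if [ -f "${marker}" ]; then
        echo "restore $1"   # a previous job already produced this file
    else
        echo "create $1"    # first run: do the work and save the file
        touch "${marker}"
    fi
}

run_stage features.pkl   # first job: creates it
run_stage features.pkl   # resumed job: restores it
```

On Alvis the patched alphafold performs this check itself; you only need to resubmit the same jobscript to resume.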

Example

We will need two jobscripts: jobscript-msa.sh for the MSA part on a CPU-only node and jobscript-inference.sh for the actual inference.

#!/usr/bin/env bash
#SBATCH -A C3SE-STAFF  # <-- add your project here
#SBATCH -C NOGPU -c 4
#SBATCH -t 360
#SBATCH -J MSA-8D6M

# Get the fasta file from which we will predict a shape
identifier="${IDENTIFIER:-8D6M}"
fasta_path="${identifier}.fasta"
if [ ! -f "$fasta_path" ]; then
    # It is **not** recommended to download in the job; this is an exception
    # to make the example easier to follow. Remember that alvis2 is the
    # dedicated data transfer node.
    wget "https://www.rcsb.org/fasta/entry/${identifier}" -O "$fasta_path"
    # You can find this structure at
    # https://www.rcsb.org/structure/$identifier
    # e.g.
    # https://www.rcsb.org/structure/6OSN
    # This is where you will find the release date of the structure.
fi

module purge
module load AlphaFold/2.3.2-foss-2023a-CUDA-12.1.1

export ALPHAFOLD_DATA_DIR="/mimer/NOBACKUP/Datasets/AlphafoldDatasets/2022_12"
export ALPHAFOLD_HHBLITS_N_CPU="${SLURM_CPUS_ON_NODE}"
output_dir="${SLURM_SUBMIT_DIR}"

# Will create a features.pkl file from the MSA and then stop
# https://www.c3se.chalmers.se/documentation/software/machine_learning/alphafold/#patch
alphafold \
    --fasta_paths="${fasta_path}" \
    --max_template_date=2022-11-01 \
    --output_dir="${output_dir}" \
    "$@"

#!/usr/bin/env bash
#SBATCH -A C3SE-STAFF  # <-- add your project here
#SBATCH --gpus-per-node=A100:1
#SBATCH -t 120
#SBATCH -J fold-8D6M

module purge
module load AlphaFold/2.3.2-foss-2023a-CUDA-12.1.1

identifier="${IDENTIFIER:-8D6M}"
fasta_path="${identifier}.fasta"
output_dir="${SLURM_SUBMIT_DIR}"
export ALPHAFOLD_DATA_DIR="/mimer/NOBACKUP/Datasets/AlphafoldDatasets/2022_12"

if [ ! -f "${output_dir}/${identifier}/features.pkl" ]; then
    echo "Could not find features.pkl; run the MSA job on a CPU node first." >&2
    exit 1
fi

# Prediction
alphafold \
    --fasta_paths="${fasta_path}" \
    --max_template_date=2022-11-01 \
    --output_dir="${output_dir}" \
    "$@"

# If you want to run the predictions in parallel as a job array you could do:
# sbatch --array=1-5 ...
# alphafold ... --only_model_pred="${SLURM_ARRAY_TASK_ID}"

# Relaxation
alphafold \
    --fasta_paths="${fasta_path}" \
    --max_template_date=2022-11-01 \
    --output_dir="${output_dir}" \
    --models_to_relax=BEST \
    "$@"

# to run relaxation in a CPU-only job use --nouse_gpu_relax

Then we can submit these like this:

$ sbatch jobscript-msa.sh
Submitted batch job 5000001
$ sbatch --dependency=aftercorr:5000001 jobscript-inference.sh
Submitted batch job 5000002