
AlphaFold 3

Introduction

AlphaFold 3 (AF3) is an updated version of the AlphaFold software. Full documentation and a link to the paper can be found on the AlphaFold 3 GitHub page.

Prerequisites

Before running an AF3 job, you need to download the AF3 model parameters and ensure that you can use Apptainer/Singularity on Euler.

AF3 model parameters can be requested using this form. Unfortunately, due to terms of service that are quite constraining for institutions, we cannot provide the AF3 parameters centrally. However, the file takes up only ~1 GB of storage. Please be mindful of the 15-day purge if you save the parameters in your personal scratch.

To ensure that you can run containers on the cluster, please run the get_access command in a terminal connected to Euler. Additional information is available here.

Test interactive job

First, you need to prepare an input file. Full instructions are available in the AF3 documentation. For this test, the input file should be named input.json. Here we again use ubiquitin, which also served as the test case for the AF2 version:

{
  "name": "Job name goes here",
  "modelSeeds": [1, 2],
  "sequences": [
    {
      "protein": {
        "id": "A",
        "sequence": "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQRESTLHLVLRLRGG"
      }
    }
  ],
  "dialect": "alphafold3",
  "version": 2
}
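A malformed input file is a common source of immediate job failures. As a quick sanity check (a minimal sketch; it assumes the file is named input.json in the current directory), you can verify that the file at least parses as JSON before requesting any cluster resources:

```shell
# Parse input.json with Python's built-in JSON module; this exits with a
# non-zero status and prints an error message if the file is malformed.
python3 -m json.tool input.json > /dev/null && echo "input.json is valid JSON"
```

Note that this only checks JSON syntax, not AF3's schema; run_alphafold.py will still reject files with missing or misspelled fields.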

To run a first AF3 job and familiarise yourself with the container on Euler, you can request an interactive job on the cluster. Here is an example with suggested resources for folding a small (~100 AA) protein:

srun -n 8 -G 1 --time=04:00:00 --mem-per-cpu=15g --pty bash

Once you get a prompt on a compute node, you can run the AF3 container. Replace the <...> parts with paths that make sense for your setup:

singularity exec --nv \
  --bind <...path on euler to the folder containing the AF3 input json file...>:/root/af_input \
  --bind <...path on euler to where the container should write the outputs...>:/root/af_output \
  --bind <...path on euler to pre-downloaded AF3 weights...>:/root/models \
  --bind /cluster/project/alphafold/alphafold3:/root/public_databases \
  /cluster/apps/nss/alphafold/containers/AlphaFold3/af3.sif python3 \
  /app/alphafold/alphafold-3.0.1/run_alphafold.py \
  --json_path=/root/af_input/input.json \
  --model_dir=/root/models \
  --db_dir=/root/public_databases \
  --output_dir=/root/af_output

If you work with a GPU that has a compute capability lower than 8.0 (i.e., any GPU model on the cluster other than the A100 and RTX 4090), you will get an explicit error from AF3 requesting additional XLA options. If you have not requested a GPU model explicitly, you can identify the one you were given by running the nvidia-smi command at the prompt of the interactive job. A full list of GPU models on Euler is available at the bottom of this page. This is the full command that accommodates GPUs with lower compute capability:

singularity exec --nv \
  --bind <...path on euler to the folder containing the AF3 input json file...>:/root/af_input \
  --bind <...path on euler to where the container should write the outputs...>:/root/af_output \
  --bind <...path on euler to pre-downloaded AF3 weights...>:/root/models \
  --bind /cluster/project/alphafold/alphafold3:/root/public_databases \
  --env=XLA_FLAGS="--xla_disable_hlo_passes=custom-kernel-fusion-rewriter" \
  /cluster/apps/nss/alphafold/containers/AlphaFold3/af3.sif python3 \
  /app/alphafold/alphafold-3.0.1/run_alphafold.py \
  --json_path=/root/af_input/input.json \
  --model_dir=/root/models \
  --db_dir=/root/public_databases \
  --output_dir=/root/af_output \
  --flash_attention_implementation=xla
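The choice between the two commands above comes down to the compute capability of the GPU you were assigned. On recent NVIDIA drivers, nvidia-smi can report it directly (the availability of this query flag on the cluster's drivers is an assumption; on older drivers, look up the model name in nvidia-smi output instead). A small sketch of the decision:

```shell
# Return success (exit 0) when the given compute capability is below 8.0,
# i.e. when the XLA_FLAGS variant of the AF3 command is needed.
needs_xla_workaround() {
  awk -v c="$1" 'BEGIN { exit !(c < 8.0) }'
}

# On recent drivers the capability can be queried directly, e.g.:
#   cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader)
cap="7.5"   # example value for illustration (e.g. an RTX 2080 Ti)

if needs_xla_workaround "$cap"; then
  echo "compute capability $cap: use the XLA_FLAGS variant"
else
  echo "compute capability $cap: the standard command is fine"
fi
```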

Submit a batch job

Once you have created an input file, either as suggested above or following any other instructions in the AF3 documentation, please create a text file, e.g. AF3_job.sbatch, containing:

#!/usr/bin/bash
#SBATCH -n 8
#SBATCH --time=4-00:00:00
#SBATCH --mem-per-cpu=17g
#SBATCH --nodes=1
#SBATCH -G 1
#SBATCH -A <my_share>
#SBATCH -J af3
#SBATCH -e ./%j.err
#SBATCH -o ./%j.out

singularity exec --nv \
  --bind <...path on euler to the folder containing the AF3 input json file...>:/root/af_input \
  --bind <...path on euler to where the container should write the outputs...>:/root/af_output \
  --bind <...path on euler to pre-downloaded AF3 weights...>:/root/models \
  --bind /cluster/project/alphafold/alphafold3:/root/public_databases \
  /cluster/apps/nss/alphafold/containers/AlphaFold3/af3.sif python3 \
  /app/alphafold/alphafold-3.0.1/run_alphafold.py \
  --json_path=/root/af_input/input.json \
  --model_dir=/root/models \
  --db_dir=/root/public_databases \
  --output_dir=/root/af_output

Please make sure to replace all of the information between <...> and to adjust the singularity command based on the GPU model you are aiming for (see above). Then submit the job as:

sbatch AF3_job.sbatch
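On success, sbatch prints a line of the form "Submitted batch job <id>"; that numeric ID is what the %j placeholders in the script above expand to. A small sketch (the job ID shown is an example value) of recovering the log file names from that message:

```shell
# Example of the line sbatch prints on successful submission.
submit_msg="Submitted batch job 12345"

# Strip everything up to the last space to isolate the numeric job ID,
# which names the log files via the %j placeholders in the sbatch script.
jobid=${submit_msg##* }
echo "logs: ${jobid}.out and ${jobid}.err"
```

While the job runs, `squeue -u $USER` shows its state, and `tail -f <jobid>.out` follows the AF3 log as it is written.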

Known issues

In the AF3 outputs, you may see:

[...] xla_bridge.py:895] Unable to initialize backend 'rocm': module 'jaxlib.xla_extension' has no attribute 'GpuAllocatorConfig'
[...] xla_bridge.py:895] Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory

Those warnings are expected. AF3 checks whether a TPU is available. TPUs are proprietary Google chips that are not sold commercially, but they can be used on Google Cloud or Google Colab. AF3 also runs ROCm and CUDA detection, so you will get a warning that ROCm was not found if you are on an NVIDIA GPU, and that CUDA was not found if you are on an AMD GPU.
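Since these warnings are harmless, it can help to filter them out when scanning a log for real problems. A minimal sketch (the log file name af3_job.out is a placeholder; point it at your job's .out or .err file):

```shell
# Drop the expected backend-initialisation warnings from an AF3 log so
# that genuine errors stand out. "af3_job.out" is a placeholder name.
grep -v \
  -e "Unable to initialize backend 'rocm'" \
  -e "Unable to initialize backend 'tpu'" \
  af3_job.out
```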

Databases

The AlphaFold databases are available at

/cluster/project/alphafold

AlphaFold 2 uses HHsearch and HHblits from the HH-suite to perform protein sequence searches. The HH-suite performs many random file accesses and read operations. We have benchmarked different storage systems and found /cluster/project to significantly outperform the others, thanks to its NVMe flash caches.

Storage system benchmarking

We tested the performance of AlphaFold to fold two proteins (Ubiquitin with the length of 76 amino acids, T1050 with the length of 779 amino acids) reading the AlphaFold databases from our three central storage systems.

  • /cluster/scratch is a fast, short-term, personal storage system based on SSD
  • /cluster/project is a long-term group storage system which uses HDD for the permanent storage and NVMe flash caches to accelerate the reading speed
  • /cluster/work is a fast, long-term, group storage system based on HDD and suitable for large files

The tests ran on RTX 2080 Ti and TITAN RTX GPUs. All jobs used 12 CPU cores, 1 GPU, 120 GB of RAM and 120 GB of scratch space. The figures below show the benchmark results, i.e. the average runtime of five runs for the tests with the databases on /cluster/scratch and /cluster/project. The tests with the databases on /cluster/work were run only once, because their high load affected other users. The tests were done in non-exclusive mode.

The performance results of AlphaFold2 in folding the Ubiquitin structure.

A cartoon representation of two superimposed ubiquitin structures. Ubiquitin is a small monomeric protein with 76 amino acids. The structure in blue has been determined experimentally (X-ray crystallography, PDB code: 1upq). The model in green shows the structure predicted by AlphaFold2. The RMSD (root mean square deviation) between the two structures is 0.797 Å, calculated over the backbone atoms. (Image and caption text by Dr. Simon Rüdisser, BNSP)

The performance results of AlphaFold2 in folding the T1050 structure.

The five models of T1050 generated by AlphaFold2 are shown as cartoon representation. T1050 is a monomeric protein with 779 amino acids. T1050 is one of the targets from the CASP (Critical Assessment of Techniques for Protein Structure Prediction) initiative. (Image and caption text by Dr. Simon Rüdisser, BNSP)

Conclusion: based on folding these two proteins with AlphaFold, /cluster/project is the best choice of group storage for the AlphaFold databases. The performance of AlphaFold when reading the data from /cluster/scratch and /cluster/project is comparable, and around 10 times faster than when reading the data from /cluster/work. /cluster/scratch is short-term, personal-only storage and is therefore not an optimal solution for a group of users.