AMD Instinct MI300A evaluation

We are pleased to announce the availability of 6 nodes, each equipped with 4 AMD MI300A Accelerated Processing Units (APUs), for evaluation and testing. APUs integrate CPU and GPU cores on a single die, sharing high-speed HBM3 memory.

Anybody with an ETH account has access at least one AMD MI300A APU. The other limits are the same as on Euler.

GPUs for everyone

During evaluation, the APUs are available to all cluster users, not just shareholders. This is a unique opportunity to test and provide feedback on the latest AMD GPU technology.

128 GB HBM3 memory

Each MI300A APU has 128 GB of high-bandwidth, shared memory (HBM3), providing ample memory for large datasets and complex computations. Shared memory allows efficient data access across CPU and GPU cores, reducing data transfer overhead. Ideal for applications that interleave CPU and GPU workloads.

Memory bandwidth

We measured 3 TB/s memory bandwidth, 2x more than the Nvidia A100 GPUs on Euler. This makes the MI300A APUs ideal for memory-intensive workloads.

Double-precision (FP64) performance

We measured 42 TFLOPS of double-precision (FP64) performance, 3x more than the Nvidia A100 GPUs on Euler. This makes the MI300A APUs suitable for scientific computing tasks that require high numerical precision.

For detailed hardware specifications and benchmark results, see: GPU Nodes.

Beta testing timeline¶

26 May 2025: Closed beta testing begins (invited users only)
16 June 2025: Open beta testing starts (all cluster users)
28 November 2025: Beta testing phase ends (extended from 29 August 2025)

Access¶

The MI300A APUs are managed by a dedicated Slurm instance. To use it, set the following environment variable while using Slurm commands:

export SLURM_CONF=/cluster/adm/slurm-amdgpu/slurm/etc/slurm.conf

Job submission works as usual. Note: Memory requests must account for both CPU and GPU memory.

Interactive job that requests 1x MI300A GPU and its 24 CPU cores

srun --ntasks=1 --cpus-per-task=24 --time=4:00:00 --gpus=mi300a:1 --mem-per-cpu=4G --pty bash

Batch job that requests 1x MI300A GPU and its 24 CPU cores

#!/bin/bash
#SBATCH -n 24
#SBATCH --time=08:00:00
#SBATCH --gpus=mi300a:1
#SBATCH --mem-per-cpu=4G

module load stack/2024-06
my_gpu_program

Software¶

The MI300A APUs are not compatible with the Nvidia CUDA toolkit. Instead, they use AMD's ROCm framework. Your software must be compatible with ROCm or compiled with ROCm support.

ROCm 6.3.2 and 6.4.1 are currently available on Euler.

PyTorch¶

To install PyTorch, load Python 3.12 (see: Python) and follow the official PyTorch installation instructions.

Machine learning environments¶

A Python virtual environment with TensorFlow 2.17.0 and PyTorch 2.4.0 (both with ROCm support) is provided.

ROCm 6.3.2¶

module load stack/2024-06 python/3.12.8
source /cluster/software/manual/rocm632_python/bin/activate

Only basic TensorFlow and PyTorch packages are included. You cannot add packages to this environment. For additional packages, create your own virtual environment:

module load stack/2024-06 python/3.12.8 eth_proxy
srun --ntasks=1 --cpus-per-task=4 --gpus=mi300a:1 --time=4:00:00 --mem-per-cpu=4G --pty bash
# Change to your desired directory
python -m venv --system-site-packages rocm632_python
source ./rocm632_python/bin/activate
OPENBLAS=$OPENBLAS_EULER_ROOT/lib/libopenblas.so pip install --force-reinstall --no-binary numpy numpy==1.26.4
pip install torch==2.4.0 torchaudio==2.4.0 torchvision==0.19.0 pytorch-triton-rocm==3.0.0  -f https://repo.radeon.com/rocm/manylinux/rocm-rel-6.3.2
pip install tensorflow-rocm==2.17.0 -f https://repo.radeon.com/rocm/manylinux/rocm-rel-6.3.2

ROCm 6.4.1¶

module load stack/2024-06 python/3.12.8
source /cluster/software/manual/rocm641_python/bin/activate

As above, only basic packages are included. For additional packages, create your own virtual environment:

module load stack/2024-06 python/3.12.8 eth_proxy
srun --ntasks=1 --cpus-per-task=4 --gpus=mi300a:1 --time=4:00:00 --mem-per-cpu=4G --pty bash
# Change to your desired directory
python -m venv --system-site-packages rocm641_python
source ./rocm641_python/bin/activate
OPENBLAS=$OPENBLAS_EULER_ROOT/lib/libopenblas.so pip install --force-reinstall --no-binary numpy numpy==2.0.2
pip install torch==2.6.0 torchaudio==2.6.0 torchvision==0.21.0 pytorch-triton-rocm==3.2.0 -f https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1
pip install tensorflow-rocm==2.18.1 -f https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4.1

You can install additional packages as needed. The interactive job on MI300A is optional for installation; you may omit --gpus=mi300a:1 for cross-compiling.

Available software¶

The ROCm software stack is available via the stack/2024-06 module collection.

Available modules include:

aqlprofile/6.3.2
comgr/6.3.2
composable-kernel/6.3.2
hip/6.3.2
hipblas/6.3.2
hipblas-common/6.3.2
hipblaslt/6.3.2
hipcc/6.3.2
hipcub/6.3.2
hipfft/6.3.2
hipfort/6.3.2
hipify-clang/6.3.2
hiprand/6.3.2
hipsolver/6.3.2
hipsparse/6.3.2
hiptt/master
hsa-rocr-dev/6.3.2
llvm-amdgpu/6.3.2
miopen-hip/6.3.2
rccl/6.3.2
rocblas/6.3.2
rocfft/6.3.2
rocm-cmake/6.3.2
rocm-core/6.3.2
rocminfo/6.3.2
rocm-openmp-extras/6.3.2
rocm-smi-lib/6.3.2
rocm-tensile/6.3.2
rocmlir/6.3.2
rocprim/6.3.2
rocprofiler-dev/6.3.2
rocprofiler-register/6.3.2
rocrand/6.3.2
rocsolver/6.3.2
rocsparse/6.3.2
rocthrust/6.3.2
roctracer-dev/6.3.2
roctracer-dev-api/6.3.2

Known issues¶

JAX

We haven't been able to get the JAX package to work on the MI300A APUs. We are working on it.