GPU Nodes
Euler contains several different GPU models. Since GPU nodes are purchased to meet shareholders' explicit needs, the specifications (CPU, RAM, disk, networks) vary considerably, even for nodes with the same GPU model.
GPU Nodes Access
GPU nodes are available only to shareholders who invest in GPU hardware.
Operational nodes
| GPU | Nodes | GPUs/Node | VRAM | Processing Units |
|---|---|---|---|---|
| nVidia RTX 2080 Ti | 65 | 8 | 11 GB | 4'352 |
| nVidia TITAN RTX | 14 | 8 | 24 GB | 4'608 |
| nVidia Quadro RTX 6000 | 20 | 8 | 24 GB | 4'608 |
| nVidia RTX 3090 | 32 | 8 | 24 GB | 10'496 |
| nVidia Tesla A100 | 3 | 8 | 40 GB | 8'192 |
| nVidia Tesla A100 | 3 | 10 | 80 GB | 8'192 |
| nVidia Tesla A100 | 2 | 8 | 80 GB | 8'192 |
| nVidia RTX 4090 | 80 | 8 | 24 GB | 16'384 |
| nVidia RTX PRO 6000 | 10 | 8 | 96 GB | 24'064 |
| APU | Nodes | APUs/Node | CPU-Cores/APU | RAM | Processing Units |
|---|---|---|---|---|---|
| AMD MI300A | 6 | 4 | 24 | 128 GB | 14'592 |
APUs combine CPU cores and GPU cores on one die, sharing common HBM3 memory.
(The list is based on `sinfo -o "%10D %80G %80N"`.)
Decommissioned nodes
| GPU | Nodes | GPUs/Node | VRAM | Decommissioned in |
|---|---|---|---|---|
| nVidia GTX 1080 | 10 | 8 | 8 GB | 2023 |
| nVidia GTX 1080 Ti | 50 | 8 | 11 GB | 2024 |
| nVidia Tesla V100 | 4 | 8 | 16 GB | ? |
Specification Overview
The capabilities of the different GPU models differ substantially, so we provide this overview. For more details, check the sources of these numbers by following the links in the table header.
A question mark (?) means the value is unknown to us; a dash (-) means the operation is not supported natively. Values are in TFLOP/s (tera floating-point operations per second) for floating-point formats and TOP/s (tera operations per second) for integer formats.
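These peak numbers follow directly from core count, clock rate, and operations per cycle. As a sanity check, here is a short sketch reproducing the FP32 figure for the RTX 3090; the boost clock of ~1.695 GHz is an assumption taken from public spec sheets, not from this page.

```python
# Peak FP32 throughput = cores x FLOPs/cycle x clock.
# RTX 3090: 10'496 CUDA cores (see the node table above);
# the ~1.695 GHz boost clock is an assumed spec-sheet value.
cuda_cores = 10_496
flops_per_core_per_cycle = 2      # one fused multiply-add counts as 2 FLOPs
boost_clock_hz = 1.695e9

peak_tflops = cuda_cores * flops_per_core_per_cycle * boost_clock_hz / 1e12
print(f"{peak_tflops:.1f} TFLOP/s")  # ~35.6, matching the FP32 row below
```

The same arithmetic, with the appropriate per-format operation rates, yields the other entries.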
Theoretical Peak Compute
| Precision | ALU | AMD MI300A | nVidia RTX PRO 6000 | nVidia A100 | nVidia RTX 4090 | nVidia RTX 3090 |
|---|---|---|---|---|---|---|
| FP64 | Normal | 61.3 | 1.9 | 9.7 | 1.3 | 0.5 |
| FP64 | Tensor Core | 122.6 | - | 19.5 | - | - |
| FP64 | Sparse | - | - | - | - | - |
| FP32 | Normal | 122.6 | 126.0 | 19.5 | 82.6 | 35.6 |
| FP32 | Tensor Core | 122.6 | - | 156.0 | - | - |
| FP32 | Sparse | - | - | 312.0 | - | - |
| FP16 | Normal | - | 126.0 | 78.0 | 82.6 | 35.6 |
| FP16 | Tensor Core | 980.6 | 503.8 | 312.0 | 330.3 | 142.0 |
| FP16 | Sparse | 1'961.2 | 1'007.6 | 624.0 | 660.6 | 284.0 |
| FP8 | Normal | - | - | - | - | - |
| FP8 | Tensor Core | 1'961.2 | 1'007.6 | - | 660.6 | - |
| FP8 | Sparse | 3'922.3 | 2'015.2 | - | 1'321.2 | - |
| TF32 | Normal | - | - | - | - | - |
| TF32 | Tensor Core | 490.3 | 251.9 | 156.0 | 82.6 | 35.6 |
| TF32 | Sparse | 980.6 | 503.8 | 312.0 | 165.2 | 71.0 |
| BF16 | Normal | - | 126.0 | - | - | - |
| BF16 | Tensor Core | 980.6 | 503.8 | 312.0 | 165.2 | 71.0 |
| BF16 | Sparse | 1'961.2 | 1'007.6 | 624.0 | 330.4 | 142.0 |
| INT32 | Normal | 19.5 | 126.0 | 19.5 | 41.3 | 17.8 |
| INT32 | Tensor Core | - | - | - | - | - |
| INT32 | Sparse | - | - | - | - | - |
| INT8 | Normal | ? | ? | ? | ? | ? |
| INT8 | Tensor Core | 1'961.2 | 1'007.6 | 1'248.0 | 660.6 | 284.0 |
| INT8 | Sparse | 3'922.3 | 2'015.2 | 1'248.0 | 1'321.2 | 568.0 |
| INT4 | Normal | ? | ? | ? | ? | ? |
| INT4 | Tensor Core | - | - | 1'248.0 | 1'321.2 | 568.0 |
| INT4 | Sparse | - | - | 2'496.0 | 2'642.4 | 1'136.0 |
| BOOL | Normal | ? | ? | ? | ? | ? |
| BOOL | Tensor Core | - | - | 4'992.0 | - | - |
| BOOL | Sparse | - | - | - | - | - |
Theoretical Peak Bandwidth
| | AMD MI300A | nVidia RTX PRO 6000 | nVidia A100 | nVidia RTX 4090 | nVidia RTX 3090 |
|---|---|---|---|---|---|
| RAM to GPU [GB/s] | 5'300 | 1'792 | 1'555 | 1'008 | 936 |
| GPU to GPU [GB/s] | 256 | ? | 6.25 | ? | ? |
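Combining the compute and bandwidth tables gives a rough roofline-style "machine balance" (FLOP per byte): kernels whose arithmetic intensity falls below this ratio are bandwidth-bound rather than compute-bound. A minimal sketch using the FP64 and RAM-to-GPU numbers from the tables above:

```python
# Machine balance in FLOP per byte, from the tables above.
# A kernel with lower arithmetic intensity is bandwidth-bound.
peak_fp64_tflops = {"AMD MI300A": 61.3, "nVidia A100": 9.7, "nVidia RTX 4090": 1.3}
peak_bw_gbs      = {"AMD MI300A": 5_300, "nVidia A100": 1_555, "nVidia RTX 4090": 1_008}

for gpu in peak_fp64_tflops:
    balance = peak_fp64_tflops[gpu] * 1e12 / (peak_bw_gbs[gpu] * 1e9)
    print(f"{gpu}: {balance:.1f} FLOP/byte")
```

This is why a bandwidth-bound code like HPCG reaches only a small fraction of a GPU's peak FLOP rate (compare the HPL and HPCG rows below).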
Benchmarks
To evaluate the performance of Euler's GPUs, we ran several benchmarks. Here's what we measured:
| Application | GPUs | Units | AMD MI300A | nVidia RTX PRO 6000 | nVidia A100 | nVidia RTX 4090 |
|---|---|---|---|---|---|---|
| Stream | 1 | GB/s | 2'923 | 1'491 | 1'371 | 927 |
| HPL | 1 | GFLOP/s | 42'800 | 1'565 | 14'270 | 1'209 |
| HPL | 4 | GFLOP/s | ? | 62'181 | 36'850 | 4'741 |
| HPCG | 1 | GFLOP/s | 548 | - | 249 | 188 |
| HPCG | 4 | GFLOP/s | 1'915 | - | 891 | 688 |
| Hashcat MD5 | 4 | GH/s | 472 | 817 | 184 | 259 |
Explanation of the benchmarks:
- STREAM is the de facto industry standard benchmark for measuring sustained memory bandwidth.
- HPL solves a (random) dense linear system in double precision (64 bits) arithmetic. It is compute bound.
- HPCG uses the conjugate gradient method to solve a large sparse linear system. It is memory-bandwidth bound.
- Hashcat is a password recovery tool and MD5 is a widely used hash function producing a 128-bit hash value.
- AlexNet (FP16) is a neural network for image classification and won the ILSVRC in 2012.
- ResNet50 (FP16) is a neural network for image classification and won the ILSVRC in 2015.
- DenseNet-121 (FP16) is a deep-learning architecture for image classification and related tasks such as segmentation.