
GPU Nodes

Euler contains several different GPU models. Since GPU nodes are purchased to meet shareholders' explicit needs, the specifications (CPU, RAM, disk, networks) vary considerably, even for nodes with the same GPU model.

GPU Nodes Access

GPU nodes are available only to shareholders, i.e. groups who invest in GPU hardware.
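On a Slurm-based cluster such as Euler, a shareholder typically requests GPU nodes through the batch system. A minimal job-script sketch (the resource values are illustrative, and the exact options accepted on Euler should be checked against the cluster's own submission documentation):

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --gpus=1              # request one GPU of any available model
#SBATCH --mem-per-cpu=4G
#SBATCH --time=01:00:00

# Show which GPU the scheduler assigned (assumes NVIDIA tooling on the node)
nvidia-smi
```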

Operational nodes

GPU Model               Nodes  GPUs/Node  VRAM   Processing Units
nVidia RTX 2080 Ti      65     8          11 GB  4'352
nVidia TITAN RTX        14     8          24 GB  4'608
nVidia Quadro RTX 6000  20     8          24 GB  4'608
nVidia RTX 3090         32     8          24 GB  10'496
nVidia Tesla A100       3      8          40 GB  8'192
nVidia Tesla A100       3      10         80 GB  8'192
nVidia Tesla A100       2      8          80 GB  8'192
nVidia RTX 4090         80     8          24 GB  16'384
nVidia RTX PRO 6000     10     8          96 GB  24'064

APU Model   Nodes  APUs/Node  CPU-Cores/APU  RAM     Processing Units
AMD MI300A  6      4          24             128 GB  14'592

APUs combine CPU cores and GPU cores on one die, sharing common HBM3 memory.
(The list is based on sinfo -o "%10D %80G %80N".)

Decommissioned nodes

GPU Model           Nodes  GPUs/Node  VRAM   Decommissioned in
nVidia GTX 1080     10     8          8 GB   2023
nVidia GTX 1080 Ti  50     8          11 GB  2024
nVidia Tesla V100   4      8          16 GB  ?

Specification Overview

The capabilities of the different GPU models differ substantially, so we provide this overview. For more details, check the sources of these numbers by clicking the links in the table's header.
Question marks (?) mean that we don't know this value. Dashes (-) mean that this operation is not supported natively. Values are in TFLOP/s (tera floating-point operations per second) for floating-point formats and TOP/s (tera operations per second) for integer formats.
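As a sanity check, a "theoretical peak" figure is just cores × clock × operations per cycle. A small sketch using public spec-sheet values (the assumed ~2.52 GHz boost clock is not taken from Euler itself):

```python
# Theoretical peak compute: each shader core retires one fused
# multiply-add (FMA) per cycle, which counts as two floating-point
# operations.

def peak_tflops(cores: int, clock_ghz: float, flops_per_cycle: int = 2) -> float:
    """Theoretical peak throughput in TFLOP/s."""
    return cores * clock_ghz * flops_per_cycle / 1000.0

# nVidia RTX 4090: 16'384 CUDA cores, ~2.52 GHz boost clock (assumption)
print(round(peak_tflops(16384, 2.52), 1))  # ~82.6, matching the FP32 row below
```

The same arithmetic underlies the vendor numbers in the table; tensor-core and sparse figures simply use larger per-cycle operation counts.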

Theoretical Peak Compute

Precision  ALU          AMD MI300A  nVidia RTX PRO 6000  nVidia A100  nVidia RTX 4090  nVidia RTX 3090
FP64       Normal       61.3        1.9                  9.7          1.3              0.5
FP64       Tensor Core  122.6       -                    19.5         -                -
FP64       Sparse       -           -                    -            -                -
FP32       Normal       122.6       126.0                19.5         82.6             35.6
FP32       Tensor Core  122.6       -                    156.0        -                -
FP32       Sparse       -           -                    312.0        -                -
FP16       Normal       -           126.0                78.0         82.6             35.6
FP16       Tensor Core  980.6       503.8                312.0        330.3            142.0
FP16       Sparse       1'961.2     1'007.6              624.0        660.6            284.0
FP8        Normal       -           -                    -            -                -
FP8        Tensor Core  1'961.2     1'007.6              -            660.6            -
FP8        Sparse       3'922.3     2'015.2              -            1'321.2          -
TF32       Normal       -           -                    -            -                -
TF32       Tensor Core  490.3       251.9                156.0        82.6             35.6
TF32       Sparse       980.6       503.8                312.0        165.2            71.0
BF16       Normal       -           126.0                -            -                -
BF16       Tensor Core  980.6       503.8                312.0        165.2            71.0
BF16       Sparse       1'961.2     1'007.6              624.0        330.4            142.0
INT32      Normal       19.5        126.0                19.5         41.3             17.8
INT32      Tensor Core  -           -                    -            -                -
INT32      Sparse       -           -                    -            -                -
INT8       Normal       ?           ?                    ?            ?                ?
INT8       Tensor Core  1'961.2     1'007.6              1'248.0      660.6            284.0
INT8       Sparse       3'922.3     2'015.2              1'248.0      1'321.2          568.0
INT4       Normal       ?           ?                    ?            ?                ?
INT4       Tensor Core  -           -                    1'248.0      1'321.2          568.0
INT4       Sparse       -           -                    2'496.0      2'642.4          1'136.0
BOOL       Normal       ?           ?                    ?            ?                ?
BOOL       Tensor Core  -           -                    4'992.0      -                -
BOOL       Sparse       -           -                    -            -                -

Theoretical Peak Bandwidth

                   AMD MI300A  nVidia RTX PRO 6000  nVidia A100  nVidia RTX 4090  nVidia RTX 3090
RAM to GPU [GB/s]  5'300       1'792                1'555        1'008            936
GPU to GPU [GB/s]  256         ?                    6.25         ?                ?
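The bandwidth table gives a quick lower bound on data-movement time: bytes moved divided by peak bandwidth. A small sketch using the "RAM to GPU" peaks above (the 10 GB payload is purely illustrative):

```python
# Lower bound on the time to stream data through GPU memory,
# using the theoretical "RAM to GPU" peaks from the table (GB/s).

PEAK_BW_GBPS = {
    "AMD MI300A": 5300,
    "nVidia RTX PRO 6000": 1792,
    "nVidia A100": 1555,
    "nVidia RTX 4090": 1008,
    "nVidia RTX 3090": 936,
}

def min_transfer_ms(size_gb: float, gpu: str) -> float:
    """Best-case time to read `size_gb` from device memory, in milliseconds."""
    return size_gb / PEAK_BW_GBPS[gpu] * 1000.0

# Reading 10 GB once from HBM on an A100 takes at least ~6.4 ms.
print(round(min_transfer_ms(10, "nVidia A100"), 1))
```

Measured sustained bandwidth (see the STREAM results below) is typically somewhat lower than these theoretical peaks.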

Benchmarks

To evaluate the performance of Euler's GPUs, we ran several benchmarks. Here's what we measured:

Application  GPUs  Units    AMD MI300A  nVidia RTX PRO 6000  nVidia A100  nVidia RTX 4090
Stream       1     GB/s     2'923       1'491                1'371        927
HPL          1     GFLOP/s  42'800      1'565                14'270       1'209
HPL          4     GFLOP/s  ?           62'181               36'850       4'741
HPCG         1     GFLOP/s  548         -                    249          188
HPCG         4     GFLOP/s  1'915       -                    891          688
Hashcat MD5  4     GH/s     472         817                  184          259

Explanation of the benchmarks:

  • STREAM is the de facto industry standard benchmark for measuring sustained memory bandwidth.
  • HPL solves a (random) dense linear system in double-precision (64-bit) arithmetic. It is compute bound.
  • HPCG uses the conjugate gradient method to solve a large sparse linear system. It is memory-bandwidth bound.
  • Hashcat is a password recovery tool and MD5 is a widely used hash function producing a 128-bit hash value.
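Whether a kernel is compute bound (HPL) or bandwidth bound (HPCG) follows from comparing its arithmetic intensity with the GPU's "machine balance", i.e. peak FLOP/s per byte/s of memory bandwidth. A rough sketch using the tables above (the intensity figures for DGEMM and sparse CG are common literature estimates, not Euler measurements):

```python
# Machine balance = peak FP64 compute / peak memory bandwidth (FLOP/byte).
# Kernels with lower arithmetic intensity than this are bandwidth bound.

PEAK_FP64_TFLOPS = {"nVidia A100": 19.5, "AMD MI300A": 122.6}  # tensor-core FP64
PEAK_BW_GBPS = {"nVidia A100": 1555, "AMD MI300A": 5300}

def machine_balance(gpu: str) -> float:
    """FLOPs the GPU can perform per byte moved from memory (FLOP/byte)."""
    return PEAK_FP64_TFLOPS[gpu] * 1e12 / (PEAK_BW_GBPS[gpu] * 1e9)

# Large DGEMM (the core of HPL) runs well above 10 FLOP/byte, so it can
# saturate the compute units; sparse CG (HPCG) sits near 0.25 FLOP/byte,
# far below the balance point, so memory bandwidth is the limit.
for gpu in PEAK_FP64_TFLOPS:
    print(f"{gpu}: machine balance ~{machine_balance(gpu):.1f} FLOP/byte")
```

This is also why the HPCG results track the STREAM bandwidth numbers much more closely than the HPL numbers do.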

AlexNet
AlexNet (FP16) is a neural network for image classification and won ILSVRC in 2012.

ResNet
ResNet50 (FP16) is a neural network for image classification and won ILSVRC in 2015.

DenseNet
DenseNet-121 (FP16) is a deep learning architecture designed for image classification and other tasks like segmentation.