GPU Nodes
Euler contains several different GPU models. Since GPU nodes are purchased to meet shareholders' explicit needs, the specifications (CPU, RAM, disk, networks) vary considerably, even for nodes with the same GPU model.
GPU Nodes Access
GPU nodes are available only to shareholders who invest in GPU hardware.
Operational nodes
| GPU | Nodes | GPUs/Node | VRAM | Processing Units |
|---|---|---|---|---|
| nVidia RTX 2080 Ti | 65 | 8 | 11 GB | 4'352 |
| nVidia TITAN RTX | 14 | 8 | 24 GB | 4'608 |
| nVidia Quadro RTX 6000 | 20 | 8 | 24 GB | 4'608 |
| nVidia RTX 3090 | 32 | 8 | 24 GB | 10'496 |
| nVidia Tesla A100 | 3 | 8 | 40 GB | 8'192 |
| nVidia Tesla A100 | 3 | 10 | 80 GB | 8'192 |
| nVidia Tesla A100 | 2 | 8 | 80 GB | 8'192 |
| nVidia RTX 4090 | 80 | 8 | 24 GB | 16'384 |
| nVidia RTX PRO 6000 | 10 | 8 | 96 GB | 24'064 |
| APU | Nodes | APUs/Node | CPU-Cores/APU | RAM | Processing Units |
|---|---|---|---|---|---|
| AMD MI300A | 6 | 4 | 24 | 128 GB | 14'592 |
APUs combine CPU cores and GPU cores on one die, sharing common HBM3 memory.
(The list is based on `sinfo -o "%10D %80G %80N"`.)
Decommissioned nodes
| GPU | Nodes | GPUs/Node | VRAM | Decommissioned in |
|---|---|---|---|---|
| nVidia GTX 1080 | 10 | 8 | 8 GB | 2023 |
| nVidia GTX 1080 Ti | 50 | 8 | 11 GB | 2024 |
| nVidia Tesla V100 | 4 | 8 | 16 GB | ? |
Specification Overview
The capabilities of the different GPU models differ substantially, so we provide this overview. For more details, check the sources of these numbers by following the links in the table header.
A question mark (?) means the value is unknown to us; a dash (-) means the operation is not supported natively. Values are in TFLOP/s (tera floating-point operations per second) for floating-point formats and TOP/s (tera operations per second) for integer formats.
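These peak numbers follow directly from core count, clock rate, and operations per cycle. As a sanity check, here is a short sketch reproducing the FP32 figure for the RTX 3090; the boost clock of ~1.695 GHz is an assumption taken from public spec sheets, not from this page.

```python
# Peak FP32 throughput = cores x FLOPs/cycle x clock.
# RTX 3090: 10'496 CUDA cores (see the node table above);
# the ~1.695 GHz boost clock is an assumed spec-sheet value.
cuda_cores = 10_496
flops_per_core_per_cycle = 2      # one fused multiply-add counts as 2 FLOPs
boost_clock_hz = 1.695e9

peak_tflops = cuda_cores * flops_per_core_per_cycle * boost_clock_hz / 1e12
print(f"{peak_tflops:.1f} TFLOP/s")  # ~35.6, matching the FP32 row below
```

The same arithmetic, with the appropriate per-format operation rates, yields the other entries.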
Theoretical Peak Compute
| Precision | ALU | AMD MI300A | nVidia RTX PRO 6000 | nVidia A100 | nVidia RTX 4090 | nVidia RTX 3090 |
|---|---|---|---|---|---|---|
| FP64 | Normal | 61.3 | 1.9 | 9.7 | 1.3 | 0.5 |
| FP64 | Tensor Core | 122.6 | - | 19.5 | - | - |
| FP64 | Sparse | - | - | - | - | - |
| FP32 | Normal | 122.6 | 126.0 | 19.5 | 82.6 | 35.6 |
| FP32 | Tensor Core | 122.6 | - | 156.0 | - | - |
| FP32 | Sparse | - | - | 312.0 | - | - |
| FP16 | Normal | - | 126.0 | 78.0 | 82.6 | 35.6 |
| FP16 | Tensor Core | 980.6 | 503.8 | 312.0 | 330.3 | 142.0 |
| FP16 | Sparse | 1'961.2 | 1'007.6 | 624.0 | 660.6 | 284.0 |
| FP8 | Normal | - | - | - | - | - |
| FP8 | Tensor Core | 1'961.2 | 1'007.6 | - | 660.6 | - |
| FP8 | Sparse | 3'922.3 | 2'015.2 | - | 1'321.2 | - |
| TF32 | Normal | - | - | - | - | - |
| TF32 | Tensor Core | 490.3 | 251.9 | 156.0 | 82.6 | 35.6 |
| TF32 | Sparse | 980.6 | 503.8 | 312.0 | 165.2 | 71.0 |
| BF16 | Normal | - | 126.0 | - | - | - |
| BF16 | Tensor Core | 980.6 | 503.8 | 312.0 | 165.2 | 71.0 |
| BF16 | Sparse | 1'961.2 | 1'007.6 | 624.0 | 330.4 | 142.0 |
| INT32 | Normal | 19.5 | 126.0 | 19.5 | 41.3 | 17.8 |
| INT32 | Tensor Core | - | - | - | - | - |
| INT32 | Sparse | - | - | - | - | - |
| INT8 | Normal | ? | ? | ? | ? | ? |
| INT8 | Tensor Core | 1'961.2 | 1'007.6 | 1'248.0 | 660.6 | 284.0 |
| INT8 | Sparse | 3'922.3 | 2'015.2 | 1'248.0 | 1'321.2 | 568.0 |
| INT4 | Normal | ? | ? | ? | ? | ? |
| INT4 | Tensor Core | - | - | 1'248.0 | 1'321.2 | 568.0 |
| INT4 | Sparse | - | - | 2'496.0 | 2'642.4 | 1'136.0 |
| BOOL | Normal | ? | ? | ? | ? | ? |
| BOOL | Tensor Core | - | - | 4'992.0 | - | - |
| BOOL | Sparse | - | - | - | - | - |
Theoretical Peak Bandwidth
| | AMD MI300A | nVidia RTX PRO 6000 | nVidia A100 | nVidia RTX 4090 | nVidia RTX 3090 |
|---|---|---|---|---|---|
| RAM to GPU [GB/s] | 5'300 | 1'792 | 1'555 | 1'008 | 936 |
| GPU to GPU [GB/s] | 256 | ? | 6.25 | ? | ? |
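Combining the compute and bandwidth tables gives a rough roofline-style "machine balance" (FLOP per byte): kernels whose arithmetic intensity falls below this ratio are bandwidth-bound rather than compute-bound. A minimal sketch using the FP64 and RAM-to-GPU numbers from the tables above:

```python
# Machine balance in FLOP per byte, from the tables above.
# A kernel with lower arithmetic intensity is bandwidth-bound.
peak_fp64_tflops = {"AMD MI300A": 61.3, "nVidia A100": 9.7, "nVidia RTX 4090": 1.3}
peak_bw_gbs      = {"AMD MI300A": 5_300, "nVidia A100": 1_555, "nVidia RTX 4090": 1_008}

for gpu in peak_fp64_tflops:
    balance = peak_fp64_tflops[gpu] * 1e12 / (peak_bw_gbs[gpu] * 1e9)
    print(f"{gpu}: {balance:.1f} FLOP/byte")
```

This is why a bandwidth-bound code like HPCG reaches only a small fraction of a GPU's peak FLOP rate (compare the HPL and HPCG rows below).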
Benchmarks
To evaluate the performance of Euler's GPUs, we ran several benchmarks. Here's what we measured:
| Application | GPUs | Units | AMD MI300A | nVidia RTX PRO 6000 | nVidia A100 | nVidia RTX 4090 |
|---|---|---|---|---|---|---|
| Stream | 1 | GB/s | 2'923 | 1'491 | 1'371 | 927 |
| HPL | 1 | GFLOP/s | 42'800 | 1'565 | 14'270 | 1'209 |
| HPL | 4 | GFLOP/s | ? | 62'181 | 36'850 | 4'741 |
| HPCG | 1 | GFLOP/s | 548 | - | 249 | 188 |
| HPCG | 4 | GFLOP/s | 1'915 | - | 891 | 688 |
| Hashcat MD5 | 4 | GH/s | 472 | 817 | 184 | 259 |
Explanation of the benchmarks:
- STREAM is the de facto industry standard benchmark for measuring sustained memory bandwidth.
- HPL solves a (random) dense linear system in double precision (64 bits) arithmetic. It is compute bound.
- HPCG uses the conjugate gradient method to solve a large sparse linear system. It is memory-bandwidth bound.
- Hashcat is a password recovery tool and MD5 is a widely used hash function producing a 128-bit hash value.
- AlexNet (FP16) is a neural network for image classification and won the ILSVRC in 2012.
- ResNet50 (FP16) is a neural network for image classification and won the ILSVRC in 2015.
- DenseNet-121 (FP16) is a deep-learning architecture for image classification and related tasks such as segmentation.