GPU socket redefinition
To fix the issue of GPU nodes occasionally draining, we need to redefine Slurm's definition of “sockets” on most GPU nodes.
The redefinition was originally scheduled for Monday, January 26, 2026.
We need to postpone it towards the end of this week.
Generally,
- this affects only GPU nodes with AMD processors (i.e., almost all GPU nodes)
- any sbatch/srun socket options (e.g., --cpus-per-socket) will be affected for GPU jobs:
    - submitting a job may fail with an error message related to sockets
    - jobs submitted prior to the change and using these options may wait forever
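To check whether one of your job scripts uses such options, a rough sketch like the following may help. It is not an official tool: the function name `find_socket_options` is hypothetical, and it rests on the assumption that any long option whose name contains "socket" counts as a socket option. It only inspects the script text and does not query Slurm.

```python
import re

# Assumption: any long option whose name contains "socket" is treated as a
# socket option (this is a heuristic, not a list taken from Slurm itself).
SOCKET_OPTION = re.compile(r"--[a-z-]*socket[a-z-]*")


def find_socket_options(script_text):
    """Return (line_number, option) pairs for socket options in a job script.

    #SBATCH directives and ordinary command lines (e.g. srun calls) are
    scanned; all other comment lines are skipped.
    """
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#") and not stripped.startswith("#SBATCH"):
            continue  # ordinary comment or shebang, not a directive
        for match in SOCKET_OPTION.finditer(stripped):
            hits.append((number, match.group(0)))
    return hits
```

To scan a file, pass its contents, for example `find_socket_options(open("job.sh").read())`, where `job.sh` stands in for your own script. An empty result means no option matching the heuristic was found.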
If you think you are affected by this, then please let us know.
We thank you for your understanding.