Slurm
Slurm Workload Manager is a free and open-source job scheduler for Linux and Unix-like systems, widely used by supercomputers and clusters.
Official slurm documentation: https://slurm.schedmd.com/documentation.html
Slurm on Euler¶
On Euler, Slurm version 25.05 has been in use since 13 January 2026.
Euler runs jobs non-exclusively on compute nodes. This means multiple jobs can run on the same node at the same time. We do this to maximize resource utilization and efficiency.
Resource Requests
Specify your resource requirements (CPUs, memory, time, etc.) as accurately as possible: as specific as necessary, as loose as possible. The smaller a job's footprint, the quicker it will start. Don't specify CPU types, GPU types, partitions, or other specific resources unless your job really depends on them.
Ignore Partitions
Don't specify partitions. Slurm uses partitions (queues) internally to manage resources, but as a user you are best off ignoring them.
GPU Nodes Access
GPU nodes are only available to shareholders who have invested in GPU hardware.
Usage¶
Main client commands:
- srun: Get a job allocation and execute an application
- sbatch: Submit a batch script to Slurm
- squeue: View the job queue
- scancel: Remove a job from the queue
Additional commands:
- salloc: Get a job allocation
- sacct: View accounting data
- sbcast: Broadcast a file to a job's compute nodes
- sinfo: View nodes and partitions
- sstat: Display status information of a running job/step
Custom extensions on Euler:
- myjobs: Human-readable job info (pretty-printed squeue output)
- my_share_info: Shows your cluster share(s)
- get_inefficient_jobs: Displays inefficient jobs
Web interfaces:
Common options for sbatch and srun:
| Option | Description |
|---|---|
| `-t, --time=<time>` | Maximum runtime (default: 1 hour) |
| `-n, --ntasks=<number>` | Maximum number of tasks (default: 1) |
| `-c, --cpus-per-task=<ncpus>` | CPUs per task (default: 1) |
| `--mem-per-cpu=<size>[units]` | Memory per CPU (default unit: MB) |
| `-N, --nodes=<minnodes>` | Minimum number of nodes (default: 1) |
| `--gpus-per-task=<number>` | GPUs per task (default: 0) |
| `--x11` | Enable X11 forwarding |
| `--tmp=<size>[units]` | Temporary disk space (default: 0) |
| `-o, --output=<filename>` | Output file for job stdout (default: slurm-%j.out) |
| `-e, --error=<filename>` | Output file for job stderr (default: stderr is merged into the stdout file) |
| `--wrap=<command_string>` | Wrap a command string in a simple shell script |
| `-A, --account=<share_name>` | Account (share) to charge the job to (default: your default account) |
| `-d, --dependency=<dependency>` | Job dependencies (e.g., afterok:<job_id>) |
Jobs inherit the environment variables from the shell that submits them.
Interactive sessions¶
To start an interactive session, use srun with the --pty option:
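A minimal example (the resource values here are illustrative, not prescriptive):

```shell
# Request an interactive shell on a compute node for 1 hour,
# with 4 CPUs and 2 GB of memory per CPU.
# --pty attaches a pseudo-terminal to the session.
srun --time=01:00:00 --ntasks=1 --cpus-per-task=4 --mem-per-cpu=2G --pty bash
```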
--pty allocates a pseudo-terminal for the session, allowing you to interact with the shell as if it were a local terminal. You can specify additional options like --ntasks, --cpus-per-task, etc. to customize the resources.
This is very useful for debugging or running commands interactively on a compute node.
Submitting parallel jobs (srun)¶
To submit a parallel job, use srun with the --ntasks option:
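For example (the task and CPU counts match the description that follows):

```shell
# Run nproc in 5 parallel tasks, each task with 8 CPUs
srun --ntasks=5 --cpus-per-task=8 nproc
```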
This runs nproc in 5 tasks, each with 8 CPUs. The nproc command prints the number of CPUs available to the current task.
Submitting batch jobs (sbatch)¶
To submit a batch job, create a script file (e.g., batch_script.txt) and pass it to sbatch:
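A minimal sketch of such a script (the resource values and program name are placeholders):

```shell
#!/bin/bash
#SBATCH --ntasks=4            # number of tasks
#SBATCH --cpus-per-task=2     # CPUs per task
#SBATCH --time=01:00:00       # maximum runtime
#SBATCH --mem-per-cpu=2G      # memory per CPU

# srun launches the command once per task
srun ./my_program             # placeholder for your application
```

It is then submitted with `sbatch batch_script.txt`.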
sbatch puts the job in the queue and executes it when resources become available. The output is written to a file named slurm-<job_id>.out in the current directory. sbatch does not wait for the job to finish: it returns immediately after submission (use --wait to make it block until the job completes). You can check the job status with squeue.
Note: The #SBATCH directives must be placed before the first executable command in the script.

Note: Without srun, the job will not run in parallel across all tasks. The command is executed only once, using the total resources requested.
Submitting job arrays (sbatch --array)¶
A job array allows you to submit multiple similar jobs with a single command. This is useful for running the same job with different parameters or datasets.
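For example (the script name is a placeholder):

```shell
# Submit the same script 10 times, with array indices 1 through 10;
# each array task sees its own index in $SLURM_ARRAY_TASK_ID
sbatch --array=1-10 batch_script.txt
```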
This command submits 10 jobs, each with a different index (1 to 10). You can access the index in the script via the environment variable $SLURM_ARRAY_TASK_ID.
You can find the possible options for job arrays in the Slurm documentation.
Submitting MPI jobs¶
Slurm supports MPI (Message Passing Interface) jobs. See https://slurm.schedmd.com/mpi_guide.html#open_mpi for more information.
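A minimal sketch of an MPI batch script, assuming an Open MPI program ./my_mpi_program built against the cluster's MPI stack (the rank count, runtime, and program name are placeholders):

```shell
#!/bin/bash
#SBATCH --ntasks=48           # number of MPI ranks
#SBATCH --time=04:00:00       # maximum runtime
#SBATCH --mem-per-cpu=2G      # memory per rank

# srun starts one MPI rank per task; with Slurm's built-in MPI
# support no separate mpirun invocation is needed
srun ./my_mpi_program
```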
Submitting GPU jobs¶
GPU Nodes Access
GPU nodes are only available to shareholders who have invested in GPU hardware.
To submit a job that requires GPUs, use the --gpus option:
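For instance:

```shell
# Request one GPU and show its details
srun --gpus=1 nvidia-smi
```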
Running nvidia-smi in such a job displays information about the allocated GPU(s). You can also specify the type(s) of GPU(s) you want to use, e.g., --gpus=a100,v100:1 for 1x A100 or 1x V100 GPU.
| GPU Model | Slurm Identifier |
|---|---|
| Nvidia RTX 2080 Ti | rtx_2080 |
| Nvidia TITAN RTX | nvidia_titan_rtx |
| Nvidia Quadro RTX 6000 | rtx_6000 |
| Nvidia RTX 3090 | rtx_3090 |
| Nvidia Tesla A100 (40 GB) | a100 |
| Nvidia Tesla A100 (80 GB) | a100_80gb |
| Nvidia RTX 4090 | rtx_4090 |
| Nvidia RTX Pro 6000 | pro_6000 |
Nvidia RTX Pro 6000¶
Programs compiled with CUDA 12 libraries will not run on Blackwell GPUs such as the Nvidia RTX Pro 6000. Since CUDA 13 is relatively new and not used yet by many programs, we opted to not open the new RTX Pro 6000 GPUs to all jobs since we expect most would fail.
Only batch jobs explicitly requesting the RTX Pro 6000 GPU can therefore run on these nodes.
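For example, a job can request this GPU type explicitly, using the Slurm identifier from the table above (the script name is a placeholder):

```shell
# Explicitly request one RTX Pro 6000 GPU
sbatch --gpus=pro_6000:1 batch_script.txt
```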
This restriction is expected to remain in place until mid-2026.

Accounting and Shares¶
If you are a member of multiple shareholder groups, you can select which group a job should be scheduled under and accounted to by passing Slurm the option -A <share_name>. To configure a default group, add the following to your ~/.slurm/defaults file (create it if it does not exist, and replace <share_name> with your preferred default share):
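The file is expected to contain a line of the following form; the `account=` key follows the usual Slurm option name, but treat the exact syntax as an assumption and verify it against the Euler documentation:

```text
account=<share_name>
```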
Custom extensions on Euler¶
myjobs¶
Use myjobs -j <job_id> to view detailed information of a job. It provides a human-friendly summary of Slurm jobs, improving on squeue's output.
Example of a pending job¶
$ myjobs -j 6038307
Job information
Job ID : 6038307
Status : PENDING
Running on node : None assigned
User : nmarounina
Shareholder group : es_cdss
Slurm partition (queue) : gpu.24h
Command : script.sbatch
Working directory : /cluster/home/nmarounina
Requested resources
Requested runtime : 08:00:00
Requested cores (total) : 12
Requested nodes : 1
Requested memory (total) : 120000 MiB
Job history
Submitted at : 2023-01-09T15:56:09
Started at : Job did not start yet
Queue waiting time : 8 s
Resource usage
Wall-clock :
Total CPU time : -
CPU utilization : - %
Total resident memory : - MiB
Resident memory utilization : - %
Example of a running job¶
$ myjobs -j 6038307
Job information
Job ID : 6038307
Status : RUNNING
Running on node : eu-g3-022
User : nmarounina
Shareholder group : es_cdss
Slurm partition (queue) : gpu.24h
Command : script.sbatch
Working directory : /cluster/home/nmarounina
Requested resources
Requested runtime : 08:00:00
Requested cores (total) : 12
Requested nodes : 1
Requested memory (total) : 120000 MiB
Job history
Submitted at : 2023-01-09T15:56:09
Started at : 2023-01-09T15:56:38
Queue waiting time : 29 s
Resource usage
Wall-clock : 00:00:36
Total CPU time : 00:00:00
CPU utilization : 0%
Total resident memory : 2.94 MiB
Resident memory utilization : 0%
my_share_info¶
Use my_share_info to list the shareholder groups you belong to.
Example output¶
get_inefficient_jobs¶
Use get_inefficient_jobs to find jobs that are running inefficiently, i.e. using significantly fewer resources than requested, thereby wasting the allocated cores, memory, or GPUs.
Example output¶
$ get_inefficient_jobs
Looking for user dohofer
37279906
37301756
37301756
37301756
37301756
37305670
37305759
37306347
37306347
37306347
37306347
37306347
37306347
37306347
37306347
37498300
37602385
Average CPU used (higher better): 12.1% (equivalent to wasting 4.0 cores)
Average system RAM used (higher better): 25.2% (equivalent to wasting 14.0 GB)
Average GPU used (higher better): 100.0% (equivalent to wasting 0.0 GPUs)
Average GPU RAM used (higher better): 100.0% (equivalent to wasting 0.0 GB)
CPU to GPU ratio (lower better): 0.0% (equivalent to 0.0 GPUs blocked)
RAM to GPU ratio (lower better): 0.0% (equivalent to 0.0 GPUs blocked)
Number of short jobs (lower better): 1
Known issues¶
ReqNodeNotAvail¶
Sometimes Slurm reports that requested nodes are not available and lists some nodes. It looks something like this:
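The reason shown in squeue's NODELIST(REASON) column typically resembles the following (the node names here are made up for illustration):

```text
(ReqNodeNotAvail, UnavailableNodes:eu-a2p-[277,491,512])
```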
If you have not requested these nodes explicitly, this message by Slurm is misleading. The job is actually waiting in the queue. No user action is required; just patience.