Your first job on Euler¶
Welcome to your first hands-on experience with Euler! This tutorial will guide you through submitting and managing your very first computational job on the cluster. By the end of this tutorial, you'll understand how to write job scripts, submit them to the queue, monitor their progress, and retrieve results.
Prerequisites
Before starting this tutorial, make sure you:
- Have successfully logged into Euler (see Getting started)
- Are familiar with basic Linux commands (see Linux command line)
- Understand what SLURM is (see SLURM Overview)
What you'll learn¶
- How to write a simple job script
- How to submit jobs using sbatch
- How to monitor job status with squeue and myjobs
- How to retrieve and understand job output
- Common troubleshooting steps
Step 1: Understanding the environment¶
First, let's explore where you are and what's available:

pwd

Expected output: /cluster/home/yourusername

Next, check your storage quotas:

lquota

Expected output: A table showing your storage quotas for home, scratch, and any group storage.
Step 2: Create your first job script¶
Let's create a simple job that demonstrates basic SLURM functionality. We'll make a script that:
- Prints system information
- Runs a simple calculation
- Creates some output files
Create a new file called my_first_job.sh:

nano my_first_job.sh

Copy and paste this content into the file:
#!/bin/bash
#SBATCH --job-name=my_first_job
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
#SBATCH --output=first_job_%j.out
#SBATCH --error=first_job_%j.err
# Print job information
echo "=========================================="
echo "Job started on: $(date)"
echo "Job ID: $SLURM_JOB_ID"
echo "Running on node: $SLURMD_NODENAME"
echo "Number of CPUs: $SLURM_CPUS_PER_TASK"
echo "Working directory: $PWD"
echo "=========================================="
# Print system information
echo "System information:"
echo "Hostname: $(hostname)"
echo "Operating System: $(uname -a)"
echo "CPU info: $(lscpu | grep 'Model name' | head -1)"
echo "Memory info: $(free -h | grep 'Mem:')"
echo ""
# Do some simple calculations
echo "Performing calculations..."
echo "Computing squares of numbers 1-10:"
for i in {1..10}; do
square=$((i * i))
echo "$i squared = $square"
done
# Create some output files
echo "Creating output files..."
echo "Hello from Euler!" > hello.txt
echo "Job completed successfully" > status.txt
# List files in current directory
echo ""
echo "Files created:"
ls -la *.txt
echo ""
echo "=========================================="
echo "Job completed on: $(date)"
echo "Total runtime: $SECONDS seconds"
echo "=========================================="
Save the file (Ctrl+X, then Y, then Enter in nano).
Let's examine what each part does:
Understanding the SBATCH Directives
- --job-name: A descriptive name for your job
- --time: Maximum runtime (5 minutes for this example)
- --ntasks: Number of tasks (1 for a simple serial job)
- --cpus-per-task: CPUs per task (1 CPU is sufficient)
- --mem-per-cpu: Memory per CPU (1 GB should be plenty)
- --output: Where to save standard output (%j is replaced with the job ID)
- --error: Where to save error messages
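Before submitting, it can be useful to double-check which directives a script actually requests, since SLURM only reads lines that start with the #SBATCH prefix. A minimal sketch (demo_job.sh is a throwaway example file created here just for illustration):

```shell
# Create a small example job script (illustration only).
cat > demo_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=demo
#SBATCH --time=00:05:00
#SBATCH --mem-per-cpu=1G
echo "hello"
EOF

# List only the resource-request lines SLURM will read:
grep '^#SBATCH' demo_job.sh
```

Note that command-line options passed to sbatch take precedence over #SBATCH directives in the script, so the same script can be submitted with different resources without editing it.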
Step 3: Submit your job¶
Now let's submit the job to the queue:

sbatch my_first_job.sh

Expected output: Submitted batch job 12345678 (your job ID will be different)
The job ID is important - write it down! You'll use it to monitor and reference your job.
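Instead of writing the ID down by hand, you can capture it in a shell variable. The confirmation line sbatch prints always has the ID in the fourth field, so a simple awk extraction works; a sketch using a hard-coded example message (in a real session you would instead use JOBID=$(sbatch --parsable my_first_job.sh), since the --parsable flag makes sbatch print only the bare job ID):

```shell
# Example sbatch confirmation message (hard-coded for illustration):
SUBMIT_MSG="Submitted batch job 12345678"

# The job ID is the fourth whitespace-separated field of that line.
JOBID=$(echo "$SUBMIT_MSG" | awk '{print $4}')
echo "Job ID captured: $JOBID"
```

The captured variable can then be reused in later commands such as squeue -j "$JOBID" or scancel "$JOBID".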
Step 4: Monitor your job¶
Let's check the status of your job:

squeue -u $USER

Expected output while running:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
12345678 gfs my_first_job username R 0:01 1 eu-g2-001
You can also use the more user-friendly myjobs command:

myjobs

This shows all your jobs in a more readable format.
Job states you might see:
- PD (Pending): Waiting in queue
- R (Running): Currently executing
- CG (Completing): Finishing up
- CD (Completed): Finished successfully
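If you want to decode these state codes inside your own monitoring scripts, a tiny helper along these lines works (state_desc is a made-up name for this sketch, not a SLURM command):

```shell
# Map the common squeue state codes to human-readable descriptions.
state_desc() {
    case "$1" in
        PD) echo "Pending: waiting in queue" ;;
        R)  echo "Running" ;;
        CG) echo "Completing" ;;
        CD) echo "Completed" ;;
        *)  echo "Other state: $1" ;;
    esac
}

state_desc PD
state_desc R
```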
Step 5: Retrieve and examine results¶
Once your job completes (this should take less than a minute), list the files in your directory:

ls -la

You should see:

- my_first_job.sh (your original script)
- first_job_12345678.out (output file with your job ID)
- first_job_12345678.err (error file, hopefully empty)
- hello.txt and status.txt (files created by your job)
Let's examine the output:

cat first_job_*.out

You should see detailed information about your job execution, system info, calculations, and timestamps.

Also check the error file:

cat first_job_*.err

This file should be empty if everything went well.
Step 6: Your second job - with parameters¶
Now let's create a more sophisticated job that takes parameters. Create parametric_job.sh:
#!/bin/bash
#SBATCH --job-name=param_job
#SBATCH --time=00:03:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=512M
#SBATCH --output=param_job_%j.out
# Set default values
NUMBER=${1:-100}
OPERATION=${2:-"square"}
echo "Job started: $(date)"
echo "Parameter 1 (number): $NUMBER"
echo "Parameter 2 (operation): $OPERATION"
echo "Using $SLURM_CPUS_PER_TASK CPUs"
# Perform operation based on parameter
case $OPERATION in
"square")
result=$((NUMBER * NUMBER))
echo "$NUMBER squared = $result"
;;
"cube")
result=$((NUMBER * NUMBER * NUMBER))
echo "$NUMBER cubed = $result"
;;
"double")
result=$((NUMBER * 2))
echo "$NUMBER doubled = $result"
;;
*)
echo "Unknown operation: $OPERATION"
echo "Supported operations: square, cube, double"
exit 1
;;
esac
echo "Result: $result"
echo "Job completed: $(date)"
Submit this job with different parameters:
# Default parameters (100, square)
sbatch parametric_job.sh
# Custom parameters
sbatch parametric_job.sh 25 cube
sbatch parametric_job.sh 7 double
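The ${1:-100} syntax used at the top of parametric_job.sh is bash default-value expansion: use argument 1 if it is set and non-empty, otherwise fall back to 100. A quick local demonstration (show_params is a hypothetical helper written only for this sketch, not part of the job script):

```shell
# Demonstrate the default-value expansion used by parametric_job.sh.
show_params() {
    local NUMBER=${1:-100}          # falls back to 100 if no first argument
    local OPERATION=${2:-"square"}  # falls back to "square" if no second argument
    echo "number=$NUMBER operation=$OPERATION"
}

show_params            # prints: number=100 operation=square
show_params 25 cube    # prints: number=25 operation=cube
```

This is also a convenient way to sanity-check a script's argument handling on the login node before putting it in the queue.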
Step 7: Interactive jobs¶
Sometimes you want to run commands interactively on a compute node. Here's how:
# Start an interactive session
srun --time=00:10:00 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=1G --pty bash
Once the interactive session starts, you'll see a different prompt indicating you're on a compute node:
# You're now on a compute node! Try these commands:
hostname
whoami
echo "I'm running on a compute node!"
# Exit the interactive session
exit
Step 8: Checking job history¶
View information about completed jobs:

sacct --format=JobID,JobName,State,Elapsed,MaxRSS

This shows you historical information about your jobs, including how much memory they actually used.
Common issues and solutions¶
Problem: Job stays in PENDING (PD) state¶
Possible causes:
- Resource request too high
- Cluster is busy
- Invalid resource specification
Solution: Check with squeue -j JOBID and look at the REASON column.
Problem: Job fails immediately¶
Check:
- Error file: cat first_job_*.err
- Exit code: sacct -j JOBID --format=ExitCode
- Script permissions: chmod +x my_first_job.sh
Problem: "Permission denied" error¶
Solution: Make the script executable:

chmod +x my_first_job.sh
Problem: Out of memory¶
Symptoms: Job gets killed, error mentions memory
Solution: Increase --mem-per-cpu in your script
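When raising the limit, keep in mind that the total memory granted to a job is ntasks times cpus-per-task times mem-per-cpu. A quick arithmetic sketch for the first job script above (1 task, 1 CPU, 1G per CPU):

```shell
# Total job memory = ntasks * cpus-per-task * mem-per-cpu
NTASKS=1
CPUS_PER_TASK=1
MEM_PER_CPU_MB=1024   # 1G expressed in MB

TOTAL_MB=$((NTASKS * CPUS_PER_TASK * MEM_PER_CPU_MB))
echo "Total job memory: ${TOTAL_MB} MB"   # prints: Total job memory: 1024 MB
```

So increasing --cpus-per-task also increases the total memory even if --mem-per-cpu stays the same; comparing the total against the MaxRSS column from sacct tells you how much headroom the job actually had.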
Best practices you've learned¶
- Always specify resources: Time, memory, CPUs
- Use descriptive job names: Makes monitoring easier
- Include logging: Print start/end times and job info
- Handle errors gracefully: Check exit codes
- Test with small jobs first: Before running large computations
Next steps¶
Congratulations! You've successfully:
✅ Created and submitted your first job
✅ Monitored job execution
✅ Retrieved and analyzed results
✅ Learned about interactive sessions
✅ Worked with job parameters
What's next?
- Debugging jobs - Learn to troubleshoot when things go wrong
- SLURM advanced features - Job arrays, dependencies, and more
- Environment modules - Loading software
- Parallel computing - Running multi-core jobs
Quick reference¶
# Submit a job
sbatch my_script.sh
# Check job status
squeue -u $USER
myjobs
# Cancel a job
scancel JOBID
# Interactive session
srun --pty bash
# Job history
sacct
Well Done!
You've completed your first job on Euler! You now have the foundation to run computational work on the cluster. Remember: start small, test thoroughly, and don't hesitate to ask for help when needed.
Need Help?
If you encounter issues or have questions about job submission, check our FAQ or contact support.