Your first job on Euler¶
Welcome to your first hands-on experience with Euler! This tutorial will guide you through submitting and managing your very first computational job on the cluster. By the end of this tutorial, you'll understand how to write job scripts, submit them to the queue, monitor their progress, and retrieve results.
Prerequisites
Before starting this tutorial, make sure you:
- Have successfully logged into Euler (see Getting started)
- Are familiar with basic Linux commands (see Linux command line)
- Understand what SLURM is (see SLURM Overview)
What you'll learn¶
- How to write a simple job script
- How to submit jobs using sbatch
- How to monitor job status with squeue and myjobs
- How to retrieve and understand job output
- Common troubleshooting steps
Step 1: Understanding the environment¶
First, let's explore where you are and what's available:

pwd

Expected output: /cluster/home/yourusername

Next, check your storage quotas:

lquota

Expected output: A table showing your storage quotas for home, scratch, and any group storage.
Step 2: Create your first job script¶
Let's create a simple job that demonstrates basic SLURM functionality. We'll make a script that:
- Prints system information
- Runs a simple calculation
- Creates some output files
Create a new file called my_first_job.sh:

nano my_first_job.sh

Copy and paste this content into the file:
#!/bin/bash
#SBATCH --job-name=my_first_job
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=1G
#SBATCH --output=first_job_%j.out
#SBATCH --error=first_job_%j.err
# Print job information
echo "=========================================="
echo "Job started on: $(date)"
echo "Job ID: $SLURM_JOB_ID"
echo "Running on node: $SLURMD_NODENAME"
echo "Number of CPUs: $SLURM_CPUS_PER_TASK"
echo "Working directory: $PWD"
echo "=========================================="
# Print system information
echo "System information:"
echo "Hostname: $(hostname)"
echo "Operating System: $(uname -a)"
echo "CPU info: $(lscpu | grep 'Model name' | head -1)"
echo "Memory info: $(free -h | grep 'Mem:')"
echo ""
# Do some simple calculations
echo "Performing calculations..."
echo "Computing squares of numbers 1-10:"
for i in {1..10}; do
square=$((i * i))
echo "$i squared = $square"
done
# Create some output files
echo "Creating output files..."
echo "Hello from Euler!" > hello.txt
echo "Job completed successfully" > status.txt
# List files in current directory
echo ""
echo "Files created:"
ls -la *.txt
echo ""
echo "=========================================="
echo "Job completed on: $(date)"
echo "Total runtime: $SECONDS seconds"
echo "=========================================="
Save the file (Ctrl+X, then Y, then Enter in nano).
Let's examine what each part does:
Understanding the SBATCH Directives
- --job-name: A descriptive name for your job
- --time: Maximum runtime (5 minutes for this example)
- --ntasks: Number of tasks (1 for a simple serial job)
- --cpus-per-task: CPUs per task (1 CPU is sufficient)
- --mem-per-cpu: Memory per CPU (1 GB should be plenty)
- --output: Where to save standard output (%j is replaced with the job ID)
- --error: Where to save error messages
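Before submitting, it can be useful to double-check which directives a script actually requests, since SLURM only reads lines that start with the #SBATCH prefix. A minimal sketch (demo_job.sh is a throwaway example file created here just for illustration):

```shell
# Create a small example job script (illustration only).
cat > demo_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=demo
#SBATCH --time=00:05:00
#SBATCH --mem-per-cpu=1G
echo "hello"
EOF

# List only the resource-request lines SLURM will read:
grep '^#SBATCH' demo_job.sh
```

Note that command-line options passed to sbatch take precedence over #SBATCH directives in the script, so the same script can be submitted with different resources without editing it.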
Step 3: Submit your job¶
Now let's submit the job to the queue:

sbatch my_first_job.sh

Expected output: Submitted batch job 12345678 (your job ID will be different)
The job ID is important - write it down! You'll use it to monitor and reference your job.
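Instead of writing the ID down by hand, you can capture it in a shell variable. The confirmation line sbatch prints always has the ID in the fourth field, so a simple awk extraction works; a sketch using a hard-coded example message (in a real session you would instead use JOBID=$(sbatch --parsable my_first_job.sh), since the --parsable flag makes sbatch print only the bare job ID):

```shell
# Example sbatch confirmation message (hard-coded for illustration):
SUBMIT_MSG="Submitted batch job 12345678"

# The job ID is the fourth whitespace-separated field of that line.
JOBID=$(echo "$SUBMIT_MSG" | awk '{print $4}')
echo "Job ID captured: $JOBID"
```

The captured variable can then be reused in later commands such as squeue -j "$JOBID" or scancel "$JOBID".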
Step 4: Monitor your job¶
Let's check the status of your job:

squeue -u $USER

Expected output while running:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
12345678 gfs my_first_job username R 0:01 1 eu-g2-001
You can also use the more user-friendly myjobs command:

myjobs

This shows all your jobs in a more readable format.
Job states you might see:
- PD (Pending): Waiting in queue
- R (Running): Currently executing
- CG (Completing): Finishing up
- CD (Completed): Finished successfully
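If you want to decode these state codes inside your own monitoring scripts, a tiny helper along these lines works (state_desc is a made-up name for this sketch, not a SLURM command):

```shell
# Map the common squeue state codes to human-readable descriptions.
state_desc() {
    case "$1" in
        PD) echo "Pending: waiting in queue" ;;
        R)  echo "Running" ;;
        CG) echo "Completing" ;;
        CD) echo "Completed" ;;
        *)  echo "Other state: $1" ;;
    esac
}

state_desc PD
state_desc R
```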
Step 5: Retrieve and examine results¶
Once your job completes (this should take less than a minute), list the files in your directory:

ls -la

You should see:

- my_first_job.sh (your original script)
- first_job_12345678.out (output file with your job ID)
- first_job_12345678.err (error file, hopefully empty)
- hello.txt and status.txt (files created by your job)
Let's examine the output:

cat first_job_*.out

You should see detailed information about your job execution, system info, calculations, and timestamps.

Also check the error file:

cat first_job_*.err

This file should be empty if everything went well.
Step 6: Your second job - with parameters¶
Now let's create a more sophisticated job that takes parameters. Create parametric_job.sh:
#!/bin/bash
#SBATCH --job-name=param_job
#SBATCH --time=00:03:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=512M
#SBATCH --output=param_job_%j.out
# Set default values
NUMBER=${1:-100}
OPERATION=${2:-"square"}
echo "Job started: $(date)"
echo "Parameter 1 (number): $NUMBER"
echo "Parameter 2 (operation): $OPERATION"
echo "Using $SLURM_CPUS_PER_TASK CPUs"
# Perform operation based on parameter
case $OPERATION in
"square")
result=$((NUMBER * NUMBER))
echo "$NUMBER squared = $result"
;;
"cube")
result=$((NUMBER * NUMBER * NUMBER))
echo "$NUMBER cubed = $result"
;;
"double")
result=$((NUMBER * 2))
echo "$NUMBER doubled = $result"
;;
*)
echo "Unknown operation: $OPERATION"
echo "Supported operations: square, cube, double"
exit 1
;;
esac
echo "Result: $result"
echo "Job completed: $(date)"
Submit this job with different parameters:
# Default parameters (100, square)
sbatch parametric_job.sh
# Custom parameters
sbatch parametric_job.sh 25 cube
sbatch parametric_job.sh 7 double
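The ${1:-100} syntax used at the top of parametric_job.sh is bash default-value expansion: use argument 1 if it is set and non-empty, otherwise fall back to 100. A quick local demonstration (show_params is a hypothetical helper written only for this sketch, not part of the job script):

```shell
# Demonstrate the default-value expansion used by parametric_job.sh.
show_params() {
    local NUMBER=${1:-100}          # falls back to 100 if no first argument
    local OPERATION=${2:-"square"}  # falls back to "square" if no second argument
    echo "number=$NUMBER operation=$OPERATION"
}

show_params            # prints: number=100 operation=square
show_params 25 cube    # prints: number=25 operation=cube
```

This is also a convenient way to sanity-check a script's argument handling on the login node before putting it in the queue.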
Step 7: Interactive jobs¶
Sometimes you want to run commands interactively on a compute node. Here's how:
# Start an interactive session
srun --time=00:10:00 --ntasks=1 --cpus-per-task=1 --mem-per-cpu=1G --pty bash
Once the interactive session starts, you'll see a different prompt indicating you're on a compute node:
# You're now on a compute node! Try these commands:
hostname
whoami
echo "I'm running on a compute node!"
# Exit the interactive session
exit
Step 8: Checking job history¶
View information about completed jobs:

sacct --format=JobID,JobName,State,Elapsed,MaxRSS

This shows you historical information about your jobs, including how much memory they actually used.
Common issues and solutions¶
Problem: Job stays in PENDING (PD) state¶
Possible causes:
- Resource request too high
- Cluster is busy
- Invalid resource specification
Solution: Check with squeue -j JOBID and look at the REASON column.
Problem: Job fails immediately¶
Check:
- Error file: cat first_job_*.err
- Exit code: sacct -j JOBID --format=ExitCode
- Script permissions: chmod +x my_first_job.sh
Problem: "Permission denied" error¶
Solution: Make the script executable:

chmod +x my_first_job.sh
Problem: Out of memory¶
Symptoms: Job gets killed, error mentions memory
Solution: Increase --mem-per-cpu in your script
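When raising the limit, keep in mind that the total memory granted to a job is ntasks times cpus-per-task times mem-per-cpu. A quick arithmetic sketch for the first job script above (1 task, 1 CPU, 1G per CPU):

```shell
# Total job memory = ntasks * cpus-per-task * mem-per-cpu
NTASKS=1
CPUS_PER_TASK=1
MEM_PER_CPU_MB=1024   # 1G expressed in MB

TOTAL_MB=$((NTASKS * CPUS_PER_TASK * MEM_PER_CPU_MB))
echo "Total job memory: ${TOTAL_MB} MB"   # prints: Total job memory: 1024 MB
```

So increasing --cpus-per-task also increases the total memory even if --mem-per-cpu stays the same; comparing the total against the MaxRSS column from sacct tells you how much headroom the job actually had.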
Best practices you've learned¶
- Always specify resources: Time, memory, CPUs
- Use descriptive job names: Makes monitoring easier
- Include logging: Print start/end times and job info
- Handle errors gracefully: Check exit codes
- Test with small jobs first: Before running large computations
Next steps¶
Congratulations! You've successfully:
✅ Created and submitted your first job
✅ Monitored job execution
✅ Retrieved and analyzed results
✅ Learned about interactive sessions
✅ Worked with job parameters
What's next?
- Debugging jobs - Learn to troubleshoot when things go wrong
- SLURM advanced features - Job arrays, dependencies, and more
- Environment modules - Loading software
- Parallel computing - Running multi-core jobs
Quick reference¶
# Submit a job
sbatch my_script.sh
# Check job status
squeue -u $USER
myjobs
# Cancel a job
scancel JOBID
# Interactive session
srun --pty bash
# Job history
sacct
Well Done!
You've completed your first job on Euler! You now have the foundation to run computational work on the cluster. Remember: start small, test thoroughly, and don't hesitate to ask for help when needed.
Need Help?
If you encounter issues or have questions about job submission, check our FAQ or contact support.