Basics of SLURM
SLURM (Simple Linux Utility for Resource Management) is the workload manager used on most modern HPC clusters. It acts as both a scheduler and a resource manager.
Think of SLURM as an air traffic controller for computational jobs. Just as air traffic control manages which planes can take off, land, and use specific runways, SLURM manages:
- Resource allocation: Which jobs get access to which compute nodes
- Job scheduling: When jobs run based on priority, resource availability, and fairness
- Resource monitoring: Tracking CPU, memory, and GPU usage
- Job lifecycle: Starting, monitoring, and terminating jobs
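To make the allocation and scheduling roles concrete, here is a minimal toy model in Python. It is only a sketch of the general idea (highest-priority job first, placed on the first node with enough free CPUs) and is not SLURM's actual algorithm, which weighs many more factors. The `Node` and `Job` classes, the `schedule` function, and the node and job names are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpus: int


@dataclass
class Job:
    name: str
    cpus: int
    priority: int  # in this toy model, a higher value is considered first


def schedule(pending: list[Job], nodes: list[Node]) -> dict[str, str]:
    """Place pending jobs on nodes, considering the highest priority first.

    Returns a mapping of job name -> node name for jobs that started.
    Jobs that fit on no node stay pending and are absent from the mapping.
    """
    placements: dict[str, str] = {}
    for job in sorted(pending, key=lambda j: j.priority, reverse=True):
        for node in nodes:
            if node.free_cpus >= job.cpus:
                node.free_cpus -= job.cpus  # allocate CPUs to this job
                placements[job.name] = node.name
                break
    return placements


# Hypothetical cluster state and job queue.
nodes = [Node("node01", free_cpus=8), Node("node02", free_cpus=4)]
pending = [
    Job("small", cpus=2, priority=10),
    Job("large", cpus=8, priority=50),
    Job("medium", cpus=4, priority=30),
]

placements = schedule(pending, nodes)
print(placements)  # {'large': 'node01', 'medium': 'node02'}
```

In this run the `small` job stays pending because both nodes are fully allocated once the two higher-priority jobs have started. On a real cluster it would wait in the queue until resources are released.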
Job Submission Process

When you submit a job to SLURM:
- Job submission: You specify resource requirements (CPUs, memory, GPUs, time limit)
- Queue placement: SLURM places your job in a queue based on partition and priority
- Resource matching: SLURM scans for available resources matching your requirements
- Allocation: When resources become available, SLURM allocates nodes to your job
- Execution: Your job runs on the allocated compute nodes
- Cleanup: Upon completion, resources are released for other jobs
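As an illustration of the first step, resource requirements are commonly written as `#SBATCH` directives at the top of a batch script. The Python sketch below assembles such a script from a set of requirements. The helper `build_batch_script`, the partition name `general`, and the job name are assumptions made for this example, and partition names and GPU configuration vary between clusters. The directives themselves (`--job-name`, `--partition`, `--cpus-per-task`, `--mem`, `--time`, `--gres`) are standard `sbatch` options.

```python
def build_batch_script(job_name, partition, cpus, memory, time_limit,
                       command, gpus=0):
    """Return the text of a SLURM batch script for the given requirements.

    This is a hypothetical helper for illustration, not part of SLURM.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={memory}",
        f"#SBATCH --time={time_limit}",
    ]
    if gpus > 0:
        # Request GPUs as a generic resource only when they are needed.
        lines.append(f"#SBATCH --gres=gpu:{gpus}")
    lines += ["", command]
    return "\n".join(lines) + "\n"


script = build_batch_script(
    job_name="example",
    partition="general",    # assumed partition name; site-specific
    cpus=4,
    memory="8G",
    time_limit="01:00:00",  # hours:minutes:seconds
    command="srun hostname",
    gpus=1,
)
print(script)
```

Saving the output to a file such as `job.sh` and running `sbatch job.sh` would submit it, after which `squeue -u $USER` lists the job while it is queued or running.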
Info: See Running Jobs for more details on SLURM and job scheduling.