Basic Usage / Quick Start
What is Slurm?
Slurm is a cluster scheduler used to share compute resources in a managed queue.
Our cluster consists of:
- vcuda-[0-4]: virtual nodes, each with PCI pass-through access to eight Titan XP cards (12G video memory), 16 virtual CPU cores (on the hypervisors' Xeon E5 processors), and 128G of memory
- groenig-[0-15]: virtual nodes with 16 virtual cores and 64G of memory
- tig-k80-0: a physical host with eight Tesla K80s (12G video memory), 32 Xeon E5 cores, and 128G of memory

At the time of this writing, all nodes except tig-k80-0 are hosted on OpenStack.
Getting Started
Prerequisites
You will need a CSAIL membership and credentials.
Your program or script must be located on an NFS filesystem. See the NFS documentation for information about requesting a filesystem. (Note: AFS tokens cannot be used within the cluster; this includes your CSAIL home directory.)
To run a job:
- place your code and any required data in your NFS directory
  (Data placed in /home will not be accessible across the cluster, and may cause your program to behave unexpectedly.)
- determine what resources you want, and for how long
- write an sbatch script to specify your required resources and the command to be run
- open a shell to tig-slurm.csail.mit.edu and invoke sbatch on the command line
Writing an sbatch script
An sbatch script contains:
- SBATCH directives that specify the needed resources
- A command or commands to be submitted to the cluster as a job
Here is a simple example.
% cat hello.sbatch
#!/bin/bash
#SBATCH --time=00:00:10
#SBATCH --partition=cpu
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=128
srun echo 'Hello, world!'
Each 'srun' line launches a task on the allocated node, and the job's output is written to a file named after the job ID.
% sbatch hello.sbatch
Submitted batch job 127314
% cat slurm-127314.out
Hello, world!
Submit multiple jobs
We can request an array of jobs with sbatch.
% cat dice.sbatch
#!/bin/bash
#SBATCH --time=00:00:10
#SBATCH --partition=cpu
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=128
srun echo $(( $RANDOM % 6 + 1))
The SBATCH directives apply to every srun command in the script; when submitted as an array, the entire script runs once per array index.
% sbatch --array=0-5 dice.sbatch
Submitted batch job 129942
% ls slurm-129942* && cat slurm-129942*
slurm-129942_0.out slurm-129942_1.out slurm-129942_2.out slurm-129942_3.out slurm-129942_4.out slurm-129942_5.out
1
6
4
3
2
6
Use the array index
A task in an array can access its index from an environment variable.
% cat nlog.sbatch
#!/bin/bash
#SBATCH --time=00:00:10
#SBATCH --partition=cpu
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=128
srun python -c 'import math,os; \
my_index = int(os.getenv("SLURM_ARRAY_TASK_ID"));\
print(math.log( my_index ))'
% sbatch --array=1-8 nlog.sbatch
Submitted batch job 128222
% cat slurm-128222_*
0.0
0.69314718056
1.09861228867
1.38629436112
1.60943791243
1.79175946923
1.94591014906
2.07944154168
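A common use of the array index is to select a different input file for each task. A minimal sketch, where process.py and the data/ layout are hypothetical names standing in for your own code and NFS paths:

```shell
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --partition=cpu
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=128
# Task N processes data/input_N.txt (process.py and the paths are illustrative)
srun python process.py "data/input_${SLURM_ARRAY_TASK_ID}.txt"
```

Submitted with --array=0-5, this runs six tasks, each reading its own file.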
Request GPU resources
If you require use of a GPU, submit your job to the "gpu" partition and request GPUs with --gres (short for "generic resource"):
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
srun nvidia-smi -q
For a specific GPU type, add it to the --gres directive, e.g. --gres=gpu:titan:1
% sbatch gpu.sbatch
Submitted batch job 129931
% head slurm-129931.out
==============NVSMI LOG==============
Timestamp : Tue Jun 2 09:00:00 2020
Driver Version : 430.50
CUDA Version : 10.1
Attached GPUs : 8
GPU 00000000:04:00.0
Product Name : Tesla K80
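Slurm typically restricts a job to its allocated cards by setting CUDA_VISIBLE_DEVICES (assuming the cluster's GRES configuration does this, as is common); a quick way to check what your job was given:

```shell
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:2
# Print the GPU indices Slurm has allocated to this job
srun bash -c 'echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"'
```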
Additional options
For advanced usage of sbatch, please refer to https://slurm.schedmd.com/sbatch.html
Interactive shell
For debugging, it can be useful to open an interactive shell on the cluster.
srun --pty --cpus-per-task=1 --mem-per-cpu 2000M --time=00:45:00 /bin/bash
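The same works for GPU debugging; assuming the --gres syntax described above, a GPU-backed interactive shell might look like:

```shell
srun --partition=gpu --gres=gpu:1 --cpus-per-task=1 --mem-per-cpu=2000M --time=00:45:00 --pty /bin/bash
```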
Job Priority / QoS
When a job is submitted without a --qos option, the default QoS limits the resources you can claim. Current limits are shown in the login banner on tig-slurm.csail.mit.edu.
This quota can be bypassed by setting --qos=low. This is useful when the cluster is mostly idle and you would like to use resources beyond your quota. However, if those resources are required for someone else's job, your job may be terminated.
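For example, to resubmit the earlier hello.sbatch beyond your quota:

```shell
% sbatch --qos=low hello.sbatch
```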
Your job and the queue
sinfo
sinfo provides basic information about available cluster resources.
% sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
cpu up 4-00:00:00 3 idle groenig-[0-2]
gpu* up 4-00:00:00 1 resv vcuda-4
gpu* up 4-00:00:00 1 mix vcuda-0
gpu* up 4-00:00:00 3 idle vcuda-[1-3]
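For a per-node rather than per-partition view, sinfo's node-oriented mode can help:

```shell
% sinfo --Node --long
```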
Additional details about a node are available with ‘scontrol show node’.
% scontrol show node groenig-0
NodeName=groenig-0 Arch=x86_64 CoresPerSocket=1
CPUAlloc=0 CPUErr=0 CPUTot=16 CPULoad=0.00
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=(null)
NodeAddr=groenig-0 NodeHostName=groenig-0 Version=17.11
OS=Linux 4.15.0-72-generic #81-Ubuntu SMP Tue Nov 26 12:20:02 UTC 2019
RealMemory=55000 AllocMem=0 FreeMem=58731 Sockets=16 Boards=1
State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
Partitions=cpu
BootTime=2019-12-05T17:19:47 SlurmdStartTime=2020-05-31T06:25:10
CfgTRES=cpu=16,mem=55000M,billing=16
AllocTRES=
CapWatts=n/a
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
squeue
You may list the contents of the queue with squeue.
% squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
129959 cpu bash erin R 0:05 1 groenig-0
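squeue also supports filtering and, for pending jobs, estimated start times; for example:

```shell
% squeue -u $USER     # show only your own jobs
% squeue --start      # estimated start times for pending jobs
```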
scontrol show job
Detailed information about your job is available with ‘scontrol show job’.
% scontrol show job 129959
JobId=129959 JobName=bash
UserId=erin(23372) GroupId=erin(23372) MCS_label=N/A
Priority=1 Nice=0 Account=csail QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
RunTime=00:01:40 TimeLimit=1-00:00:00 TimeMin=N/A
SubmitTime=2020-06-02T12:12:45 EligibleTime=2020-06-02T12:12:45
StartTime=2020-06-02T12:12:45 EndTime=2020-06-03T12:12:45 Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
LastSchedEval=2020-06-02T12:12:45
Partition=cpu AllocNode:Sid=slurm-control-0:29786
ReqNodeList=(null) ExcNodeList=(null)
NodeList=groenig-0
BatchHost=groenig-0
NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=1,mem=2000M,node=1,billing=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryCPU=2000M MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
Gres=(null) Reservation=(null)
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/bin/bash
WorkDir=/home/erin
Power=
scancel
You can use scancel to terminate your job early, for example:
% scancel -v 129960
scancel: Terminating job 129960
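scancel can also target a single task of an array job, or all of your jobs at once:

```shell
% scancel 129942_3    # cancel one task of an array job
% scancel -u $USER    # cancel all of your jobs
```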