Slurm Resources
Basic Resources
You will need to specify the following for every job:
Time
Specify your job’s maximum run time with the --time= option. Available formats include D-HH:MM:SS, MM:SS, D-HH, and HH:MM:SS.
The maximum allowed time varies by partition.
Jobs with shorter time limits are easier for the scheduler to place, so consider a lower time limit if you want your job to start sooner.
Memory
Your job will be limited to the allocated amount of memory. In the simplest case, use --mem with a value in MB, or specify the unit, as in --mem=8G.
You can also specify memory relative to other resources, with --mem-per-cpu or --mem-per-gpu.
CPU
CPU requirements are expressed as a number of cores. In the simple case, use --cpus-per-task; the default number of tasks is 1, but more can be requested with --ntasks.
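A minimal batch script sketch that puts the three basic requests together. The specific values (2 hours, 8G, 4 cores) and the file name job.sh are illustrative assumptions, not site recommendations; check your partition's limits before reusing them.

```shell
#!/bin/bash
# Maximum run time: 2 hours, written as D-HH:MM:SS.
#SBATCH --time=0-02:00:00
# Memory for the job, with an explicit unit.
#SBATCH --mem=8G
# One task (the default) with 4 cores.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4

# Inside a job, Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is set.
# Fall back to 1 so the script also runs outside of Slurm.
threads="${SLURM_CPUS_PER_TASK:-1}"
export OMP_NUM_THREADS="$threads"
echo "Using $threads CPU core(s)"
```

Submit it with sbatch job.sh. Run directly with bash, the script falls back to a single core, which makes it easy to check locally before submitting.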
Additional Resource Options
GPU
Request a number of GPUs for your job with --gpus=.
GPU memory / VRAM is not currently managed by Slurm. To guarantee that a minimum amount of GPU memory is available, choose a GPU type with the --constraint= option (see below).
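As a sketch, a batch script that requests a single GPU might look like the following. The time, memory, and CPU values are placeholders. The script assumes the cluster's GPU plugin sets CUDA_VISIBLE_DEVICES inside the job, which is common but depends on site configuration.

```shell
#!/bin/bash
#SBATCH --time=01:00:00
#SBATCH --mem=16G
#SBATCH --cpus-per-task=2
# Request one GPU of any type.
#SBATCH --gpus=1

# Assumption: Slurm exposes the allocated GPU(s) through CUDA_VISIBLE_DEVICES.
# Outside of a GPU job the variable is usually unset, so report "none".
visible_gpus="${CUDA_VISIBLE_DEVICES:-none}"
echo "GPUs visible to this job: $visible_gpus"
```

Printing the visible devices at the top of a job log is a cheap way to confirm that the allocation matches what you asked for.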
Node Features
A “feature” in Slurm is an arbitrary label that we use to specify characteristics of a node.
To submit a job with a feature requirement, you can use the --constraint flag. A comma-separated list requires all listed features, a list delimited by | requires any of the listed features, and a ! before a feature requires that feature to be absent.
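The three forms described above can be sketched as follows. The feature names are taken from this page, but the particular combinations are only illustrative, and job.sh is a hypothetical script name. The commands are printed rather than run so that nothing is submitted by accident.

```shell
#!/bin/bash
# Constraint expressions using the operators described in the text.
all_of='stata,nvidia_rtx_a6000'              # every listed feature is required
any_of='nvidia_rtx_a6000|nvidia_titan_rtx'   # any one of the features is enough
none_of='!quadro_k620'                       # the feature must be absent

# Build the sbatch command lines. The single quotes around each expression
# keep the shell from treating | as a pipe or ! as history expansion.
commands=()
for expr in "$all_of" "$any_of" "$none_of"; do
    commands+=("sbatch --constraint='$expr' job.sh")
done
printf '%s\n' "${commands[@]}"
```

Quoting matters mostly on the command line; it is easy to forget and produces confusing errors when the shell interprets the operators first.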
Datacenter
All nodes have either the ‘stata’ or the ‘holyoke’ feature.
Tip: if you are submitting to a shared partition, use --constraint stata or --constraint holyoke to guarantee that your node is in the same location as your data, for best performance.
GPU Type
All GPU nodes have a feature corresponding to the GPU type. The currently available list of GPU features is:
geforce_gtx_980
nvidia_a100_80gb_pcie
nvidia_a100-sxm4-80gb
nvidia_geforce_gtx_1080_ti
nvidia_geforce_gtx_780
nvidia_geforce_rtx_2080_ti
nvidia_geforce_rtx_3090
nvidia_geforce_rtx_4090
nvidia_h100_80gb_hbm3
nvidia_h100_nvl
nvidia_rtx_a6000
nvidia_titan_rtx
nvidia_titan_xp
quadro_k620
tesla_v100-sxm2-32gb
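Because GPU memory is selected through the GPU type, one way to ask for a large-memory GPU is to accept any of several types. The sketch below lists the features above whose names include 80gb; the time and memory values are placeholders, and the choice of which types count as acceptable is an assumption you should adjust to your workload.

```shell
#!/bin/bash
#SBATCH --time=04:00:00
#SBATCH --mem=32G
#SBATCH --gpus=1
# Accept any one of the GPU types with 80gb in the feature name.
#SBATCH --constraint="nvidia_a100_80gb_pcie|nvidia_a100-sxm4-80gb|nvidia_h100_80gb_hbm3"

# SLURMD_NODENAME is set by Slurm inside a job; fall back to the local
# host name so the script also runs outside of Slurm.
node="${SLURMD_NODENAME:-$(uname -n)}"
echo "Running on $node"
```

Listing several acceptable types rather than a single one gives the scheduler more nodes to choose from, which can shorten the wait.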
Viewing Available Resources
sinfo
Displays available partitions and shows which nodes are idle, in use, or down.
squeue
Displays jobs in the queue, both running and pending.
scontrol show node $NODE_NAME
Displays information about a node, including total resources, allocated resources, and status.
scontrol show partition $PARTITION_NAME
Displays information about a partition, including eligible QoS levels and maximum allowed runtime.
scontrol show job $JOB_ID
Displays information about your running job.
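A small sketch that chains these commands into a quick status check. The partition name is passed as an optional first argument and is a placeholder; list the real names with plain sinfo. The script checks that the Slurm commands exist first, so it exits cleanly on a machine without Slurm.

```shell
#!/bin/bash
# Optional first argument: a partition name to inspect (placeholder).
partition="${1:-}"
user="${USER:-$(id -un)}"

if command -v squeue >/dev/null 2>&1; then
    # Only your own jobs, running and pending.
    squeue --user="$user"
    if [ -n "$partition" ]; then
        # Node states in the chosen partition, then its limits.
        sinfo --partition="$partition"
        scontrol show partition "$partition"
    fi
    status="ok"
else
    echo "Slurm commands not found; run this on a cluster login node." >&2
    status="no-slurm"
fi
```

Filtering squeue by user keeps the output readable on a busy cluster, where the full queue can run to many hundreds of lines.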