Compute Resources
When your job runs on a compute node, it can only access the resources it has been assigned, so you need to explicitly request the number of CPU cores, the amount of memory, and the number of GPUs you need. You must also specify a time limit, which cannot exceed the MaxTime setting of the partition you are using.
Time
Specify your job’s maximum run time with the --time= option. Accepted formats include D-HH:MM:SS, HH:MM:SS, MM:SS, and D-HH.
To have your job start more quickly, consider requesting a lower time limit.
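For example, a time limit can be set with an #SBATCH directive in a batch script; the values below are purely illustrative, and each line is an independent alternative:

    #SBATCH --time=04:00:00      # 4 hours (HH:MM:SS)
    #SBATCH --time=1-12:00:00    # 1 day and 12 hours (D-HH:MM:SS)
    #SBATCH --time=30:00         # 30 minutes (MM:SS)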
Memory
Your job will be limited to the amount of memory it is allocated. In the simplest case, use --mem with a value in MB, or include a unit suffix, e.g. --mem=8G.
You can also specify memory relative to the number of CPUs or GPUs allocated, with --mem-per-cpu or --mem-per-gpu.
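For instance, either of the following directives would request memory for a job (the amounts are illustrative, and the two options are alternatives rather than meant to be combined):

    #SBATCH --mem=8G             # 8 GB per node
    #SBATCH --mem-per-cpu=2G     # 2 GB per allocated CPU core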
CPU
CPU requirements are expressed as a number of cores. The simple case is --cpus-per-task, where the default number of tasks is 1; more tasks can be requested with --ntasks.
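As a sketch, a single multi-threaded task using eight cores could be requested with:

    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=8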
Additional Resource Options
GPU
Request a number of GPUs for your job with the --gpus= option.
GPU memory / VRAM is not currently managed by Slurm. To guarantee that a minimum amount of GPU memory is available, choose a GPU type with the --constraint= option (see below).
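Putting the options above together, a minimal batch script requesting a time limit, CPU cores, memory, and one GPU might look like the following (the resource values and the final command are placeholders):

    #!/bin/bash
    #SBATCH --time=02:00:00
    #SBATCH --cpus-per-task=4
    #SBATCH --mem=16G
    #SBATCH --gpus=1

    python train.py    # placeholder for your actual workload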
Node Features
A “feature” in Slurm is an arbitrary label that we use to specify characteristics of a node.
To submit a job with a feature requirement, use the --constraint flag. A comma-separated list requires all listed features, a list delimited by | requires any of the listed features, and a ! before a feature requires that feature to be absent.
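For example, using that syntax with the datacenter and GPU-type features described below (each line is an independent, illustrative alternative):

    #SBATCH --constraint=holyoke,nvidia_a100_80gb_pcie        # require both features
    #SBATCH --constraint="nvidia_h200|nvidia_h100_80gb_hbm3"  # accept either feature
    #SBATCH --constraint=!stata                                # the stata feature must be absent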
Datacenter
All nodes have either the ‘stata’ or the ‘holyoke’ feature, indicating the datacenter the node is located in.
Tip: if you are submitting to a shared partition, use --constraint stata or --constraint holyoke to guarantee that your node is in the same datacenter as your data, for best performance.
GPU Type
All GPU nodes have a feature corresponding to their GPU type. The currently available GPU features are:
nvidia_a100_80gb_pcie
nvidia_a100-sxm4-80gb
nvidia_geforce_rtx_2080_ti
nvidia_geforce_rtx_3080
nvidia_geforce_rtx_3090
nvidia_geforce_rtx_4090
nvidia_gh200_480gb
nvidia_h100_80gb_hbm3
nvidia_h200
nvidia_l40s
nvidia_rtx_6000_ada_generation
nvidia_rtx_a6000
nvidia_titan_rtx
nvidia_titan_v
nvidia_titan_xp
quadro_rtx_5000
tesla_v100-sxm2-32gb
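For instance, to request a specific GPU type rather than whichever GPU happens to be free, a job could combine a GPU count with a type constraint (a sketch; substitute the feature that matches your needs):

    #SBATCH --gpus=1
    #SBATCH --constraint=nvidia_a100_80gb_pcie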
Viewing Available Resources
sinfo
Displays available partitions and shows which nodes are idle, in use, or down.
squeue
Displays jobs in the queue, both pending and running.
scontrol show node $NODE_NAME
Displays information about a node, including total resources, allocated resources, and status.
scontrol show partition $PARTITION_NAME
Displays information about a partition, including eligible QoS levels and maximum allowed runtime.
scontrol show job $JOB_ID
Displays information about your running job.
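A typical sequence for checking resources around a submission might look like the following (the partition name and job ID are placeholders; -u is the standard squeue flag for filtering by user):

    sinfo                              # which partitions exist and which nodes are idle
    scontrol show partition gpu        # 'gpu' is a placeholder partition name
    squeue -u $USER                    # show only your own jobs
    scontrol show job 123456           # '123456' is a placeholder job ID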