Compute Resources

When your job runs on a compute node, it can only access the resources it has been assigned, so you need to explicitly request the number of CPU cores, the amount of memory, and the number of GPUs your job needs. You must also specify a time limit, which cannot be greater than the MaxTime setting of the partition you are using.
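
For example, a minimal batch script might request these resources with #SBATCH directives (the partition name, program, and amounts below are placeholders; the individual options are described in the sections that follow):

    #!/bin/bash
    #SBATCH --partition=example_partition  # hypothetical partition name
    #SBATCH --time=0-04:00:00              # 4 hours
    #SBATCH --cpus-per-task=4              # 4 CPU cores
    #SBATCH --mem=16G                      # 16 GB of memory
    #SBATCH --gpus=1                       # 1 GPU

    ./my_program                           # placeholder for your command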

Time

Specify your job’s maximum run time with the --time= option. Available formats include D-HH:MM:SS, MM:SS, D-HH, and HH:MM:SS.

A lower time limit can help the scheduler start your job sooner, so request only as much time as your job is likely to need.
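
A few illustrative values (the amounts themselves are arbitrary):

    #SBATCH --time=30:00        # 30 minutes (MM:SS)
    #SBATCH --time=12:00:00     # 12 hours (HH:MM:SS)
    #SBATCH --time=2-00         # 2 days (D-HH)
    #SBATCH --time=1-12:00:00   # 1 day and 12 hours (D-HH:MM:SS)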

Memory

Your job will be limited to the allocated amount of memory. In the simplest case, use --mem with a value in MB (the default unit), or specify the unit explicitly, e.g. --mem=8G. You can also request memory relative to other resources with --mem-per-cpu or --mem-per-gpu.
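
For example, any one of the following (the amounts are placeholders; use only one memory option per job):

    #SBATCH --mem=8192          # 8192 MB for the whole job
    #SBATCH --mem=8G            # the same request, with an explicit unit
    #SBATCH --mem-per-cpu=2G    # 2 GB per allocated CPU core
    #SBATCH --mem-per-gpu=40G   # 40 GB per allocated GPU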

CPU

CPU requirements are expressed as a number of cores. The simple case is --cpus-per-task; the default number of tasks is 1, but more can be requested with --ntasks.
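
For example, the following requests 2 tasks with 8 cores each, 16 cores in total (the numbers are placeholders):

    #SBATCH --ntasks=2          # 2 tasks
    #SBATCH --cpus-per-task=8   # 8 CPU cores per task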

Additional Resource Options

GPU

Request a number of GPUs for your job with --gpus=.

GPU memory / VRAM is not currently managed by Slurm. To guarantee that a minimum amount of GPU memory is available, choose a GPU type with the --constraint= option (see below).
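
For example, to request two GPUs and guarantee at least 80 GB of VRAM each by pinning the GPU type (the type shown is one of the features listed under GPU Type below):

    #SBATCH --gpus=2
    #SBATCH --constraint=nvidia_a100_80gb_pcie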

Node Features

A “feature” in Slurm is an arbitrary label that we use to specify characteristics of a node. To submit a job with a feature requirement, you can use the --constraint flag. A comma-separated list requires all listed features, a list delimited by | requires any of the listed features, and a ! before a feature requires that feature to be absent.
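
For example, using feature names defined in the sections below (the combinations are only illustrative):

    #SBATCH --constraint="holyoke,nvidia_l40s"                  # both features required
    #SBATCH --constraint="nvidia_h200|nvidia_h100_80gb_hbm3"    # either feature accepted
    #SBATCH --constraint="!quadro_rtx_5000"                     # this feature must be absent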

Datacenter

All nodes have either the ‘stata’ or the ‘holyoke’ feature. Tip: if you are submitting to a shared partition, use --constraint=stata or --constraint=holyoke to guarantee your job runs on a node in the same datacenter as your data, for best performance.
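
For example, in a batch script:

    #SBATCH --constraint=holyoke    # only run on nodes in the Holyoke datacenter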

GPU Type

All GPU nodes have a feature corresponding to the GPU type. The currently available GPU features are listed below, followed by an example:

nvidia_a100_80gb_pcie
nvidia_a100-sxm4-80gb
nvidia_geforce_rtx_2080_ti
nvidia_geforce_rtx_3080
nvidia_geforce_rtx_3090
nvidia_geforce_rtx_4090
nvidia_gh200_480gb
nvidia_h100_80gb_hbm3
nvidia_h200
nvidia_l40s
nvidia_rtx_6000_ada_generation
nvidia_rtx_a6000
nvidia_titan_rtx
nvidia_titan_v
nvidia_titan_xp
quadro_rtx_5000
tesla_v100-sxm2-32gb
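
For example, the | syntax described above can be combined with these features to accept any of several 80 GB GPU types (the grouping is only illustrative):

    #SBATCH --gpus=1
    #SBATCH --constraint="nvidia_a100_80gb_pcie|nvidia_a100-sxm4-80gb|nvidia_h100_80gb_hbm3"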

Viewing Available Resources

sinfo

Displays available partitions and shows which nodes are idle, in use, or down.

squeue

Displays jobs in the queue, both pending and running.

scontrol show node $NODE_NAME

Displays information about a node, including total resources, allocated resources, and status.

scontrol show partition $PARTITION_NAME

Displays information about a partition, including eligible QoS levels and maximum allowed runtime.

scontrol show job $JOB_ID

Displays detailed information about a specific job, whether pending or running.
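
A quick sketch of how these commands might be used together (the node name, partition name, and job ID are placeholders):

    sinfo -p example_partition -t idle                              # idle nodes in a partition
    squeue -u $USER                                                 # your own jobs in the queue
    scontrol show node node001 | grep -E 'CPUAlloc|AllocMem|Gres'   # a node's allocated CPUs, memory, and GPUs
    scontrol show partition example_partition | grep MaxTime        # a partition's maximum runtime
    scontrol show job 123456                                        # details for a specific job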