
Frequently Asked Questions

Important: Do Not Run Interactive IDEs on Login Nodes

Do not run interactive coding editors or AI agents (VSCode, Cursor, Claude, etc.) on the login nodes. These tools consume excessive resources and can cause the cluster to become unusable for all users. Automated scripts are actively running to kill these processes.

For interactive development, allocate a compute node and connect your IDE to it. See: Interactive Coding & IDEs

Monitoring & Managing Your Jobs

How do I check if my job is running?

Use squeue to view your running and queued jobs:

squeue -u $USER

For more detailed information about a specific job:

squeue -j <job_id>

How do I cancel a job?

Use scancel to terminate a running or queued job:

scancel <job_id>

To cancel all of your jobs:

scancel -u $USER

How do I monitor my job’s resource usage?

Use sacct to check resource consumption of completed jobs:

sacct -j <job_id> --format=JobID,JobName,MaxRSS,Elapsed,State

For real-time monitoring while a job is running, SSH into the compute node and use top or nvidia-smi for GPU usage.

How do I check available GPUs on the cluster?

Use sinfo to see available nodes and resources:

sinfo -p <partition_name> --format="%20N %10c %10m %20G %5t"

Or check GPU availability with:

sinfo --format="%20N %10G"

What’s the difference between sbatch and srun?

Use sbatch to submit a batch script that runs unattended and keeps running after you log out; use srun to run a command interactively, for testing, or to launch job steps inside an existing allocation. Use sbatch for most work.
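For reference, a minimal batch script might look like the following. The partition name, resource requests, and script name are placeholders, not cluster defaults:

```shell
#!/bin/bash
#SBATCH --job-name=example          # name shown in squeue
#SBATCH --partition=<partition>     # placeholder: use a real partition name
#SBATCH --time=01:00:00             # wall-clock limit (HH:MM:SS)
#SBATCH --mem=4G                    # memory for the job
#SBATCH --output=slurm-%j.out       # stdout/stderr file (%j = job ID)

# Everything below runs on the allocated compute node
python my_script.py
```

Submit it with sbatch my_script.sbatch.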

Can I run multiple jobs from one script?

Yes, using job arrays. Submit multiple similar jobs with a single sbatch command:

sbatch --array=1-100 my_script.sbatch

Each job will have a unique $SLURM_ARRAY_TASK_ID. See the Slurm documentation for advanced array job syntax.
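For example, each array task might use its ID to pick a per-task input file. A sketch (the default value and file-naming scheme are only illustrative, so the snippet can be tried outside Slurm, where Slurm would otherwise export the variable):

```shell
# Slurm exports SLURM_ARRAY_TASK_ID inside each array task;
# the fallback value here only lets the snippet run outside Slurm.
SLURM_ARRAY_TASK_ID=${SLURM_ARRAY_TASK_ID:-7}

# Map the task ID to this task's input file (hypothetical naming scheme)
INPUT="inputs/sample_${SLURM_ARRAY_TASK_ID}.txt"
echo "task ${SLURM_ARRAY_TASK_ID} reads ${INPUT}"
```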

Software & Tools

Conda

Conda distributions, such as miniconda or anaconda, must be located on either shared storage or temporary local storage on the node. See: Storage on Slurm

Pip

Packages installed with pip are automatically located in temporary local storage on the node. To install packages to shared storage, you must specify the install location with the --target option, and to use these packages you must add that location to your $PYTHONPATH environment variable.

Example:

pip install --target $MY_NFS_PATH <package>
export PYTHONPATH=$MY_NFS_PATH:$PYTHONPATH

Alternatively, use Python virtual environments for better organization:

python -m venv $MY_NFS_PATH/my_venv
source $MY_NFS_PATH/my_venv/bin/activate
pip install <package>

Docker Containers

Use Apptainer (formerly Singularity) instead. Docker Engine is not compatible with Slurm, as it would bypass Slurm’s ability to limit a process’s resources.

Apptainer is installed and ready to use on our compute nodes.

Example with GPU:

apptainer run --nv my_container.sif

See the Apptainer documentation and Apptainer Docker compatibility page for detailed information about using Apptainer with Docker containers.
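To run an existing Docker image, first pull it and convert it to Apptainer's SIF format. A sketch (the Docker Hub image name here is only an example):

```shell
# Pull a Docker Hub image and convert it to a SIF file
# (docker://python:3.12-slim is just an example image)
apptainer pull my_container.sif docker://python:3.12-slim

# Run it, adding --nv if the job needs the host's NVIDIA GPUs
apptainer run --nv my_container.sif
```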

X-Forwarding

X-Forwarding to a compute node requires SSH. See: SSH to Compute Nodes

VSCode / Cursor / Other IDEs

Connecting these tools to a compute node requires SSH. See: Interactive Coding & IDEs for detailed setup instructions.

Data Storage & Transfer

Why can’t I access files in my AFS home directory?

Your AFS tokens cannot be forwarded to the compute node where your job runs, so files in your AFS home directory will not be readable by the job (unless the AFS directory is world-readable).

Solution: Copy your files to NFS or temporary local storage before running your job. See: Storage on Slurm

What are the best practices for transferring data to my compute node?

For small to medium files, use sbcast in an sbatch script to transfer files to the /tmp directory of a compute node before running a task:

sbcast $MY_NFS_PATH/input_file.txt /tmp/input_file.txt
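In context, an sbatch script might stage the file to node-local disk and then read the local copy instead of hitting shared storage. A sketch (file names and the processing command are placeholders):

```shell
#!/bin/bash
#SBATCH --job-name=staged-input

# Broadcast the input from shared NFS storage to /tmp on the allocated node(s)
sbcast $MY_NFS_PATH/input_file.txt /tmp/input_file.txt

# The task reads the fast local copy rather than NFS
srun python process.py --input /tmp/input_file.txt
```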

For large files or ongoing transfers, use our NFS filesystems, which are available from all nodes and mounted at /data:

rsync -avz $MY_LOCAL_PATH/ $MY_NFS_PATH/

For long-term storage, request a dedicated NFS filesystem. See the NFS documentation. If you don’t already have one but want to get started right away, you can create a directory in /data/scratch, but this is not suitable for long-term use.

Do not use the compute nodes for permanent storage. These filesystems are not backed up and may be purged without notice.

Support & Installation

Can you please install ${SOFTWARE} for me?

Possibly. Please send your request to help@csail.mit.edu. Include details about what software you need, what it’s used for, and your preferred installation method if applicable.

In the meantime, consider: