Frequently Asked Questions
Important: Do Not Run Interactive IDEs on Login Nodes
Do not run interactive coding editors or AI agents (VSCode, Cursor, Claude, etc.) on the login nodes. These tools consume excessive resources and can cause the cluster to become unusable for all users. Automated scripts are actively running to kill these processes.
For interactive development, allocate a compute node and connect your IDE to it. See: Interactive Coding & IDEs
Monitoring & Managing Your Jobs
How do I check if my job is running?
Use squeue to view your running and queued jobs:
squeue -u $USER
For more detailed information about a specific job:
squeue -j <job_id>
How do I cancel a job?
Use scancel to terminate a running or queued job:
scancel <job_id>
To cancel all of your jobs:
scancel -u $USER
How do I monitor my job’s resource usage?
Use sacct to check resource consumption of completed jobs:
sacct -j <job_id> --format=JobID,JobName,MaxRSS,Elapsed,State
For real-time monitoring while a job is running, SSH into the compute node and use top or nvidia-smi for GPU usage.
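For example, once connected (the node name is a placeholder; use the node shown for your job in squeue):

```shell
# SSH to the node where your job is running, then watch GPU utilization
ssh <node_name>
watch -n 2 nvidia-smi
```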
How do I check available GPUs on the cluster?
Use sinfo to see available nodes and resources:
sinfo -p <partition_name> --format="%20N %10c %10m %20G %5t"
Or check GPU availability with:
sinfo --format="%20N %10G"
What’s the difference between sbatch and srun?
- sbatch: Submit a batch script that runs non-interactively. Your job is queued and runs when resources are available. Output is written to files.
- srun: Run a command directly in a Slurm-allocated environment. With --pty, it gives you an interactive shell on a compute node.
Use sbatch for most work; use srun for testing commands or interactive development.
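For example, a minimal interactive allocation might look like this (the partition name and resource amounts are illustrative; adjust them for your workload):

```shell
# Request an interactive shell on a compute node with one GPU
srun --partition=<partition_name> --gres=gpu:1 --cpus-per-task=4 --mem=16G --pty bash
```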
Can I run multiple jobs from one script?
Yes, using job arrays. Submit multiple similar jobs with a single sbatch command:
sbatch --array=1-100 my_script.sbatch
Each job will have a unique $SLURM_ARRAY_TASK_ID. See the Slurm documentation for advanced array job syntax.
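A minimal array job script might look like the following sketch (the script and input file names are illustrative):

```shell
#!/bin/bash
#SBATCH --job-name=array_example
#SBATCH --array=1-100
#SBATCH --output=logs/%A_%a.out

# Each array task processes a different input, selected by its task ID
python process.py --input "data/input_${SLURM_ARRAY_TASK_ID}.txt"
```

In the output pattern, %A expands to the array's master job ID and %a to the individual task ID, so each task writes its own log file.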
Software & Tools
Conda
Conda distributions, such as miniconda or anaconda, must be located on either shared storage or temporary local storage on the node. See: Storage on Slurm
Pip
Packages installed with pip land in temporary local storage on the node by default. To install packages to shared storage instead, specify the install location with the --target option, then add that location to your $PYTHONPATH environment variable so Python can find them.
Example:
pip install --target $MY_NFS_PATH <package>
export PYTHONPATH=$MY_NFS_PATH:$PYTHONPATH
Alternatively, use Python virtual environments for better organization:
python -m venv $MY_NFS_PATH/my_venv
source $MY_NFS_PATH/my_venv/bin/activate
pip install <package>
Docker Containers
Docker Engine is not compatible with Slurm, as it would bypass Slurm's ability to limit a process's resources. Use Apptainer (formerly Singularity) instead.
Apptainer is installed on our compute nodes and is configured to:
- Run as an unprivileged user
- Pass through shared directories such as /data and /tmp
- Access GPUs with the --nv flag
Example with GPU:
apptainer run --nv my_container.sif
See the Apptainer documentation and Apptainer Docker compatibility page for detailed information about using Apptainer with Docker containers.
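For example, a Docker Hub image can be converted to a local Apptainer image and run with GPU access (the image name here is just an illustration):

```shell
# Pull a Docker image and convert it to a local .sif file
apptainer pull my_container.sif docker://ubuntu:22.04

# Run it with GPU passthrough
apptainer run --nv my_container.sif
```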
X-Forwarding
X-Forwarding to a compute node requires SSH. See: SSH to Compute Nodes
VSCode / Cursor / Other IDEs
Connecting these tools to a compute node requires SSH. See: Interactive Coding & IDEs for detailed setup instructions.
Data Storage & Transfer
Why can’t I access files in my AFS home directory?
Your AFS tokens cannot be passed through to the node where your job runs, so files in your AFS home directory will not be accessible to the job (unless the AFS directory is world-readable).
Solution: Copy your files to NFS or temporary local storage before running your job. See: Storage on Slurm
What are the best practices for transferring data to my compute node?
For small to medium files, use sbcast in an sbatch script to transfer files to the /tmp directory of a compute node before running a task:
sbcast $MY_NFS_PATH/input_file.txt /tmp/input_file.txt
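In context, an sbatch script using sbcast might look like this sketch (the script name and input path are illustrative):

```shell
#!/bin/bash
#SBATCH --job-name=sbcast_example

# Broadcast the input file from NFS to local /tmp on each allocated node
sbcast "$MY_NFS_PATH/input_file.txt" /tmp/input_file.txt

# Run the task against the fast local copy
srun python process.py --input /tmp/input_file.txt
```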
For large files or ongoing transfers, use our NFS filesystems, which are available from all nodes and mounted at /data:
rsync -avz $MY_LOCAL_PATH/ $MY_NFS_PATH/
For long-term storage, request a dedicated NFS filesystem. See the NFS documentation. If you don’t already have one but want to get started right away, you can create a directory in /data/scratch, but this is not suitable for long-term use.
Do not use the compute nodes for permanent storage. These filesystems are not backed up and may be purged without notice.
Support & Installation
Can you please install ${SOFTWARE} for me?
Possibly. Please send your request to help@csail.mit.edu. Include details about what software you need, what it’s used for, and your preferred installation method if applicable.
In the meantime, consider:
- Installing it yourself to NFS (if the software does not require admin privileges)
- Using a container (Apptainer) with the software pre-installed
- Using package managers like conda or pip to install to your own directory