Frequently Asked Questions
How do I use the SLURM cluster?
With your CSAIL Kerberos name and password, ssh to tig-slurm.csail.mit.edu. From there, use sbatch to submit your job to the queue. For more details, see the quick-start guide.
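As a sketch of the workflow, a minimal job script might look like the following. The SLURM `#SBATCH` directives are standard, but the script name and its contents are placeholders, not a prescribed template:

```shell
# Write a hypothetical minimal job script (names are illustrative only).
cat > hello_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --output=hello-%j.out
echo "Hello from $(hostname)"
EOF
# On tig-slurm.csail.mit.edu you would then submit it with:
#   sbatch hello_job.sh
# and monitor it with:
#   squeue -u $USER
```

The `%j` in the output filename expands to the job ID, so repeated runs don't overwrite each other's logs.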
Why is my home directory empty?
Your AFS tokens can't be passed through to the system where your job runs, so files in your AFS home directory won't be available to the job, even if they were accessible when you submitted it.
Do not use your home directory on the slurm controller or any of the worker nodes for permanent storage. The space here is limited and will be occasionally purged without notice.
If I can’t use my home directory, where do I store things?
In our shared NFS storage, which is mounted at /data. Anyone can create a directory on /data/scratch, but we don’t guarantee data stored here won’t disappear. If you need something more stable, refer to our NFS documentation to request a filesystem.
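For example, claiming a personal scratch directory is a single mkdir. The /data/scratch path comes from this FAQ; the SCRATCH_ROOT variable below is only a stand-in so the sketch can run outside the cluster:

```shell
# Sketch: create a personal scratch directory.
# On the cluster, SCRATCH_ROOT would simply be /data/scratch.
SCRATCH_ROOT=${SCRATCH_ROOT:-/tmp/scratch-demo}
ME=${USER:-$(id -un)}
mkdir -p "$SCRATCH_ROOT/$ME"
echo "scratch directory: $SCRATCH_ROOT/$ME"
```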
Can you please install ${SOFTWARE} for me?
No, probably not. Our cluster users often require specific versions of various libraries, and to prevent conflicts we keep cluster-wide software installation to a minimum.
How do I install my own software, then?
The container engine Singularity is available on our cluster systems. You can install any software you like within a container, and run it as an unprivileged user. Please refer to the upstream documentation at www.sylabs.io.
In addition to Singularity, we provide default vendor-packaged installations of python-pip, python3-pip, and npm.
pip, npm, and singularity all make use of your home directory by default. It is possible to change this with command-line options; however, it may be simpler to change your home directory to something within NFS by setting the $HOME environment variable. Your choice of software may require additional settings to correctly use NFS folders.
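As a sketch of the $HOME approach for pip and npm, the commands below redirect the tools' per-user state into scratch space. The /data/scratch/$USER path comes from this FAQ; the NFS_HOME stand-in only lets the sketch run outside the cluster, and the package names are illustrative:

```shell
# On the cluster, NFS_HOME would be /data/scratch/$USER.
NFS_HOME=${NFS_HOME:-/tmp/nfs-home-demo}
mkdir -p "$NFS_HOME"
export HOME="$NFS_HOME"
# pip and npm now write under the new $HOME, e.g.:
#   pip install --user numpy     # lands in $HOME/.local
#   npm install left-pad         # caches in $HOME/.npm
echo "HOME is now $HOME"
```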
The following example uses a personal NFS scratch directory to run a singularity container:
export HOME=/data/scratch/$USER
export SINGULARITY_CACHEDIR=$HOME/.singularity
export SINGULARITY_TMPDIR=/run/user/$UID
export SINGULARITY_LOCALCACHEDIR=/run/user/$UID
srun singularity exec docker://python:latest /usr/local/bin/python -c "print('Hello, world.')"
You are also welcome to download or compile software into an NFS directory, and run it from there as an unprivileged user.
Why did my job suddenly exit?
Your job might have hit its time limit: our default maximum run-time is one day. You can use --time to specify a longer limit, with a maximum of four days. Please keep in mind that we have limited resources and have to share.
Your job might have run out of memory: our default memory allocation is 2000MB per allocated CPU. You can use --mem-per-cpu to request more.
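Both limits can be raised in the job script itself. In this sketch, the four-day maximum and the 2000MB default come from this FAQ, while the script name, the 8000MB figure, and the command are placeholders:

```shell
# Hypothetical job script requesting a longer run-time and more memory.
cat > long_job.sh <<'EOF'
#!/bin/bash
#SBATCH --time=4-00:00:00        # four days, the stated maximum
#SBATCH --mem-per-cpu=8000       # 8000MB per CPU instead of the 2000MB default
./my_long_computation            # placeholder command
EOF
# Submit with: sbatch long_job.sh
# The same flags also work directly on the command line, e.g.:
#   srun --time=2-00:00:00 --mem-per-cpu=4000 ./my_computation
```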
Your job might have attempted to access an unavailable resource, such as a file in your home directory, or it may have failed for another, unknown reason. If you’re unsure, feel free to contact help@csail.mit.edu for troubleshooting assistance.