Storage Tiers

CSAIL offers three tiers of NFS storage with different performance characteristics: scratch, production, and archival. Some research groups have their own servers, but TIG operates servers in these tiers for shared use by all CSAIL members.

Scratch

Best for: High-performance storage of temporary files and intermediate computations

Key characteristics:

Data management:

SCRATCH STORAGE IS FOR TEMPORARY FILES ONLY.

Files in /data/scratch, /data/scratch-fast, and /data/scratch-oc40 are automatically deleted if not accessed within six months. They may also be deleted sooner if the filesystem fills up.
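
To see which of your files are at risk, you can inspect last-access times yourself. The following is a minimal Python sketch, not TIG's actual cleanup procedure: the function name stale_files and the 183-day cutoff (an approximation of six months) are assumptions for illustration, and the access times a filesystem reports can depend on its mount options.

```python
import os
import time

SECONDS_PER_DAY = 86400


def stale_files(root, max_age_days=183, now=None):
    """Yield (path, days_since_access) for files under root whose last
    access time is older than max_age_days.

    The 183-day default is an assumed stand-in for "six months"; the
    real cleanup policy may use a different cutoff or other criteria.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that a symlink is judged by its own timestamps
                atime = os.lstat(path).st_atime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if atime < cutoff:
                yield path, (now - atime) / SECONDS_PER_DAY
```

For example, iterating over stale_files("/data/scratch/your-directory") (a hypothetical path) yields each file not accessed in roughly six months, along with its age in days.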

Do NOT store:

  • Conda environments
  • Python package libraries or virtual environments
  • Software installations
  • Original research data
  • Anything you need to keep permanently

Why Python/Conda breaks in scratch: Python caches compiled bytecode (.pyc files) and loads it in place of the plain-text source files (.py). Because the source files are then no longer read, the cleanup procedure treats them as unused and deletes them. Your Python environment later fails with cryptic errors because the source files are missing.
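
One way to catch this problem early is to check whether the running Python environment lives under a scratch path. The sketch below compares sys.prefix against the three scratch paths named above. The helper name is_under_scratch is hypothetical, and the check assumes the scratch filesystems are mounted at exactly those paths.

```python
import os
import sys

# Scratch paths as named in this document; assumed to be the mount points.
SCRATCH_ROOTS = ("/data/scratch", "/data/scratch-fast", "/data/scratch-oc40")


def is_under_scratch(path, roots=SCRATCH_ROOTS):
    """Return True if path is one of the scratch roots or lies beneath one."""
    resolved = os.path.realpath(path)
    for root in roots:
        root = os.path.realpath(root)
        # commonpath compares whole path components, so /data/scratch-other
        # is not mistaken for a child of /data/scratch.
        if os.path.commonpath([resolved, root]) == root:
            return True
    return False


if is_under_scratch(sys.prefix):
    print(f"Warning: this Python environment ({sys.prefix}) is on scratch storage")
```

Running this inside a Conda or virtual environment prints a warning only when the environment's install prefix resolves to a location under scratch.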

For permanent storage, use production or archival tiers (see below).

Production

Best for: Active research computing and data analysis (the main workhorse tier)

Key characteristics:

Quotas and organization:

Data protection:

Archival

Best for: Read-only reference data and data not actively updated

Key characteristics:

Data protection:


Data Center Locations

All three storage tiers are split between:

When performance matters, ensure you use servers in the same building as your data. The Holyoke data center has an annual 24-hour maintenance shutdown (usually in June) during which all servers and data there are inaccessible. The community is notified a few months in advance.


Storage Tiers Comparison

Feature       Scratch                         Production            Archival
Use case      Temporary, intermediate files   Active research data  Read-only reference data
Snapshots     None                            Yes                   Yes
Backups       None                            Daily                 Weekly
Quota         1 TiB per user                  Per filesystem        Per filesystem
Performance   High                            High                  Medium (optimized for reliability)
File cleanup  After six months of non-access  Never                 Never
Cost          Lowest                          Higher                Medium