NFS Rules and recommendations

To make the best use of the new NFS servers, we ask everyone to follow a few rules:

  1. Users of NFS storage are expected to join the [nfs-users@csail] mailing list to receive notifications of system outages.
  2. NFS storage is a costly resource to maintain. Do not use it for backups. If your research does not absolutely require the capacity or performance of NFS, keep your data in AFS, which is much cheaper to maintain.
  3. Please request one new filesystem per dataset or per project. Filesystems are cheap, and keeping different kinds of data separate has numerous advantages:
    • It allows our backup system to operate more efficiently.
    • It allows us to set storage parameters appropriately for the type of data being stored.
    • It makes it easier to identify and resolve performance problems relating to filesystem activity.
  4. Please don’t mix different kinds of data in one filesystem. For example, if you’re working on video analysis, DO NOT take a multi-gigabyte video file and explode it into a million individual frames in the same filesystem – use a separate scratch filesystem for that. (Feel free to ask for one if you need it, after checking within your group.)
  5. Also, please don’t mix data with radically different access patterns. If you have a large dataset that you use as reference material – that is, you extract it once but never change it – we want to put it in a different filesystem with different storage parameters and backup policy. (We can even make it read-only to ensure that you don’t change it by accident.)
  6. Creating hundreds of millions of files is almost never a good idea. Under no circumstances should you create even hundreds of thousands of entries in the same directory – this causes all accesses to that directory to be much slower, and may prevent the backup system from ever finishing a backup of your data. Use multiple levels of subdirectories to limit the size of any one directory, and keep large numbers of small files in archives (tar or ZIP) when you do not need to access them immediately. (For many cluster applications, it may make sense to store all your data in an archive, and extract to temporary storage only those files required for the computation on each node.) Please ask TIG if you need assistance in structuring your data.
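
As an illustration of rule 6, here is a minimal Python sketch of the two techniques it mentions: spreading files over multiple levels of subdirectories, and extracting only the needed files from an archive into temporary storage. The function names (sharded_path, extract_selected) and the hash-based two-level layout are assumptions made for this example, not a layout TIG prescribes. Any scheme that keeps each directory small will do.

```python
import hashlib
import shutil
import tarfile
from pathlib import Path


def sharded_path(root, name, levels=2, width=2):
    """Return root/ab/cd/name, where 'abcd...' is the SHA-256 hex digest of name.

    With the defaults, files are spread over 256 * 256 subdirectories,
    so no single directory has to hold a huge number of entries.
    """
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root, *parts, name)


def extract_selected(archive, wanted, dest):
    """Copy only the regular files named in `wanted` out of a tar archive.

    `dest` would typically be node-local temporary storage. Member names
    are used as-is, so this assumes an archive you created yourself.
    """
    wanted = set(wanted)
    extracted = []
    with tarfile.open(archive, "r:*") as tar:
        for member in tar:
            if not (member.isfile() and member.name in wanted):
                continue
            target = Path(dest, member.name)
            target.parent.mkdir(parents=True, exist_ok=True)
            with tar.extractfile(member) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            extracted.append(target)
    return extracted
```

For example, sharded_path("/scratch/frames", "frame_000001.png") returns a path two directory levels below /scratch/frames. You still need to create the parent directory (path.parent.mkdir(parents=True, exist_ok=True)) before writing the file. On a cluster node, extract_selected can be pointed at a directory in local temporary storage so that only the files needed for that node's computation are unpacked.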