
Requests for new NFS filesystems

When requesting a new NFS filesystem, please provide the following information in an email to help@csail.mit.edu. NOTE WELL: The information you include in your request is a contract between you and TIG. Do not expect sympathy if you use a filesystem for something other than what you told us and run into problems. (We will of course still help you, but that may mean that we help you to copy your data to a more appropriate place.)

What will the filesystem be used for? This and the following question will determine the name for your new filesystem. Please note: “my files” is not an adequate explanation of what your filesystem will be used for. Most people work on multiple projects during their time at CSAIL, and most projects will have multiple people working on them; each project's data needs to be stored in its own filesystem. A good answer to this question will be something like “storing raw frog genome sequences” or “accumulating simulation results for Prof. X’s new cache coherence protocol” or “reference data from the FROBNITZ consortium that I’m using to train my speech recognizer”.
Who (what group or project) will be maintaining it? (We want the name of a Unix/AFS filesystem group here, not a list of users, but if there is someone who will be the principal administrator, please let us know that too.)
How long will this data need to remain online, and do your funding agencies have any requirements for data stewardship or preservation? Please be as specific as possible. We will not delete anything without checking with your group first; this just gives us an idea of when it’s reasonable to ask. (If you do not provide an explicit expiration date, we will assume five years from initial creation.)
What is the access pattern? Will it be write-once, read-many, or will it be simultaneous reads and updates? Please describe in detail how you will be using the data including which compute nodes will be accessing it. For example, if your data will be read sequentially, so that each file is read only once per job and then not used again until the next job starts, that is important to know. Contrariwise, if small segments of files will be read and updated at random, we need to know that. Make sure to include information about how parallel your use is likely to be: if you will be running a 64-node cluster and every node updates the same file, we may need to configure things differently.
How much space is required? How much do you need now, and how much will it grow over time? Keep in mind that all NFS filesystems have snapshots, which will take up some of your allocated space.
Do you need this data to be backed up, or can you reload/regenerate it from some other source if necessary?
Is your data stored in a compressed or encrypted form? (Images and videos are almost always compressed.)
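
If you are unsure how to answer the questions about space and compression, a rough estimate is usually enough. The following is a minimal Python sketch, not a TIG tool: the function names are hypothetical, and the growth rate and snapshot allowance are guesses you supply yourself rather than figures published by TIG. It totals the size of an existing directory tree, projects it forward, and checks whether a sample of a file shrinks under zlib (a ratio near 1.0 suggests the data is already compressed or encrypted).

```python
import os
import zlib


def tree_size_bytes(root):
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def projected_size(current_bytes, yearly_growth, years, snapshot_overhead):
    """Compound yearly growth, then add a flat allowance for snapshots.

    yearly_growth and snapshot_overhead are fractions (0.5 means 50%).
    Both are assumptions chosen by the requester, not TIG policy.
    """
    grown = current_bytes * (1 + yearly_growth) ** years
    return int(grown * (1 + snapshot_overhead))


def compression_ratio(path, sample_bytes=1 << 20):
    """Compressed size divided by original size for the start of a file.

    Only the first sample_bytes are read, so this is a quick heuristic.
    """
    with open(path, "rb") as f:
        data = f.read(sample_bytes)
    if not data:
        return 1.0  # empty file: nothing to compress
    return len(zlib.compress(data)) / len(data)
```

For example, `projected_size(tree_size_bytes("/path/to/data"), 0.25, 5, 0.2)` would estimate the space needed after five years at an assumed 25% yearly growth with an assumed 20% allowance for snapshots; the path and both percentages are placeholders to replace with your own values.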