NFS FAQ
Frequently Asked Questions
Access
How do I access /data?
CSAIL NFS is only supported on CSAIL Ubuntu clients.
The /data filesystem is synthesized inside the NFS automounter based
on configuration files installed by CSAIL Ubuntu’s configuration
management.
When properly set up, the df command will show:
$ df /data
Filesystem 1K-blocks Used Available Use% Mounted on
/etc/auto.d/auto.data 0 0 0 - /data
If the automounter is not installed, configured, and running, that usually means that the machine hasn’t been set up for NFS access. To fix this, run the following commands as root:
# echo 'autofs=yes' >> /etc/facter/facts.d/csail.txt
# puppet agent -t
/data is mounted but I don’t see anything!
The automounter synthesizes directory entries as and when they are
accessed.
On first access to a filesystem, the automounter will use its
configuration files (stored in /etc/auto.d) to locate the correct
server and path, create a mount point, and mount the remote
filesystem.
Before a filesystem has been mounted, it will not show up in directory
listings, and there will be a short delay on first access while the
mount completes.
Thus, you should not depend on browsing to find your filesystem —
always go directly to the full path you have been given.
The CSAIL automount configuration uses a hierarchical design, so
filesystems will be divided by function and research group.
The df command will tell you whether a particular
directory is being synthesized by the automounter or represents a real
directory on an NFS server.
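A rough illustration of the difference (the group name, filesystem name, server name, and sizes below are all placeholders):
$ df /data
Filesystem            1K-blocks  Used Available Use% Mounted on
/etc/auto.d/auto.data         0     0         0    - /data
$ df /data/mygroup/somefs
Filesystem                              1K-blocks    Used Available Use% Mounted on
files.example.csail.mit.edu:/vol/somefs  10485760 5242880   5242880  50% /data/mygroup/somefs
The first entry is synthesized by the automounter; the second is a real NFS mount, so df reports the exporting server and actual usage.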
Can I access my CSAIL NFS from my laptop/desktop/custom server?
CSAIL NFS is only supported on CSAIL Ubuntu. The NFS protocol is standard, and we do not employ technical measures to prevent anyone from mounting the CSAIL servers on an unsupported platform, but we cannot assist you if things go wrong. Furthermore, we will sometimes move filesystems from server to server without notice; clients not using CSAIL Ubuntu with up-to-date CSAIL configuration management will not receive updated configuration files and thus will have the filesystem ripped out from under them. Finally, unmanaged clients like laptops will not have the correct mapping of user and group IDs, so you will not be able to access files as yourself and will be able to create files that you cannot access or remove from another computer.
Why am I suddenly getting Read-only file system errors?
Usually this happens as a part of filesystem migration from one server
to another.
A client receiving this error is still mounting the filesystem from
the old server, usually because some process was holding the
filesystem mounted when the transition happened, so the automounter
could not unmount the old server.
We try to keep the old copy online, in read-only mode, for a week or so
after completing a move in case there are clients stuck in this state.
Use sudo lsof /data/mygroup/mountpoint to identify the processes which
are hanging on to the old mount point and then terminate them so the
automounter can do its job.
(Sometimes if things get really stuck you may need to
sudo umount /data/mygroup/mountpoint after terminating the remaining
processes.)
As an alternative, you can reboot the client node.
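A typical recovery sequence, using the mount point from the example above (the PID comes from whatever lsof reports on your client):
$ sudo lsof /data/mygroup/mountpoint    # list the processes holding the old mount
$ sudo kill <PID>                       # terminate each process lsof reported
$ sudo umount /data/mygroup/mountpoint  # only if the automounter still cannot unmount it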
Capacity
Why is the size of my disk shrinking?
The CSAIL NFS servers implement snapshots.
The quota for your filesystem counts both current data and old data
referenced by snapshots, but NFS does not provide a way to convey this
information to clients, so the space used by snapshots is subtracted from
your quota when reported in df.
In theory, we could define quotas another way (refquota and
refreservation rather than quota and reservation, for those
familiar with ZFS),
but this would require us to overcommit storage, which we do not want
to do.
(A few filesystems on group-owned servers are configured this way,
since the whole of the storage is allocated to the same group.)
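For the ZFS-curious, the server-side accounting looks roughly like this; this is not something you can run on an NFS client, and the pool/dataset name and values are made up for illustration:
# zfs get quota,refquota,usedbysnapshots tank/groups/mygroup
NAME                 PROPERTY         VALUE  SOURCE
tank/groups/mygroup  quota            10T    local
tank/groups/mygroup  refquota         none   default
tank/groups/mygroup  usedbysnapshots  1.2T   -
Because quota (rather than refquota) is in effect, the 1.2T held by snapshots counts against the 10T limit, and therefore disappears from the space df shows you.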
How much space can I get?
There are four distinct answers to this question.
- Individual users of shared “scratch” filesystems are limited to 1 TiB per filesystem. (Group “scratch” filesystems may have other quotas or no per-user quotas at all, at the discretion of the PI.)
- Individual filesystems should not exceed 25 TiB before compression if they are to be backed up.
- The total of a research group’s NFS allocation should not exceed 65 TiB of shared storage. (This limit is subject to review as technology improves.)
- Group-owned storage is unlimited, but if the group wants TIG to manage the file server, take backups, and handle drive replacements and migrations, the group must buy the equipment that we specify.
All of our filesystems are configured with compression enabled, so
depending on the format and redundancy of your data you may be able to
store substantially more.
Quotas are based on physical storage consumed and not the uncompressed
size of the data.
(The du utility by default shows the actual disk space occupied.
Use du --apparent-size to find the uncompressed size.)
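To compare the two measures yourself (the path below is a placeholder):
$ du -sh /data/mygroup/dataset                  # physical space consumed; this is what counts toward quota
$ du -sh --apparent-size /data/mygroup/dataset  # logical, uncompressed size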
I’m curious, how much storage does CSAIL have?
It’s a challenge to measure this accurately for reasons related to RAID overhead, and any static web page (like this one) will inevitably be out of date. In addition, there are multiple tiers of storage and much of the storage TIG manages is actually owned by research groups rather than being a lab-wide shared resource. However, as of 2022-02-15, the current raw amount of shared storage in the “production” tier is 1567 TiB, spread over three servers (two in Stata, one in MGHPCC).
Security
Can I access NFS from outside CSAIL?
No. NFS does not have any meaningful security mechanism, so to the extent we can provide any security at all, it depends on restricting network access to the NFS servers. (Arguably, even restricting it to the CSAIL network is too open: anyone who controls a machine on an “inside” network can read, write, create or delete any files on any CSAIL NFS server. CSAIL originally started with only AFS as a result of a security incident in the then AI Lab, wherein an attacker on an AI Lab workstation methodically began deleting all of the files on the AI Lab’s NFS server. There is an authentication protocol that we could configure, but you would have to have Kerberos tickets to authenticate to NFS as well as AFS, and our users have consistently told us that they care more about frictionless access than they do about security. See below for more information.)
Some people have successfully used sshfs to access CSAIL NFS over an
SSH connection to a server on the CSAIL network.
This will not perform especially well but is a reasonable option for
casual access, software development, and other low-volume, low-speed
interactive tasks.
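A minimal sketch, assuming you can SSH to some host on the CSAIL network that mounts the filesystem (the hostname and path below are placeholders):
$ mkdir -p ~/csail-data
$ sshfs somehost.csail.mit.edu:/data/mygroup/mountpoint ~/csail-data
$ ls ~/csail-data              # browse the files over the SSH connection
$ fusermount -u ~/csail-data   # unmount when finished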
Can I store NDA-protected data on CSAIL NFS?
Yes and no, but mostly no.
Because NFS lacks meaningful security, there is no way to protect data
stored in CSAIL NFS from unauthorized access.
You can store such data in encrypted form (we recommend age) so long
as your encryption keys are not stored on NFS (e.g., keep them on
local disk on a compute node, readable only by authorized users), but
for the lowest friction, we recommend that you provision your own file
server on a private network — see Computing in Secured Medium Risk
Environments.
Note that this represents a substantial hardware expense that you should build
into your grant proposals when undertaking research with this sort of
data.
Such servers are not integrated into the CSAIL NFS environment.
(Depending on your specific requirements, CSAIL AFS may provide a sufficient level of security when combined with additional access controls, albeit at a lower level of performance.)
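If you do take the encryption route, a minimal sketch with age looks like this (the filenames are placeholders, and the identity file must live on local disk, not on NFS):
$ age-keygen -o ~/nda-key.txt                            # prints the public key; keep this file off NFS
$ age -r age1... -o dataset.tar.age dataset.tar          # encrypt to the public key before writing to NFS
$ age -d -i ~/nda-key.txt dataset.tar.age > dataset.tar  # decrypt with the locally stored identity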
How does NFS access control actually work?
The NFS server trusts clients’ assertions about who they are. Every request simply states a user ID number and a set of group ID numbers, and the server uses those values to perform access control. This means that anyone who has superuser access on any machine on the CSAIL network can potentially read, modify, create, and delete files assuming the identity of any other user.
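In concrete terms, the only identity the server ever sees is roughly what id prints on the client (the names and numbers below are placeholders):
$ id
uid=12345(alice) gid=20001(mygroup) groups=20001(mygroup),20002(otherlab)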
Furthermore, NFS traffic is sent in the clear, without integrity protection, over the network. Thus, any attacker in the middle can also alter legitimate NFS requests and responses.
There are available mitigations for both of these flaws, but they would require making all existing clients and servers obsolete, and in some cases would require substantially altering the NFS service offering by requiring users to maintain valid Kerberos tickets on every client they use (making it more like AFS, and incompatible with batch computing like SLURM).