NFS Snapshots

The scratch storage tier receives neither snapshots nor backups. Scratch storage currently resides on the servers nfs-prod-12 and nfs-prod-13. You can determine which server hosts your filesystem by running the df command on a path inside it; for an NFS mount, the Filesystem column of the output begins with the server name.
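As a rough illustration of the df check, the following Python sketch runs df -P on a path and extracts the server name from the Filesystem column, which for an NFS mount has the form server:/export. The function names are invented for this example, and it assumes a POSIX df is available on the client.

```python
import subprocess


def nfs_server_from_df(df_output):
    """Return the server name from `df -P` output, or None if the
    filesystem does not look like an NFS mount (server:/export)."""
    lines = df_output.strip().splitlines()
    if len(lines) < 2:
        return None
    # With -P, each filesystem is reported on a single line and the
    # first column is the source device or server:/export pair.
    source = lines[1].split()[0]
    host, sep, _export = source.partition(":/")
    return host if sep and host else None


def nfs_server_for(path):
    """Run `df -P` on `path` and return the NFS server hosting it."""
    result = subprocess.run(
        ["df", "-P", path], capture_output=True, text=True, check=True
    )
    return nfs_server_from_df(result.stdout)
```

Calling nfs_server_for on a directory in your filesystem would then return a name such as nfs-prod-12 for a scratch filesystem, or None for a local disk.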

Introduction

The backing store for the NFS servers uses the ZFS filesystem, which follows a non-overwriting (copy-on-write) storage policy. Whenever any block is updated, whether it contains data or metadata, a fresh block is allocated from the storage pool. The block that pointed to the old block must then be rewritten (again into a fresh block) to point to the new one, and so on recursively all the way up to the root of the filesystem. (Blocks are also checksummed, so the filesystem forms a Merkle tree structure.)

This makes it trivial to capture a point-in-time snapshot of any filesystem: all it takes is an additional reference to the root of the tree. Our practices for monitoring and managing NFS servers depend on the automatic, periodic creation and deletion of snapshots. (In particular, snapshots are required for filesystem migrations between servers, and we also depend on them for capacity planning.) Snapshots are mandatory except on “scratch” filesystems.
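The copy-on-write update and the "snapshot as an extra reference to the root" idea can be illustrated with a toy model. The Python sketch below is not ZFS's on-disk format; the Node class and the helper names are invented for illustration. It only shows that a write allocates new nodes along the path to the root, leaves every old node untouched, and changes the checksum at every level.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Node:
    """An immutable block: either a leaf with data or a directory of children."""
    children: Tuple[Tuple[str, "Node"], ...] = ()
    data: Optional[bytes] = None
    checksum: str = ""


def make_leaf(data):
    return Node(data=data, checksum=hashlib.sha256(data).hexdigest())


def make_dir(children):
    # A directory's checksum covers its children's checksums (Merkle tree).
    items = tuple(sorted(children.items(), key=lambda kv: kv[0]))
    h = hashlib.sha256()
    for name, child in items:
        h.update(name.encode())
        h.update(child.checksum.encode())
    return Node(children=items, checksum=h.hexdigest())


def write(root, path, data):
    """Return a NEW root; nodes on the path are re-created, never overwritten."""
    if not path:
        return make_leaf(data)
    children = dict(root.children)
    old_child = children.get(path[0], make_dir({}))
    children[path[0]] = write(old_child, path[1:], data)
    return make_dir(children)


def read(root, path):
    node = root
    for name in path:
        node = dict(node.children)[name]
    return node.data


# Build a small tree, then "snapshot" it by keeping a reference to its root.
live = write(make_dir({}), ["home", "a.txt"], b"old contents")
live = write(live, ["home", "b.txt"], b"untouched")
snapshot = live
# Overwriting a.txt produces a new root; the snapshot still sees the old data.
live = write(live, ["home", "a.txt"], b"new contents")
```

After the last line, read(snapshot, ["home", "a.txt"]) still returns the old contents, while the node for b.txt is shared between both roots. This is also why old data keeps occupying space for as long as a snapshot refers to it.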

Disk blocks assigned to snapshots are counted against your filesystem’s quota, so if you are overwriting or deleting large amounts of data, you may notice that your filesystem appears to be shrinking. ZFS reports the size of the filesystem to df and NFS clients as the filesystem’s quota minus the space occupied by snapshots. If you completely fill your quota, you will be unable to delete anything, because on a copy-on-write filesystem even a deletion has to allocate new blocks. This remains the case until one of two things happens: either sufficient snapshots are destroyed automatically, or TIG deletes them manually at the request of the filesystem’s owner.
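The "shrinking filesystem" effect is simple arithmetic. The sketch below is a simplified model, not how ZFS computes its accounting internally; the function names and the figures are made up for illustration. It follows the rule stated above: reported size is the quota minus the space held only by snapshots.

```python
GIB = 1024 ** 3


def reported_size(quota, snapshot_bytes):
    """Size shown by df: quota minus space occupied by snapshots."""
    return max(quota - snapshot_bytes, 0)


def reported_available(quota, snapshot_bytes, live_bytes):
    """Free space shown by df: reported size minus live data."""
    return max(reported_size(quota, snapshot_bytes) - live_bytes, 0)


# Hypothetical filesystem: 1024 GiB quota, 300 GiB of live data, no
# snapshot-only blocks yet.
quota = 1024 * GIB
before_size = reported_size(quota, 0)
before_avail = reported_available(quota, 0, 300 * GIB)

# Delete 200 GiB that an existing snapshot still references: the blocks
# move from "live" to "snapshot", so the filesystem appears to shrink
# and the available space does not grow.
after_size = reported_size(quota, 200 * GIB)
after_avail = reported_available(quota, 200 * GIB, 100 * GIB)
```

In this example the reported size drops from 1024 GiB to 824 GiB while the available space stays at 724 GiB, which is why deleting snapshotted data does not immediately relieve a full quota.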

Policy

If you delete or overwrite a file soon after it was initially created, chances are pretty good that the disk blocks can be freed immediately. However, if the file exists when a snapshot is taken, its storage will be reserved for the lifetime of that snapshot. The following table shows when snapshots are taken and how long they last:

Frequency   When taken                        How many kept
hourly      soon after the top of the hour    25
daily       3:01 AM                           8
weekly      4:15 AM on Saturdays              5
monthly     5:30 AM on the first              2

Thus, if you create a file at noon and delete it just after 2 PM, the 2 PM hourly snapshot will hold it until 3 PM the following day. If you create a file at noon on a Monday and delete it on Tuesday (after the 3:01 AM daily snapshot), it will stick around until Wednesday of the following week. If you create a file on a Friday and don’t delete it until Saturday afternoon, it will take up disk space for another five weeks, unless that Saturday was the first day of the month, in which case it won’t be freed until the first day of the month after next.
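The examples above can be reproduced with a short calculation. The Python sketch below is an approximation under stated assumptions: hourly snapshots are modeled as taken exactly on the hour, a snapshot is assumed to be destroyed at the moment the Nth later snapshot of the same frequency is taken (N being the "how many kept" figure), and time zone or daylight-saving shifts are ignored. All function names are invented for illustration.

```python
from datetime import datetime, timedelta


def _at(t, **fields):
    return t.replace(second=0, microsecond=0, **fields)


def add_months(t, n):
    """Shift t by n calendar months (only used for day-1 timestamps)."""
    m = t.month - 1 + n
    return t.replace(year=t.year + m // 12, month=m % 12 + 1)


# Each last_* function returns the most recent scheduled snapshot <= t.
def last_hourly(t):
    return _at(t, minute=0)


def last_daily(t):
    s = _at(t, hour=3, minute=1)
    return s if s <= t else s - timedelta(days=1)


def last_weekly(t):
    s = _at(t, hour=4, minute=15)
    s -= timedelta(days=(s.weekday() - 5) % 7)  # back up to Saturday
    return s if s <= t else s - timedelta(weeks=1)


def last_monthly(t):
    s = _at(t, day=1, hour=5, minute=30)
    return s if s <= t else add_months(s, -1)


# (frequency, how many kept, last snapshot <= t, time of the k-th later one)
TIERS = [
    ("hourly", 25, last_hourly, lambda s, k: s + timedelta(hours=k)),
    ("daily", 8, last_daily, lambda s, k: s + timedelta(days=k)),
    ("weekly", 5, last_weekly, lambda s, k: s + timedelta(weeks=k)),
    ("monthly", 2, last_monthly, add_months),
]


def release_time(created, deleted):
    """Estimate when the blocks of a deleted file can be freed."""
    release = deleted
    for _name, keep, last_before, advance in TIERS:
        snap = last_before(deleted)
        if snap >= created:  # this snapshot captured the file
            release = max(release, advance(snap, keep))
    return release
```

For instance, release_time(datetime(2024, 1, 12, 12, 0), datetime(2024, 1, 13, 15, 0)) models a file created on a Friday and deleted on Saturday afternoon, and yields the Saturday five weeks later.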

Accessing snapshots

The snapshots for any filesystem can be accessed through the hidden directory .zfs/snapshot in the root directory of each filesystem. A snapshot can also be made available as a read-write clone as the need arises; contact help@csail to request this.
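To find earlier versions of a file, you can look for it under each snapshot directory. The Python sketch below assumes only the .zfs/snapshot layout described above; the function name and the example paths are hypothetical, and snapshot names on a real filesystem will differ.

```python
import os


def snapshot_versions(fs_root, rel_path):
    """Return (snapshot name, full path) for every snapshot under
    fs_root/.zfs/snapshot that contains rel_path."""
    snap_dir = os.path.join(fs_root, ".zfs", "snapshot")
    found = []
    # The .zfs directory is hidden from directory listings of fs_root,
    # but it can still be opened by its explicit name.
    for name in sorted(os.listdir(snap_dir)):
        candidate = os.path.join(snap_dir, name, rel_path)
        if os.path.lexists(candidate):
            found.append((name, candidate))
    return found
```

A call such as snapshot_versions("/data/myproject", "results/run1.csv") would list every snapshot that still holds that file; copying the file back out of the read-only snapshot directory restores it.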

Premature removal of snapshots

If you have run out of space and need to request that your old snapshots be deleted, send a request to help@csail, making sure to include the name of the filesystem. These requests should normally come from the PI, group system administrator, or the person who originally requested the filesystem. Note that only snapshots older than the last monthly snapshot will normally be deleted. (Sometimes more recent snapshots may be deleted, such as on a brand-new filesystem that has never had a monthly snapshot, or an hourly snapshot that would be deleted automatically in a few hours anyway.) Generally we prefer to increase quotas instead of deleting snapshots, when resource constraints permit.