Snapshots and backups
NFS Snapshots and Backups
The scratch storage tier does not receive
either snapshots or backups.
Presently, scratch storage is on the servers
You can determine which server your filesystem is on by using the
The backing store for the NFS servers uses the ZFS filesystem, which
follows a non-overwriting (copy-on-write) storage policy. Whenever any
block is updated, whether it contains data or metadata, a fresh block is
allocated from the storage pool and pointers to the old block are
updated to point to the new block, recursively all the way up to the
root of the filesystem. (Blocks are also checksummed, so the filesystem
forms a Merkle tree structure.) This structure makes taking snapshots
extremely simple and computationally inexpensive, so we take them fairly
frequently and keep them for a good length of time. Disk blocks
assigned to snapshots are counted against your filesystem’s quota, so
if you are overwriting or deleting large amounts of data, you may notice
that your filesystem appears to be shrinking. ZFS reports the size of
the filesystem to df and NFS clients as the filesystem’s quota minus
the space occupied by snapshots. If you completely fill your quota, you
will be unable to delete anything until one of two things happens:
either sufficient snapshots are destroyed automatically, or TIG deletes
them manually at the request of the filesystem’s owner. Automatic
snapshots are mandatory on CSAIL NFS.
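The accounting described above can be sketched as a toy model. This is only an illustration under simplifying assumptions: the class and field names are hypothetical, sizes are in arbitrary units, and real ZFS space accounting is more detailed than this.

```python
from dataclasses import dataclass


@dataclass
class ToyFilesystem:
    """Hypothetical model of quota accounting with snapshots (not ZFS code)."""
    quota: int          # total space the filesystem may consume
    live: int           # space referenced by the live filesystem
    snapshot_only: int  # space referenced only by snapshots

    def reported_size(self) -> int:
        # Size shown to df and NFS clients: quota minus snapshot space.
        return self.quota - self.snapshot_only

    def available(self) -> int:
        # Space still free for new writes.
        return self.quota - self.live - self.snapshot_only

    def delete(self, size: int, held_by_snapshot: bool) -> None:
        # Deleting data removes it from the live filesystem. If a snapshot
        # still references the blocks, they are not freed; they are now
        # charged to snapshots instead.
        self.live -= size
        if held_by_snapshot:
            self.snapshot_only += size


fs = ToyFilesystem(quota=100, live=60, snapshot_only=0)
fs.delete(30, held_by_snapshot=True)
print(fs.reported_size(), fs.available())
```

In this model, deleting 30 units that a snapshot still holds leaves the available space unchanged while the reported filesystem size drops from 100 to 70, which is the "shrinking" effect described above.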
If you delete or overwrite a file soon after it was initially created, chances are pretty good that the disk blocks can be freed immediately. However, if the file exists when a snapshot is taken, its storage will be reserved for the lifetime of that snapshot. The following table shows when snapshots are taken and how long they last:
|Frequency||When taken||How many kept|
|hourly||close to the top of the hour||25|
|weekly||4:15 AM on Saturdays||5|
|monthly||5:30 AM on the first of the month||2|
Thus, if you create a file at noon and delete it at 2 PM, it will still be in a snapshot until 3 PM the following day. If you create a file at noon on a Monday and delete it Tuesday, it will stick around until Wednesday the following week. If you create a file on a Friday and don’t delete it until Saturday afternoon, it will take up disk space for another five weeks, unless Saturday was the first day of the month, in which case it won’t be freed until the first day of the month after next.
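The arithmetic behind these examples can be sketched as follows. This is a minimal sketch that assumes a snapshot is destroyed at the moment the snapshot that pushes it past the retention count is taken, and that snapshots are taken exactly on schedule; the function names are hypothetical, and the retention counts come from the table above.

```python
from datetime import datetime, timedelta


def hourly_expiry(last_snapshot: datetime, kept: int = 25) -> datetime:
    # With `kept` hourly snapshots retained, a snapshot is rotated out
    # when the snapshot `kept` hours later is taken.
    return last_snapshot + timedelta(hours=kept)


def weekly_expiry(last_snapshot: datetime, kept: int = 5) -> datetime:
    # Same reasoning, with one snapshot per week.
    return last_snapshot + timedelta(weeks=kept)


def monthly_expiry(last_snapshot: datetime, kept: int = 2) -> datetime:
    # Monthly snapshots are taken on the first, so advancing the month
    # by `kept` always yields a valid date.
    month_index = last_snapshot.month - 1 + kept
    year = last_snapshot.year + month_index // 12
    month = month_index % 12 + 1
    return last_snapshot.replace(year=year, month=month)


# A file deleted shortly after the 2 PM hourly snapshot on 4 March 2024:
print(hourly_expiry(datetime(2024, 3, 4, 14, 0)))
```

Each function takes the time of the last snapshot that still contains the file and returns the approximate time at which that snapshot, and therefore the file's storage, is released.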
The snapshots for any filesystem can be accessed through the hidden
.zfs/snapshot directory in the root directory of each filesystem.
A snapshot can also be made available as a read-write
clone as the need arises; contact help@csail to request this. Likewise,
contact help@csail if you have run out of space and need your old
snapshots deleted. These requests should come from the PI, group system
administrator, or the individual who owns the top-level directory of the
filesystem. Note that only snapshots older than the last monthly
snapshot can be deleted. Generally we prefer to increase quotas
instead of deleting snapshots, when resource constraints permit.
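As an illustration of browsing the .zfs/snapshot directory mentioned above, the following sketch lists every snapshot that still contains a given file. The function name, the example path in the comment, and the file name are hypothetical; snapshot names vary by site and are not assumed here.

```python
from pathlib import Path


def find_versions(fs_root, relative_path):
    """Return the copies of `relative_path` found in each snapshot.

    `fs_root` is the root directory of the filesystem (the directory that
    contains the hidden .zfs directory); `relative_path` is the file's
    path relative to that root.
    """
    snapshot_dir = Path(fs_root) / ".zfs" / "snapshot"
    versions = []
    # Each entry under .zfs/snapshot is a read-only view of the whole
    # filesystem as it existed when that snapshot was taken.
    for snap in sorted(snapshot_dir.iterdir()):
        candidate = snap / relative_path
        if candidate.exists():
            versions.append(candidate)
    return versions


# Hypothetical usage; substitute your own filesystem root and file:
# for path in find_versions("/path/to/your/filesystem", "notes.txt"):
#     print(path)
```

Because snapshots are read-only, restoring a file is a matter of copying it from one of the returned paths back into the live filesystem, for example with shutil.copy2.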
Snapshots are a data preservation mechanism orthogonal to backups. Currently, backups are taken of live filesystems, but in the future we expect to take backups only of snapshots (and on a different server). Snapshots exist primarily to let us respond to restore requests quickly, and in some circumstances they let us restore files before they make it into the backup system. However, tape backup remains our primary mechanism for long-term data security and disaster recovery.
When a new NFS filesystem is created, the owner can request one of three different backup policies:
1. No backups necessary (for long-term data only; transient data should go to scratch space)
2. Infrequent backups for rarely-written or archival data
3. Daily backups for active projects and workspaces
If the shared scratch space is inadequate for a group’s needs,
individual scratch filesystems (on a scratch server) are available on
request if sufficient uncommitted storage is available. Data in class 2
will be mounted under the /archive tree in CSAILified Linux systems,
to remind users that the data is not intended for frequent updates. It
is also stored with a stronger checksum (SHA256) to make it more likely
that any data corruption will be detected and repaired.
Backups cannot begin on a new filesystem until a significant amount of data has been written. How much is “significant” depends on how the data in the filesystem is being used, and in particular how much of the initial data is expected to change after it is first written. If too much of the initial data changes or is deleted, backups will automatically cease and require administrator intervention to resume (usually after taking a new initial backup). We recommend waiting until at least 40% of your quota is used, but this is not a hard requirement and backups can be started at any time so long as some permanent data is being stored.
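The 40% guideline above can be checked with a small calculation. This is a rough sketch: the function names are hypothetical, and how you obtain the used and quota figures (for example from df output) is left as an assumption. Keep in mind that the size reported to clients excludes snapshot space, so it may be smaller than the actual quota.

```python
def fraction_of_quota_used(used_bytes: int, quota_bytes: int) -> float:
    # Fraction of the quota currently occupied by data.
    if quota_bytes <= 0:
        raise ValueError("quota_bytes must be positive")
    return used_bytes / quota_bytes


def meets_recommendation(used_bytes: int, quota_bytes: int,
                         threshold: float = 0.40) -> bool:
    # The 40% default mirrors the recommendation in the text; it is a
    # guideline, not a hard requirement.
    return fraction_of_quota_used(used_bytes, quota_bytes) >= threshold


# Example: 450 GB used out of a 1000 GB quota.
print(meets_recommendation(450 * 10**9, 1000 * 10**9))
```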