Details of Maths File Backups and Archiving

All home directories are backed up more than once per day. Some files in home directories are not backed up at all, such as anything inside a directory called NOBACKUP (see below for the full list of exclusions). We do not back up files on /scratch/ or /data/, or those held in the various /tmp/ directories and similar locations.

Our regular backups are known as snapshots, i.e. multiple copies of the home directory trees (sharing data between copies where possible). We currently allocate about 5TB of disk space each for the snapshots of the DAMTP, Pure Maths and Statslab home directories. This snapshot facility is also used to hold copies of important system files to aid in disaster recovery.

The backups are stored on the disk of the backup server "balmoral" which is in a different pavilion from the home directory server.

The primary reason for keeping these backups is so that data can be recovered after a disaster or other catastrophic event (e.g. several disks failing simultaneously, or a fire). That said, we can often recover accidentally deleted or damaged files: email help@maths.

Backup Exclusions

Currently, for most file systems, we exclude from the snapshots any files or directories which match the following patterns:

**/.adobe/Acrobat/*/Cache
**/.adobe/Acrobat/*/Temp
**/.adobe/Flash_Player/AssetCache
.cache
.dropbox
Dropbox
**/evolution/cache
**/evolution/mail/imap
.fontconfig
.fonts.cache*
global-messages-db.sqlite*
**/.googleearth/Cache
**/.googleearth/Temp
**/.java/deployment/cache
.jpi_cache
**/.kde/share/cache
**/Library/Caches*
**/Library/Logs
**/Library/Mail/IMAP*@*
**/Library/Preferences/*Cache
**/.local/share/Trash
**/.macromedia/**/#SharedObjects
**/.Mathematica*/FrontEnd/*_Caches
**/.mcop/trader-cache
.mcrCache*
**/.mozilla/**/Cache*
**/.netscape/*cache*
NOBACKUP
**/.openoffice.org**/user/registry/cache
**/.openoffice.org**/user/uno_packages/cache
**/.opera/cache*
.Spotlight-V100
*.sqlite-journal
**/.TeXmacs/system/cache
.thumbnails
**/.thunderbird/**/*Cache*
**/Thunderbird/**/*Cache*
**/.thunderbird/**/panacea.dat
**/Thunderbird/**/panacea.dat
.Trash
urlclassifier.pset
urlclassifier*.sqlite
XPC.mfasl
XUL.mfasl

These are rsync patterns, so XUL.mfasl, as a leafname, is excluded wherever it appears, while **/.mozilla/**/Cache* excludes anything named Cache* at least 2 levels below a directory called .mozilla/ (wherever that directory is).

The exclusions are intended to avoid backing up files which are not considered important, such as caches or other content which can easily be regenerated if lost. For example, the various IMAP directories listed are caches of indexing material (etc.) held on the relevant remote IMAP server.

If you believe that we are accidentally excluding material which is actually valuable, please let us know.

See the rsync manual for why ** is used rather than * in some patterns.
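The leafname versus full-path distinction can be illustrated with a simplified Python model. This is an approximation written for illustration, not rsync's own matcher: the function names are hypothetical, it only understands *, ** and ?, it assumes paths are given relative to a root above each user's home directory (e.g. alice/...), and it ignores anchored patterns, character classes and trailing-slash rules.

```python
import re


def glob_to_regex(pattern: str) -> str:
    """Translate a simplified rsync-style glob into a regular expression.

    '**' matches anything, including '/'; '*' matches anything except '/';
    '?' matches one character other than '/'. Character classes, anchored
    ('/...') patterns and trailing-slash rules are NOT handled.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return "".join(out)


def rule_matches(pattern: str, path: str) -> bool:
    """Approximate whether one exclude pattern matches one relative path."""
    regex = glob_to_regex(pattern)
    if "/" not in pattern and "**" not in pattern:
        # No slash and no '**': compare against the leafname only.
        return re.fullmatch(regex, path.rsplit("/", 1)[-1]) is not None
    # Otherwise compare against the end of the path, starting at a '/' boundary.
    return re.fullmatch(r"(?:.*/)?" + regex, path) is not None


def is_excluded(path: str, patterns: list[str]) -> bool:
    """rsync does not descend into an excluded directory, so a path is
    treated as excluded if it, or any directory above it, matches."""
    parts = path.strip("/").split("/")
    for depth in range(1, len(parts) + 1):
        prefix = "/".join(parts[:depth])
        if any(rule_matches(p, prefix) for p in patterns):
            return True
    return False


# Hypothetical example paths, checked against three of the patterns above.
PATTERNS = ["XUL.mfasl", "**/.mozilla/**/Cache*", "NOBACKUP"]
print(is_excluded("alice/.mozilla/firefox/p1/XUL.mfasl", PATTERNS))      # True
print(is_excluded("alice/.mozilla/firefox/p1/Cache2/entry", PATTERNS))  # True
print(is_excluded("alice/.mozilla/Cache", PATTERNS))                     # False
print(is_excluded("alice/papers/NOBACKUP/run1.dat", PATTERNS))           # True
```

In this model the third path is kept because Cache is only one level below .mozilla/, matching the "at least 2 levels" rule described above, and the last path is excluded because rsync never descends into the NOBACKUP directory.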

Archive

Backup archives for home directories go back at least 3 months. Therefore, if something was accidentally deleted within the last 3 months, we should be able to retrieve it for you.

What about big data files?

Large data files should be stored in store or scratch spaces. Store spaces have an optional 2 weeks of limited backups. Scratch spaces are not backed up.

So how many copies do we keep?

We take 'snapshots' of the contents of home directories roughly once every 11-12 hours (more frequently for secretarial directories). The snapshots are held on a server in a different location from the main server (for safety), and are pruned as they get older, so that older copies are kept at progressively wider intervals: roughly one per day, then one per week, one per 30 days and finally one per 90 days.

The actual number we can afford to keep may change, but currently for most users we keep about 101 snapshots, arranged as:

  45x11h, 30x26h, 22x7d, 3x30d, 1x90d

so after ~20 days of roughly 11-hourly snapshots, we keep copies spaced about 26 hours apart for roughly another month (30 × 26h is about 32 days), then copies spaced at 7-day intervals for another 22 weeks, and so on.
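The arithmetic behind this schedule can be checked with a short Python sketch. It assumes that "h" and "d" mean hours and days, and that the tiers follow one another end to end as described above; the function names are hypothetical, and the code only computes nominal ages, not how the backup system actually decides which snapshots to prune.

```python
from datetime import timedelta

# Unit suffixes used in the schedule string (assumed: 'h' = hours, 'd' = days).
UNITS = {"h": timedelta(hours=1), "d": timedelta(days=1)}


def parse_schedule(spec: str) -> list[tuple[str, int, timedelta]]:
    """Turn '45x11h, 30x26h, ...' into [('45x11h', 45, 11 hours), ...]."""
    tiers = []
    for item in spec.split(","):
        label = item.strip()
        count, interval = label.split("x")
        tiers.append((label, int(count), int(interval[:-1]) * UNITS[interval[-1]]))
    return tiers


def tier_ages(spec: str) -> list[tuple[str, int, timedelta, float]]:
    """For each tier, add the nominal age in days of its oldest copy,
    assuming the tiers are laid end to end with no gaps or overlap."""
    age = timedelta(0)
    rows = []
    for label, count, spacing in parse_schedule(spec):
        age += count * spacing
        rows.append((label, count, spacing, age / timedelta(days=1)))
    return rows


SCHEDULE = "45x11h, 30x26h, 22x7d, 3x30d, 1x90d"
rows = tier_ages(SCHEDULE)
print("snapshots kept:", sum(count for _, count, _, _ in rows))
for label, _, _, oldest in rows:
    print(f"{label:>7}: oldest copy about {oldest:6.1f} days old")
```

On these assumptions the five tiers add up to 101 snapshots, the first tier spans about 20.6 days and the last copy is nominally around 387 days old. This is only a nominal figure; the commitment stated above remains at least 3 months.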

Some of my files don't need backing up

If you know that a file or files don't need backing up, you can signal this to the backup systems. Doing so speeds up the backups of important files and reduces the chance that the backups will run out of space for much more important files (e.g. your papers!).

As can be seen from the exclude patterns above, any directory called NOBACKUP is ignored by the snapshot backups: none of the files or directories under it are backed up by either the main or the incremental backups.

Thus, if you have a set of files which can be reconstructed by running code or by fetching them again from the original site, or which you have backed up yourself on known safe media, you can store them under a NOBACKUP directory.
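As a minimal sketch, moving such data under a NOBACKUP directory in your home directory could look like the Python below. The helper names are hypothetical, and placing the directory at ~/NOBACKUP is an assumption: any directory with that name is excluded.

```python
import shutil
from pathlib import Path


def move_to_nobackup(src: Path, home: Path) -> Path:
    """Move `src` into `home/NOBACKUP/`, creating that directory if needed.

    Only do this for data you can regenerate or already hold elsewhere:
    once moved, it is no longer included in the snapshots.
    """
    target_dir = home / "NOBACKUP"
    target_dir.mkdir(exist_ok=True)
    dest = target_dir / src.name
    if dest.exists():
        # Refuse rather than let shutil.move nest src inside an existing dir.
        raise FileExistsError(dest)
    return Path(shutil.move(str(src), str(dest)))


def under_nobackup(path: Path) -> bool:
    """True if any component of `path` is named NOBACKUP (the exclude rule)."""
    return "NOBACKUP" in path.parts
```

For example, move_to_nobackup(Path.home() / "sim-output", Path.home()) would move a hypothetical directory of regenerable simulation output to ~/NOBACKUP/sim-output, where the snapshots will skip it.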

Clearly you should not do this for any file whose loss would be a problem.