As a user, you should be familiar with the various volumes on the cluster. It is especially important to understand the purpose of the first three volumes listed below, because this will help you manage your files and data effectively.
- /home/<username> is your home directory. Use it only for small files such as configuration files, documents, and source code, and keep its total size to a maximum of 300GB. (ZFS snapshots, ZFS replication, and nightly backup to tape archive using TiBS.)
- /user_data/<username> is a 1TB partition for saving your work. (ZFS snapshots, ZFS replication, and nightly backup to tape archive using TiBS, performed by School of Computer Science Facilities.)
- /lab_data/<labgroupname> is a 10TB partition for lab members to share and save their work. (ZFS snapshots and ZFS replication.)
- /containers contains some standard Singularity images that we provide to our users.
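If you want to check how close a directory is to the size guidelines above, the short Python sketch below adds up allocated space the way `du` does. It is not a tool provided on the cluster: the `LIMITS` table, the helper names `limit_for` and `disk_usage_bytes`, and the assumption that GB and TB are decimal units are all illustrative. Like `du`, it cannot see space still held by ZFS snapshots, and if compression is enabled on a volume it reports the compressed on-disk size.

```python
import os
import sys

# Hypothetical size guidelines taken from the volume list, assuming decimal
# units (1 GB = 10**9 bytes, 1 TB = 10**12 bytes).
LIMITS = {
    "/home": 300 * 10**9,
    "/user_data": 1 * 10**12,
    "/lab_data": 10 * 10**12,
}


def limit_for(path):
    """Return the guideline for the top-level volume containing path, or None."""
    parts = os.path.abspath(path).split(os.sep)
    top = os.sep + parts[1] if len(parts) > 1 else os.sep
    return LIMITS.get(top)


def disk_usage_bytes(path):
    """Sum allocated space under path like `du`: st_blocks * 512, no symlink following."""
    total = 0
    seen = set()  # (device, inode) pairs, so hard-linked files are counted once

    def add(p):
        nonlocal total
        try:
            st = os.lstat(p)
        except OSError:
            return  # vanished or unreadable entry; skip it
        key = (st.st_dev, st.st_ino)
        if key not in seen:
            seen.add(key)
            total += st.st_blocks * 512

    add(path)
    # os.walk does not follow symlinked directories by default.
    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            add(os.path.join(root, name))
    return total


def main(argv):
    if len(argv) < 2:
        print(f"usage: {argv[0]} DIR [DIR ...]")
        return
    for path in argv[1:]:
        used = disk_usage_bytes(path)
        line = f"{path}: {used / 10**9:.1f} GB used"
        limit = limit_for(path)
        if limit is not None:
            line += f" of {limit / 10**9:.0f} GB guideline ({100 * used / limit:.1f}%)"
        print(line)


if __name__ == "__main__":
    main(sys.argv)
```

For example, saved as a hypothetical `check_usage.py`, running `python3 check_usage.py /home/$USER /user_data/$USER` prints one line per directory with its usage and, for the volumes listed above, the percentage of the guideline it uses.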
The storage on the cluster was put in place so users could utilize enterprise-grade storage for their computational needs. The cluster is not intended to be used as a file server or backup server. Please do not keep processed data on the cluster once you are finished with it: only data currently being processed and the current results of cluster processing should reside on the cluster.
Data protection: ZFS snapshots, ZFS replication, and tape backup
DATA LOSS: We care about your data, and we do everything we can to retain it whenever possible. We do this as a courtesy to our users, but we offer no guarantees and will not be held legally liable for any data loss.
The data protection in place on the cluster is designed to minimize downtime and data loss. Most of it runs in the background without users noticing.
If you need files restored, send David an email with the following information:
- The files or directories that need to be recovered, with full paths if possible.
- The date from which you would like the files to be recovered.
- The location you would like the files to be recovered to.
In summary, in addition to the RAID redundancy configured on our volumes, we provide the following data protection:
- Tape backup is done nightly. This serves as an archival history of the files, but restoring from tape can be slow.
- A ZFS snapshot is a feature of the ZFS file system that creates a point-in-time copy of the file system. This serves as a way to immediately recover accidentally deleted or corrupted files without having to restore them from tape.
- ZFS replication duplicates the snapshots on the primary storage to a secondary storage device. This serves as a disaster recovery platform for catastrophic events.
ZFS snapshots and your quota
Have you received a “Disk quota exceeded” message, deleted some files, and found that this did not resolve the quota problem?
When a snapshot is created, its space is initially shared between the snapshot and the file system, and may also be shared with previous snapshots. As files in the file system change because they are updated or deleted, some of the space in the snapshots becomes unique to those snapshots.
Space from deleted files is not freed immediately, because the deleted files are still held by the earlier snapshot(s). What confuses users is that du, ls, and other standard UNIX commands show that the files are gone and that usage is below the quota, when in fact the space is still in use by the snapshots. To resolve this issue, send David an email asking him to delete the snapshots. Keep in mind that this removes the ZFS snapshot option for recovering any files contained in the deleted snapshots.
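The Python toy model below illustrates the accounting described above. It is a deliberate simplification, not how ZFS is implemented internally: each file is a set of block IDs, a snapshot freezes the block references that exist at that moment, and the space charged against the quota is every block referenced by the live file system or by any snapshot. The class `ToyDataset`, its method names, the block size, and the assumption that snapshot space counts toward your quota (as the text above describes) are all illustrative.

```python
from dataclasses import dataclass, field

# Illustrative block size: 128 KiB is the default ZFS recordsize.
BLOCK_SIZE = 128 * 1024


@dataclass
class ToyDataset:
    """Toy copy-on-write model; real ZFS space accounting is more involved."""
    live: dict = field(default_factory=dict)       # file name -> set of block ids
    snapshots: dict = field(default_factory=dict)  # snapshot name -> {file: frozenset of block ids}
    next_block: int = 0

    def write(self, name, nblocks):
        # Copy-on-write: new data always lands in fresh blocks, so blocks that
        # a snapshot still references are never overwritten in place.
        self.live[name] = set(range(self.next_block, self.next_block + nblocks))
        self.next_block += nblocks

    def delete(self, name):
        del self.live[name]

    def snapshot(self, snap):
        # A snapshot just records which blocks each file points to right now.
        self.snapshots[snap] = {f: frozenset(b) for f, b in self.live.items()}

    def destroy_snapshot(self, snap):
        del self.snapshots[snap]

    def visible_bytes(self):
        # Roughly what du or ls report: only files in the live file system.
        return len(set().union(*self.live.values())) * BLOCK_SIZE

    def charged_bytes(self):
        # Space counted against the quota: blocks held by the live file system
        # or by any snapshot (assumes snapshot space counts toward the quota).
        held = set().union(*self.live.values())
        for files in self.snapshots.values():
            held |= set().union(*files.values())
        return len(held) * BLOCK_SIZE


if __name__ == "__main__":
    ds = ToyDataset()
    ds.write("results.dat", 8)
    ds.snapshot("nightly")
    ds.delete("results.dat")
    print(ds.visible_bytes())  # 0: the file is gone from the live view
    print(ds.charged_bytes())  # 1048576: the snapshot still holds 8 blocks
    ds.destroy_snapshot("nightly")
    print(ds.charged_bytes())  # 0: space is freed once the snapshot is destroyed
```

Deleting `results.dat` makes it disappear from the `du`-style view, but the charged space only drops once the snapshot that still references its blocks is destroyed, which is why deleting the snapshots resolves the quota problem.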