User Account Space
Each user account has a quota of 50GB in the home directory. If you are in need of more space, please use your group space.
Group Space
Each research group has a quota of 10TB in /central/groups/<groupname>. Additional space up to 30TB is available at no charge, while anything above 30TB will be charged at standard storage rates. Please have your PI send an email to help-hpc@caltech.edu for information.
Scratch Space
There are two scratch directories available: a 500TB standard, high-speed disk mounted on /central/scratch and a 30TB high-IO disk mounted on /central/scratchio. Best practice is to create a directory for yourself (e.g. /central/scratch/<username>) and work with files from inside that directory. The /central/scratch partition has a default quota of 20TB and 15M files. The /central/scratchio partition has a default quota of 2TB and 3M files. Depending on the IO properties of your code and the size of your jobs, /central/scratch can be faster in aggregate bandwidth and /central/scratchio faster in file operations per second. If there is any question about the performance differences between the scratch space options, it might make sense to profile your code against both.
The quota can be extended to 50TB for 30 days upon request. Please send an email to help-hpc@caltech.edu for information.
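The per-user working directory recommended above can be set up with a small helper. This is a minimal sketch rather than an official tool: the function name make_scratch_dir is hypothetical, and it assumes your cluster username matches the output of id -un.

```shell
# Create (if needed) a personal working directory under a scratch root
# and print its path. make_scratch_dir is a hypothetical helper name.
#   $1: scratch root (default /central/scratch; use /central/scratchio
#       for the high-IO disk)
#   $2: username (default: current user, an assumption about naming)
make_scratch_dir() {
    local scratch_root="${1:-/central/scratch}"
    local user_name="${2:-$(id -un)}"
    local target="$scratch_root/$user_name"
    # -p makes this safe to call repeatedly, e.g. at the top of a job script
    mkdir -p "$target" && printf '%s\n' "$target"
}

# Example use inside a job script:
#   workdir="$(make_scratch_dir /central/scratchio)"
#   cd "$workdir"
```

Calling the helper at the start of every job keeps scripts independent of whether the directory already exists.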
These disks are truly meant as scratch space. Any files not accessed in 14 days will be automatically purged. Any method of artificially changing the date/time stamps of a file is strictly prohibited and subject to Caltech's Honor Code.
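To see which of your scratch files are at risk under the 14-day rule, you can list files by last access time. The sketch below is illustrative only: list_stale_files is a hypothetical name, and the actual purge process may evaluate access times differently. It only reads metadata and does not modify any timestamps.

```shell
# List regular files whose last access is at least N days old
# (default 14, the purge threshold stated above).
# list_stale_files is a hypothetical helper name.
#   $1: directory to scan (default: current directory)
#   $2: age threshold in days (default 14)
list_stale_files() {
    local dir="${1:-.}"
    local days="${2:-14}"
    # find ignores fractional days, so "-atime +13" matches files last
    # accessed 14 or more whole days ago.
    find "$dir" -type f -atime "+$((days - 1))" -print
}

# Example: files in your scratch directory not accessed for 10+ days
#   list_stale_files "/central/scratch/$(id -un)" 10
```

Running this with a threshold below 14 gives some warning before files become eligible for purging, so anything still needed can be moved to group space.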
Checking Quotas for User and Group
To see how much storage you are using, you can use the mmlsquota command:
mmlsquota -u username --block-size auto central:home
To check your group storage:
mmlsquota -j groupname --block-size auto central
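If you run these checks often, the two commands above can be wrapped in shell functions. This is only a convenience sketch: the function names and the QUOTA_CMD variable are hypothetical, and the mmlsquota arguments are copied unchanged from the commands above.

```shell
# Command used to query quotas. Set QUOTA_CMD="echo mmlsquota" to preview
# the command line without running it (QUOTA_CMD is a hypothetical variable).
QUOTA_CMD="${QUOTA_CMD:-mmlsquota}"

# Home directory quota for a user (default: the current user).
home_quota() {
    local user_name="${1:-$(id -un)}"
    $QUOTA_CMD -u "$user_name" --block-size auto central:home
}

# Group quota; the group name is required.
group_quota() {
    if [ -z "$1" ]; then
        echo "usage: group_quota <groupname>" >&2
        return 2
    fi
    $QUOTA_CMD -j "$1" --block-size auto central
}

# Example:
#   home_quota
#   group_quota groupname
```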
To see how much space each group member is using in your group area, see ...
/central/groups/imss_admin/group_usage/XYZ_usage
... where XYZ is the name of your group. The information is sorted by usage, highest at the top of the list and lowest at the bottom. The usage file gets automatically updated once per day shortly after 4:00am. Lines in the file which have a string of digits instead of a username, followed by a username in square brackets, are reporting usage by group members who are no longer at Caltech. (The name in square brackets after a numeric string is the former user's access.caltech username.)
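The usage file can be filtered with standard tools. The sketch below assumes only what is described above: one line per member, sorted with the highest usage first, and former members shown as a string of digits followed by a username in square brackets. The exact column layout is not documented here, so the pattern is deliberately loose, and both function names are hypothetical.

```shell
# Show the N largest consumers (default 5). The file is already sorted
# by usage, so the first lines are the heaviest users.
top_usage() {
    head -n "${2:-5}" "$1"
}

# Show lines for members who are no longer at Caltech: a numeric string
# followed by a username in square brackets, anywhere on the line.
former_member_usage() {
    grep -E '(^|[[:space:]])[0-9]+[[:space:]]*\[[^]]+\]' "$1"
}

# Example, where XYZ is the name of your group:
#   top_usage /central/groups/imss_admin/group_usage/XYZ_usage 10
#   former_member_usage /central/groups/imss_admin/group_usage/XYZ_usage
```

If the real file has a header or different spacing, the grep pattern may need adjusting.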
Snapshots
The GPFS-based file system uses snapshot technology that captures file changes on the following schedule:
- Every 4 hours for 1 day
- Every day for 1 week
- Every week for 2 weeks
The snapshot directory is not listable, but can be found by changing directory to ".snapshots".
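To find earlier versions of a file, you can look for it inside each snapshot. The sketch below assumes a layout of <root>/.snapshots/<snapshot name>/<relative path>, which is common for GPFS snapshots but is not confirmed on this page. The function name and the example paths are hypothetical.

```shell
# Print the path of every snapshot copy of a file or directory.
# Assumed layout (not confirmed here): <root>/.snapshots/<snapshot>/<path>
#   $1: directory that contains .snapshots
#   $2: path of the file relative to that directory
list_snapshot_copies() {
    local root="$1" rel_path="$2" snap
    for snap in "$root"/.snapshots/*/; do
        # Skip snapshots taken before the file existed (or after removal)
        if [ -e "$snap$rel_path" ]; then
            printf '%s\n' "$snap$rel_path"
        fi
    done
}

# Example (hypothetical paths):
#   list_snapshot_copies "$HOME" project/run.cfg
```

Once the wanted version is identified, it can be copied back into place with cp -p, which preserves the original modification time and permissions of the snapshot copy.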
Backup and Archive
There is no managed BCP/DR-style backup or archival system in place on the central HPC cluster. Please be sure to migrate any critical data to systems or services outside of the cluster storage on a routine basis. For information on running backups using the Duplicity client see this page. (Duplicity supports saving backups to AWS, Google, Backblaze B2, ssh-based hosts and others.)