You can use the following file systems:

| | Home | Groups / Projects | Scratch | Node-local |
|---|---|---|---|---|
| Size | 3 PByte (shared by the global file systems) | 3 PByte (shared) | 3 PByte (shared) | >100 GByte per node |
| Access time | fast (dependent on overall file system traffic) | fast (dependent on overall file system traffic) | fast (dependent on overall file system traffic) | very fast, low latency |
| Files accessible | globally, from all nodes | globally, from all nodes | globally, from all nodes | local disks of the compute node; files are not accessible from other nodes |
| Persistence | permanent | during the project's validity term | after 8 weeks, files will be deleted unconditionally and without further notice | only while the job is running |
| Quota | 15 GByte, more on request | on request only | 10 TByte or 2 million files, more on request | none |
| Backup | snapshots (see below) + daily tape backup (for disaster recovery only) | snapshots (see below) + daily tape backup (for disaster recovery only) | none | none |
| Intended use | static input data, results of finished jobs, low-volume I/O | static input data, results of finished jobs, low-volume I/O | running jobs' input/output, intermediary files (CPR), high-volume I/O | node-local job data, intermediary files (non-CPR), high-volume I/O |

Do not use home, groups or projects for running jobs!
Since the migration to the new storage system in October 2019, the global file systems no longer differ in throughput or latency. However, for snapshot and backup reasons: do not use home, groups or projects for the I/O of running jobs!
The home directory should be used for all files that are important and need to be stored permanently. Every user can store only a small amount of data here: the default quota is currently 15 GByte. In well-reasoned cases and on request, this quota can be increased. The folder /home/$USER (“Home”) is created with each user account and is accessible via the environment variable $HOME.
On request, groups (institutes) can get a group folder to share static input data and common software (versions) among their members and coworkers.
Likewise, projects with more than a few members can request a projects folder for the same purposes.
For these low-volume I/O classes of folders, our file system automatically creates periodic snapshots, allowing you to access (and restore) older versions of your files without assistance from the admins. Snapshots are saved to the hidden folder .snapshots (you will not see this folder listed even by an 'ls -la'). Nonetheless, you can enter this hidden folder by explicitly typing “cd .snapshots” (<TAB> completion does not work either; you have to type .snapshots in full). Once in .snapshots/, you can use 'ls -l' and 'cd' as usual and access former versions (or states) of all your data (within the snapshot retention period).
Files in the snapshot folder still occupy storage space and thus count against your quota! It is therefore possible for your home folder's quota to be exceeded even though the 'df' command still shows less usage.
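As an illustration, the following sketch mimics the snapshot layout with made-up folder and file names (daily-2024-01-01 and thesis.tex are invented for the demo; real snapshot names on the cluster will differ) and shows how an older file version can be copied back:

```shell
# Build a mock home folder with a fake snapshot so the commands
# can be followed end-to-end (paths are illustrative only).
mkdir -p demo_home/.snapshots/daily-2024-01-01
echo "old version" > demo_home/.snapshots/daily-2024-01-01/thesis.tex
cd demo_home
# .snapshots is hidden: 'ls -la' does not show it and <TAB> completion
# does not work -- you must type the name in full:
cd .snapshots
ls -l                                           # lists the snapshot folders
cp daily-2024-01-01/thesis.tex ../thesis.tex    # restore the older version
cd ../..
```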
Snapshots cannot be deleted by users (deleting data merely causes copies of it to remain in the snapshot area).
Frequently saving and deleting files fills up the snapshot area and consumes space in the containing folder. If possible, avoid this (i.e. do not use the home, groups or projects folders for high-volume I/O, e.g. the I/O of running jobs!).
In urgent cases, the snapshot folder can be deleted by the administrators.
In addition to the snapshots, we do periodic tape backups (currently weekly) of the above folders, but this is for disaster recovery only. Recovery of individual user files is not possible from these tape backups.
Here, almost unlimited disk space is available for all users, but only for a limited time: After 8 weeks the files will be deleted unconditionally without further notice.
The files on /work/scratch/ are not backed up by any means.
The standard quota is currently 10 TByte or 2 million files. In well-reasoned cases and on request, this quota can be increased.
The folder /work/scratch/$USER (“scratch”) is created with each user account and is accessible via an environment variable.
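Since scratch files are deleted after 8 weeks, it can be useful to check which of your files are getting old. A hedged sketch using GNU find (the demo directory and file names below are invented; on the cluster you would scan /work/scratch/$USER instead):

```shell
# Demo directory standing in for /work/scratch/$USER:
DIR="demo_scratch_age"
mkdir -p "$DIR"
touch -d "60 days ago" "$DIR/old_result.dat"   # simulate an aged file (GNU touch)
touch "$DIR/fresh_result.dat"                  # a brand-new file

# -mtime +49: modified more than 49 days (7 weeks) ago (GNU find),
# i.e. files approaching the 8-week deletion limit
find "$DIR" -type f -mtime +49
```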
The local disks of the individual compute nodes are mounted at “/node” and are to be used during an individual job's computations. Due to the low latency of the node-local disks, intermediary files can be stored there quite efficiently.
When a job is assigned a certain node and started there, the folders /node/$SLURM_JOBID and /node/$SLURM_JOBID/tmp are created on that node. For convenience, the two corresponding environment variables $HPC_LOCAL and $TMP are set, which you can use in your job scripts instead of the longer full paths.
At the end of the job, these subdirectories and their content are deleted automatically. It is thus imperative to save any final results from $HPC_LOCAL/ to the scratch file system before the end of the job! Likewise, $HPC_LOCAL cannot be used for checkpoint/restart (CPR) files, with which later jobs are to continue an earlier job's calculations.
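The pattern above can be sketched as a minimal job script. This is an illustration, not the cluster's official template: inside a real job, $HPC_LOCAL is set by the system; outside one it is not, so the sketch falls back to a temporary directory, and demo_scratch/results stands in for your real scratch folder.

```shell
#!/bin/bash
#SBATCH -J demo_job          # Slurm directives are illustrative placeholders
#SBATCH -t 01:00:00

# Inside a job, $HPC_LOCAL points at the node-local folder; fall back to a
# temp dir so this sketch can be tried anywhere:
: "${HPC_LOCAL:=$(mktemp -d)}"
SCRATCH_DIR="demo_scratch/results"   # stands in for /work/scratch/$USER/...
mkdir -p "$SCRATCH_DIR"

# Do the heavy I/O on the fast node-local disk...
echo "intermediate data" > "$HPC_LOCAL/step1.tmp"
echo "final result"      > "$HPC_LOCAL/result.dat"

# ...and copy the final results away BEFORE the job ends, because
# $HPC_LOCAL and its content are deleted automatically afterwards:
cp "$HPC_LOCAL/result.dat" "$SCRATCH_DIR/"
echo "saved: $SCRATCH_DIR/result.dat"
```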
On the login nodes, the local disk space is assigned to the /tmp directory, as the Slurm-JobID-based scheme would not work there.
All shared, cluster-wide file systems above are based on IBM Spectrum Scale (formerly General Parallel File System, GPFS). This commercial product can share large disk arrays among thousands of nodes via InfiniBand.
Of course, arbitrating read/write requests from so many nodes to individual files takes somewhat longer than accessing local disks. That is why you sometimes see (hopefully short) “hiccups” when running an 'ls -l' or the like.
The local disks inside the nodes are usually SATA drives with an xfs file system, and since jobs mostly have their node(s) exclusively, these local disks are faster and less latency-bound than the global GPFS.