File systems of the cluster

File systems

You can use the following file systems:

/work/home/ (= /home/), /work/projects/, /work/groups/
    Size:           3 PByte (shared by all global file systems)
    Access time:    fast (dependent on overall file system traffic)
    Accessibility:  global, for all nodes
    Persistence:    /home/: permanent; /work/projects/ and /work/groups/: during the project's validity term
    Quota:          /home/: 15 GByte, more on request; /work/projects/ and /work/groups/: on request only
    Backup:         snapshots (see below) + daily tape backup (for disaster recovery only)
    Usage pattern:  static input data, results of finished jobs, low-volume I/O
                    (do not use home, groups or projects for running jobs!)

/work/scratch/
    Size:           3 PByte (shared by all global file systems)
    Access time:    fast (dependent on overall file system traffic)
    Accessibility:  global, for all nodes
    Persistence:    after 8 weeks, files will be deleted unconditionally and without further notice
    Quota:          10 TByte or 2 million files, more on request
    Backup:         none
    Usage pattern:  running jobs' input/output, intermediary files (CPR), high-volume I/O

/work/local/ (= /node/)
    Size:           >100 GByte per node
    Access time:    very fast, low latency
    Accessibility:  local disks of the compute node; files are not accessible from other nodes
    Persistence:    only while the job is running
    Quota:          none
    Backup:         none
    Usage pattern:  node-local job data, intermediary files (non-CPR), high-volume I/O

Since the migration to the new storage system in October 2019, the global file systems no longer differ in throughput or latency.

However, due to snapshot and backup considerations, do not use home, groups or projects for the I/O of running jobs!

low-volume I/O

/home/

The home directory should be used for all files that are important and need to be stored permanently. Each user can only store a small amount of data here: the default quota is currently 15 GByte. In well-reasoned cases and on request, this quota can be increased. The folder /home/$USER (“Home”) is created with each user account and is accessible via the environment variable $HOME.
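As a rough check of how much of this quota you are using (a hedged sketch: du only sums up the files it can see and does not account for the snapshot overhead described below), you can run:

    # Sum up the visible contents of your home directory.
    # Snapshots (see below) also count toward the quota, so the actual
    # quota usage can be higher than what du reports here.
    du -sh "$HOME"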

/work/groups/ and /work/projects/

On request, groups (institutes) can get a group folder to share static input data and common software (versions) among their members and coworkers.

Likewise, projects with more than a few members can request a projects folder for the same purposes.

For these low-volume I/O classes of folders, our file system automatically creates periodic snapshots, allowing you to access (and restore) older versions of your files without assistance from the admins. Snapshots are saved to the hidden folder .snapshots (you will not see this folder listed even with an 'ls -la'). Nonetheless, you can enter that hidden folder by explicitly typing “cd .snapshots” (<TAB> completion does not work either; you have to type .snapshots in full). Inside .snapshots/, you can use 'ls -l' and 'cd' as usual and access former versions (or states) of all your data (within the hourly.*/, 6hour.*/, daily.*/ and weekly.*/ directories).
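As an illustration, here is a minimal sketch of restoring an older version of a file from a snapshot; the snapshot directory name and file paths below are placeholders and will differ on the system:

    # Enter the hidden snapshot folder (not listed by ls, no tab completion):
    cd "$HOME/.snapshots"
    ls -l        # shows the hourly.*/, 6hour.*/, daily.*/ and weekly.*/ directories

    # Copy an older state of a file back into your current home
    # (snapshot name and paths are placeholders):
    cp daily.2024-01-15/myproject/input.dat "$HOME/myproject/input.dat"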

Files in the snapshot folders still occupy storage space and thus count toward your quota! It is therefore possible for your home folder's quota to be exceeded even though the 'df' command still shows less usage.

Snapshots cannot be deleted by users (deleting files only causes the snapshots to retain their own copies of the data).
Frequently saving and deleting files therefore fills up the snapshot area and consumes additional space in the containing folder. If possible, avoid this (i.e. do not use the home, groups or projects folders for high-volume I/O, e.g. for the I/O of running jobs!).
In urgent cases, the snapshot folder can be deleted by the administrators.

In addition to the snapshots, we do periodic tape backups (currently weekly) of the above folders, but this is for disaster recovery only. Recovery of individual user files is not possible from these tape backups.

high-volume I/O

/work/scratch/

Here, almost unlimited disk space is available to all users, but only for a limited time: after 8 weeks, files will be deleted unconditionally and without further notice.
The files on /work/scratch/ are not backed up in any way.

The standard quota is currently 10 TByte or 2 million files. In well-reasoned cases and on request, this quota can be increased.

The folder /work/scratch/$USER (“scratch”) is created with each user account. It is accessible via the environment variable $WORK_SCRATCH.
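A minimal jobscript sketch that keeps a running job's I/O on scratch rather than in home, groups or projects (job parameters, directory and program names are placeholders):

    #!/bin/bash
    #SBATCH --job-name=scratch_example
    #SBATCH --ntasks=1
    #SBATCH --time=01:00:00

    # Run the job's high-volume I/O on scratch:
    JOBDIR="$WORK_SCRATCH/myjob_$SLURM_JOBID"       # placeholder directory
    mkdir -p "$JOBDIR"
    cd "$JOBDIR"

    cp "$HOME/myproject/input.dat" .                # static input from home (placeholder)
    ./my_program input.dat > output.log             # placeholder program

    # Scratch files are deleted after 8 weeks: copy results you want
    # to keep permanently back to home.
    cp output.log "$HOME/myproject/"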

/node = /work/local/

The local disks of the individual compute nodes are mounted at “/node” and are meant to be used during an individual job's computation. Due to the low latency of the node-local disks, intermediary files can be stored there quite efficiently.
When a job is assigned a certain node and started there, the folders /node/$SLURM_JOBID and /node/$SLURM_JOBID/tmp are created on that node. For convenience, two corresponding environment variables, $WORK_LOCAL and $TMP, are set, which you can use in your jobscripts instead of the longer “/work/local/$SLURM_JOBID”.

At the end of the job, these subdirectories and their content will be deleted automatically. It is thus imperative to save any final results from $WORK_LOCAL/ to the home or scratch file system before the end of the job! Likewise, $WORK_LOCAL cannot be used for checkpoint/restart files, where later jobs are to continue an earlier job's calculations based on these CPR files.
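A minimal sketch of a jobscript using the node-local storage (program and file names are placeholders); the essential step is copying the results off $WORK_LOCAL before the job ends:

    #!/bin/bash
    #SBATCH --job-name=local_example
    #SBATCH --ntasks=1
    #SBATCH --time=01:00:00

    # $WORK_LOCAL points to /work/local/$SLURM_JOBID on the assigned node
    # and is deleted automatically when the job ends.
    cd "$WORK_LOCAL"

    cp "$WORK_SCRATCH/input.dat" .              # stage input in (placeholder)
    ./my_program input.dat > result.out         # high-volume intermediary I/O stays node-local

    # Save final results before the job ends, otherwise they are lost:
    cp result.out "$WORK_SCRATCH/results_$SLURM_JOBID.out"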

On the login nodes, the local disk space is available under the /tmp directory, since the Slurm job ID based scheme would not work there.

Technologies

All shared, cluster-wide file systems above are based on IBM's Spectrum Scale (formerly General Parallel File System, GPFS). This commercial product can share large disk arrays among thousands of nodes via InfiniBand.
Of course, arbitrating read/write requests from that many nodes to individual files takes somewhat more time than accessing local disks. That is why you sometimes see (hopefully short) “hiccups” when running an ls -l or the like.

The local disks inside the nodes are usually SATA drives with xfs, and since jobs mostly have their node(s) exclusively, these local disks are faster and less latency-bound than the global GPFS.