“module load” in the job script
To make job submission easier and more fault-tolerant for you, Slurm by default passes the entire environment (variables) of the (login) session you submit the job from on to the job, including all loaded modules.
For better reproducibility, we therefore recommend beginning each job script with “module purge”, followed by only those specific “module load …” lines this job needs. Submitted that way, the job's main program will run with only the required and desired software (versions).
This is especially important if you use, for example, “module initadd” to load certain modules from ~/.bashrc (because you need them again and again in every login session).
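A minimal job script following this recommendation might look like the sketch below. The #SBATCH values, the module names gcc/12.2 and openmpi/4.1, and the program ./my_program are hypothetical placeholders, not modules or programs known to exist on the Lichtenberg; “module avail” shows what is actually installed.

```bash
#!/bin/bash
# Example resource requests (hypothetical values, adjust to your job)
#SBATCH --job-name=myjob
#SBATCH --ntasks=4
#SBATCH --time=00:30:00

# Drop all modules inherited from the submitting login session
module purge
# Load exactly what this job needs (module names/versions are hypothetical)
module load gcc/12.2 openmpi/4.1
# Record the loaded modules in the job's output for later reference
module list

# Hypothetical program name
srun ./my_program
```

Because “module list” writes to the job's output file, that file also documents which software versions the run actually used.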
Archive decompression in /work/scratch (attention: automatic file cleanup)
Extracting archives (e.g. *.tar) often preserves the modification timestamps of all files. If the modification time of an extracted file is too old, e.g. older than 8 weeks, the freshly extracted file may be deleted by the automatic cleanup policy of the scratch area (run daily).
To avoid such cleanup, you can often use an additional tool parameter: for tar, for example, the parameter -m. Alternatively, you can use the touch command to update the modification time attribute.
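The following local sketch shows what -m and touch change: by default tar restores the archived (old) modification time, while -m sets it to the time of extraction. It assumes GNU tar and GNU coreutils (stat -c, date -d), as found on typical Linux systems, and uses a throwaway temporary directory; the file names are made up. As the note in this section explains, from April 18th 2017 on the cleanup is based on creation time, so this only illustrates the timestamp behavior.

```bash
#!/usr/bin/env bash
# Shows that tar restores archived modification times unless -m is given.
# Assumes GNU tar and GNU coreutils; works in a temporary directory.
set -euo pipefail
demo=$(mktemp -d)
cd "$demo"

mkdir src keep fresh
echo "raw data" > src/old.txt
touch -t 201501010000 src/old.txt     # backdate to 2015-01-01 00:00
tar -cf old.tar -C src old.txt

tar -xf  old.tar -C keep              # default: the 2015 mtime is restored
tar -xmf old.tar -C fresh             # -m: mtime = time of extraction

keep_mtime=$(stat -c %Y keep/old.txt)
fresh_mtime=$(stat -c %Y fresh/old.txt)
echo "default extraction: $(date -d "@$keep_mtime")"
echo "with -m:            $(date -d "@$fresh_mtime")"

# Alternative: refresh the mtime of already extracted files with touch
find keep -type f -exec touch {} +
echo "after touch:        $(stat -c %y keep/old.txt)"
```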
Attention: starting April 18th 2017, the scratch cleanup will be based on the 'creation time' of all files instead of their 'modification time'. After this change, there is no need to update the modification time (via additional archive parameters or touch) any more: doing so is pointless and will no longer prevent your file(s) from being deleted.
Missing Slurm support in MPI applications
Many applications have trouble using the correct number of cores within the batch system. This may be caused by missing Slurm support: such applications usually bring their own MPI version and have to be told the right number of cores and the hostfile explicitly.
First, you have to generate a current hostfile. The following lines replace the usual call:
srun hostname > hostfile.$SLURM_JOB_ID
mpirun -n 64 -hostfile hostfile.$SLURM_JOB_ID <MPI-Program>
The first line generates the hostfile; the second line tells MPI the number of planned cores (here 64) and the name of the hostfile.
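A sketch of a complete job script around these two lines, assuming the core count is requested with --ntasks: Slurm then exports it as $SLURM_NTASKS, so it does not have to be hard-coded as 64. The program name ./my_mpi_program is hypothetical, the module lines are left elided, and the exact mpirun options may differ between MPI implementations.

```bash
#!/bin/bash
# Number of cores (MPI ranks) for this job; example values
#SBATCH --ntasks=64
#SBATCH --time=01:00:00

module purge
# module load ...   (the modules your application needs; not shown here)

# srun starts one "hostname" per task, so each node name appears
# once per task placed on that node
srun hostname > "hostfile.$SLURM_JOB_ID"

# Take the core count from Slurm instead of hard-coding it
# (./my_mpi_program is a hypothetical program name)
mpirun -n "$SLURM_NTASKS" -hostfile "hostfile.$SLURM_JOB_ID" ./my_mpi_program

# Remove the per-job hostfile afterwards
rm -f "hostfile.$SLURM_JOB_ID"
```

Deriving -n from $SLURM_NTASKS keeps the MPI core count and the Slurm allocation consistent when you later change --ntasks.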
Migration from LSF to Slurm
A migration aid listing the most important LSF commands and parameters together with their Slurm equivalents is available here.
Important: the right partition (formerly called queue under LSF) will mostly be selected automatically (with commands like salloc). This does not apply to special cases like the “kurs*” or “extension*” queues under LSF: in Slurm, these are special partitions, reservations or project accounts, and they need to be requested explicitly in your job scripts.
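For these special cases, the request can be written as #SBATCH directives, as sketched below. The partition, reservation and account names as well as ./my_program are hypothetical placeholders; use the names you were actually given, and keep only the line(s) that apply to you.

```bash
#!/bin/bash
# Hypothetical names: replace them with the partition, reservation
# and/or project account you were actually given.
#SBATCH --partition=kurs_example
#SBATCH --reservation=my_reservation
#SBATCH --account=project01234

# Hypothetical program name
srun ./my_program
```

The same options can also be given on the command line, e.g. sbatch --account=project01234 jobscript.sh. The command sinfo lists the available partitions, and scontrol show reservation lists the current reservations.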
Setting up password-less ssh communication between compute nodes
Parallel computation across different nodes requires mutual password-less logins. By default, this is not enabled.
However, you can enable it in your own home directory by running the following commands while logged in to any of our login nodes:
To generate a key (you can accept the default storage location by pressing <ENTER>):
ssh-keygen -P "" -t rsa -C "$LOGNAME@lcluster"
To authorize the generated key for logins (you will be asked for your login password; please enter it, and your ssh configuration will be updated accordingly):
ssh-copy-id -i ~/.ssh/id_rsa.pub localhost
To verify your configuration: you should now be able to log in without a password (e.g. from lcluster2 to lcluster4).
The set-up is finished.
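As an offline complement to the login test, the hypothetical helper check_key_authorized below checks whether your public key appears in ~/.ssh/authorized_keys, where ssh-copy-id puts it. It assumes the default key path ~/.ssh/id_rsa.pub from the ssh-keygen call above and that, as the set-up implies, your home directory is shared by all nodes, so a key authorized for localhost is valid on every node.

```bash
#!/usr/bin/env bash
# Hypothetical helper: is the public key listed in authorized_keys?
# Usage: check_key_authorized [HOME_DIR]   (defaults to $HOME)
check_key_authorized() {
  local home=${1:-$HOME}
  local pub="$home/.ssh/id_rsa.pub" auth="$home/.ssh/authorized_keys" key
  [[ -r $pub && -r $auth ]] || return 1
  # Compare key type and key data (fields 1 and 2); the comment may differ
  key=$(awk '{print $1, $2}' "$pub")
  grep -qF -- "$key" "$auth"
}

if check_key_authorized; then
  echo "public key is authorized for password-less login"
else
  echo "public key not (yet) authorized: run ssh-copy-id again" >&2
fi
```

To test an actual login without risking a password prompt, ssh -o BatchMode=yes <node> true fails instead of asking for a password.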
Job details at the end
After your job has finished, the following command reports on the CPU and memory efficiency of the job:
tuda-seff <JobID>
Even more details are shown by the following command:
sacct -l -j <JobID>
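The CPU efficiency reported by seff-style tools is, roughly, the CPU time the job actually used divided by its wall-clock time times the number of allocated cores. Assuming tuda-seff follows this definition (not confirmed here), the sketch below reproduces the arithmetic; slurm_time_to_seconds and cpu_efficiency are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Convert a Slurm duration ([DD-][HH:]MM:SS[.fff]) to whole seconds.
slurm_time_to_seconds() {
  local t=$1 days=0 h=0 m s
  if [[ $t == *-* ]]; then
    days=${t%%-*}
    t=${t#*-}
  fi
  t=${t%%.*}                  # drop fractional seconds, e.g. "02:30.500"
  local IFS=:
  local -a parts=($t)
  if (( ${#parts[@]} == 3 )); then
    h=${parts[0]} m=${parts[1]} s=${parts[2]}
  else
    m=${parts[0]} s=${parts[1]}
  fi
  # 10# forces base 10, so fields like "08" are not read as octal
  echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# CPU efficiency in percent: CPU time / (wall time * allocated cores)
cpu_efficiency() {
  local total elapsed ncpus=$3
  total=$(slurm_time_to_seconds "$1")
  elapsed=$(slurm_time_to_seconds "$2")
  if (( elapsed == 0 || ncpus == 0 )); then echo 0; return; fi
  echo $(( total * 100 / (elapsed * ncpus) ))
}

# Made-up values: 2 h CPU time, 1 h wall time, 4 cores; prints 50
cpu_efficiency 02:00:00 01:00:00 4
```

The input values can be read from sacct -j <JobID> --format=JobID,Elapsed,TotalCPU,AllocCPUS. A low percentage usually means that cores were requested but left idle.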
Expiry date of your user account
To see the expiry date of your own user account, use the script
The validity term of your user account is independent of the term or validity of any project you might be associated with.
File transfer to and from the Lichtenberg HPC
Before your calculations, your data needs to get onto the Lichtenberg filesystems, and afterwards your results need to get off them.
We recommend the following tools:
As you can log in to the login nodes via ssh, you can also use SSH's scp tool to copy files and directories to or from the Lichtenberg.
Use the login nodes for your scp transfers, as they also have high-bandwidth network ports to the TU campus network (we do not have any other special in/out nodes).
For (large) text/ASCII files, use the optional compression (-C) built into the SSH protocol to save network bandwidth and possibly speed up your transfers.
Omit compression when copying already compressed data like JPG images or videos in modern container formats (mp4, OGG).
tuid@hla0003:~ $ scp -Cpr myResultDir mylocalworkstation.inst.tu-darmstadt.de:/path/to/my/local/resultdir
Cases like “I need my calculations' results also on my local workstation's hard disk for analysis with graphical tools” or “my local experiment's raw data needs to hop over to the Lichtenberg as soon as it is generated” are not well covered by scp. As soon as you have to keep (one of) your Lichtenberg directories “in sync” with one on your institute's (local) infrastructure, running scp repeatedly is inefficient: it is not aware of “changes” and would blindly copy the same files over and over again.
This is where rsync steps in. Like scp, it is a command-line tool that transfers files from any (remote) “SRC” to any other (remote) “DEST”ination. In contrast to scp, however, it has a notion of “changes” and can determine whether a file in “SRC” has changed and needs to be transferred at all. New as well as small files are simply transmitted; for large files, however, rsync transfers only their changed blocks (safeguarded by checksums).
In essence: unchanged files are not transferred again, new and changed files are, but for large files only their changed portions (delta) are transferred.
tuid@hla0003:~ $ rsync -aH myResultDir mylocalworkstation.inst.tu-darmstadt.de:/path/to/my/local/resultdir
Both scp and rsync are “one-way” tools only! If a file is changed in “DEST” between transfers, the next transfer will overwrite it with the (older) version from “SRC”.
Not available on the Lichtenberg:
FTP(S), SFTP, rcp and other older protocols (several of them clear-text).