Projects, Users and Resource Usage
The Lichtenberg HPC can only be used via projects. Each project defines the approved amount of resources it can allocate on the HPC. In other words, a project's allotted number of core*hours determines its “share” of the overall computing resources of the HPC.
All core*hours used over the course of a project are accounted on that project (just as money spent is debited from a bank account).
User vs. Project
A user account (personalized) is associated with one or more projects (the first project being that user's default project).
Unlike user accounts, projects can be shared among colleagues and students working on the same scientific problem.
Do not share your user account (neither your password nor your SSH keys)!
Jobs vs. Project
Submitting batch jobs is not possible without (implicitly) specifying a project (sbatch -A parameter). If a user does not explicitly specify sbatch -A <projectname>, the job will be accounted on that user's default project.
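As a minimal sketch, a job script that names its project explicitly could look like the following. The project name project01234 is hypothetical, and the task count and time limit are placeholders. Your jobs may need further sbatch options not shown here. SLURM_JOB_ACCOUNT is the environment variable in which Slurm reports the account a running job is charged to.

```shell
#!/bin/bash
#SBATCH -A project01234   # hypothetical project name; replace with your own
#SBATCH -n 1              # placeholder: one task
#SBATCH -t 00:10:00       # placeholder: ten minutes of requested runtime

# Inside a running job, Slurm exports the account (project) being charged.
# Outside of Slurm the variable is unset, so a fallback text is printed instead.
account="${SLURM_JOB_ACCOUNT:-not running under Slurm}"
echo "This job is accounted on: ${account}"
```

If the -A line is omitted, the job is accounted on your default project instead.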
With the commands
csreport, any user can get a list of their current overall resource consumption.
Monthly Usage Report
At the end of a month, users get an automatic email with a usage overview on all projects they are associated with (“Lichtenberg User Report”).
Since October 2019, the HPC Group, in collaboration with the HKHLR, has provided an overview of the activities on the Lichtenberg HPC for a given project. The aim is to offer users insight into the resource usage of their projects.
The visualization of the resource usage and efficiency for a given project is split into two parts: a combined accumulated CPU time plot and a per-job efficiency plot.
The graph's upper panel shows the used core*hours over the validity term of the project, up to the current date. The gray line details all the accumulated core*hours allocated for the project. These correspond to the
- core*hours accounted to the project and
- core*hours blocked for exclusive use by the project.
The yellow line depicts the accumulated core*hours during which the allocated cores were actually busy performing computations, i.e. the core*hours actually utilized.
If a 10-hour job running on 16 CPU cores executes with a 50% CPU efficiency, the project will be accounted a total of 160 core*hours (gray line), even though only 80 core*hours were actually utilized (yellow line). Even if 12 hours of runtime were requested for this example in the job script, only the job's actual runtime of 10 hours will be accounted on that project.
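The arithmetic of this example can be reproduced with a few lines of shell. This is only an illustration of the numbers above. The variable names are chosen for this sketch and do not correspond to any tool on the cluster.

```shell
# Values from the example above.
runtime_hours=10     # actual runtime (the 12 requested hours do not matter)
cores=16
efficiency_pct=50    # CPU efficiency in percent

# Gray line: core*hours accounted on the project.
accounted=$(( runtime_hours * cores ))
# Yellow line: core*hours during which the cores were actually busy.
utilized=$(( accounted * efficiency_pct / 100 ))

echo "accounted: ${accounted} core*hours, utilized: ${utilized} core*hours"
# prints "accounted: 160 core*hours, utilized: 80 core*hours"
```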
The colored bars indicate the project's monthly quotas (starting at the accumulated core*hour mark at the beginning of a given month), and are color coded according to the actual relative usage.
If a monthly budget of 17000 core*hours was granted and a total of 19000 core*hours were accounted for that given month, the bar for the month will be yellow (110% – 150% usage). The top of the bar for the following month will sit at the position of the gray line at the end of the month plus 17000 core*hours (the monthly quota).
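The relative usage in this example can be checked as follows. The sketch only tests for the yellow band named above. The other color bands are not specified in this text, so they are lumped together as "other" here.

```shell
quota=17000       # monthly budget in core*hours
accounted=19000   # core*hours accounted in that month

# Relative usage in percent (integer division, rounds down).
usage_pct=$(( accounted * 100 / quota ))

if [ "$usage_pct" -ge 110 ] && [ "$usage_pct" -le 150 ]; then
    bar="yellow"
else
    bar="other"   # remaining bands are not described in this document
fi

echo "usage: ${usage_pct}% -> ${bar}"
# prints "usage: 111% -> yellow"
```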
If applicable, the remaining quota until end of the project is extrapolated and plotted using a thick black line.
The graph's lower panel shows the CPU efficiency of each distinct job of the project. This efficiency is the ratio of utilized core*hours to accounted core*hours (not to the core*hours requested in the job script).
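To illustrate why the choice of denominator matters, the following sketch computes the efficiency of a single job both ways. The job values are hypothetical: 16 cores, 12 hours requested, 10 hours actually run and 80 core*hours utilized.

```shell
cores=16
requested_hours=12   # runtime requested in the job script
actual_hours=10      # runtime the job really needed
utilized=80          # core*hours the cores were actually busy

accounted=$(( cores * actual_hours ))      # 160 core*hours
requested=$(( cores * requested_hours ))   # 192 core*hours

# Efficiency as plotted: utilized relative to accounted core*hours.
efficiency_pct=$(( utilized * 100 / accounted ))
# For comparison only: relative to the requested core*hours (NOT what is plotted).
vs_requested_pct=$(( utilized * 100 / requested ))

echo "plotted efficiency: ${efficiency_pct}%"
echo "relative to request (not used): ${vs_requested_pct}%"
```

Over-requesting runtime therefore does not lower the plotted efficiency of a job.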
Each job is represented by a semi-transparent purple dot. Multiple jobs with the same efficiency will overlay and appear as darker dots.
The smaller red dots indicate the average CPU efficiency of all jobs of each day with active jobs. Since the purple dots do not differentiate with respect to job length and size, the red dots will help to assess the efficiency of computations for days with varying efficiencies and job sizes.
Currently, this visualization only includes CPU metrics; it does not include any usage of accelerator cards (e.g., GPUs) in conjunction with the CPU metrics.
In case you need the graph for your project(s) in a resolution-independent vector format, please don't hesitate to contact us as described at the end of the page.
To determine the efficiency of specific jobs, or to identify specific jobs with a given efficiency, the HKHLR offers helper scripts. The necessary JobAnalysisTools module may be loaded by entering
module load hkhlr JobAnalysisTools/0.1
in a job script.
It offers three utilities. Using the
script, a list of efficiencies for recent jobs (default 7 days) can be generated.
If you want to scrutinize a certain job, you can use
to see that job's efficiency.
Lastly, the tool
HKHLR_GetJobIDsOfInterval $lowerBound $upperBound
returns the IDs of the jobs whose efficiency lies within the interval [$lowerBound, $upperBound].
If you require further assistance or clarification, or have feedback, please do not hesitate to contact us, either via the TU Darmstadt ticket system or by email to email@example.com.