About
SLURM Bank is a collection of wrapper scripts that give slurm GOLD-like capabilities for managing resources.
With the scripts we are able to provide a simple banking system where we can deposit hours to an account. Users are associated with these accounts which they use to run jobs. If users do not have an account or if they do not have hours in their account then they cannot run jobs.
What the user Charles will want to know
If Charles is a PI or team leader on a project, he may want to see
a more detailed balance sheet. He can do this by |
What the admin David needs to know
The admin David will need to know far more than the user. David will need to decide on some policies and allocations, create projects with sbank-project, and add people to projects with sbank-project. Once the projects have been created, David will need to deposit hours to projects using sbank-deposit. David could also check the balances of all accounts and users with
|
Motivation
The aim is to replace our current (at TCHPC) resource management and allocation system, which comprises three pieces of software (slurm, GOLD and maui), with a single piece of software: Slurm.
Having all of the banking functionality wrapped up in SLURM has benefits for the systems administrators:
- Having just slurm without maui means there are fewer things to go wrong
- Overall performance for scheduling and launching jobs is much better with just slurm
- GOLD is overly complicated and we don't need many of the features from GOLD
There are also benefits for end users of clusters:
- Fewer commands/systems to learn
- Faster job submission and turn-around
- Greater overall system stability
Experience with the migration
Moving from GOLD, Maui and SLURM to just running SLURM required us to gather some information.
- List of Principal Investigators who own accounts/projects
- List of projects
- List of users
We also required a set of associations
- PI -> projects -> users
The mapping we ended up with in slurmdbd was kept simple
- projects -> users
Which gave us a hierarchy like this
- root
  - project1
    - user1
    - user2
  - project2
    - user1
    - user3
    - user4
  - project3
    - user3
  - project1
and so on...
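The projects -> users mapping above can be loaded into slurmdbd with plain sacctmgr commands. The following is a minimal sketch, not the procedure we actually ran: it assumes the associations are available as "project user" pairs, the cluster name `mycluster` and the helper `emit_commands` are hypothetical, and the commands are only printed so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Sketch only: turn a flat "project user" list into sacctmgr commands.
# The cluster name and the sample associations are made-up placeholders.

CLUSTER="mycluster"   # hypothetical cluster name

# emit_commands reads "project user" pairs on stdin and prints (does not run)
# one "add account" per new project, followed by one "add user" per pair.
emit_commands() {
    seen=""
    while read -r project user; do
        [ -n "$project" ] || continue          # skip blank lines
        case " $seen " in
            *" $project "*) ;;                 # account already emitted
            *)
                echo "sacctmgr -i add account $project Cluster=$CLUSTER"
                seen="$seen $project"
                ;;
        esac
        echo "sacctmgr -i add user $user Account=$project Cluster=$CLUSTER"
    done
}

# Sample input mirroring the start of the hierarchy above.
emit_commands <<EOF
project1 user1
project1 user2
project2 user1
EOF
```

Printing the commands rather than executing them keeps the sketch safe to try on a machine without slurm; piping the output to `sh` on an admin node would apply it.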
Design
The design document contains ideas (todo and done). So far we have discovered that slurm already has much of the functionality required for "banking". Many of the sbank commands are just wrapper scripts around existing slurm commands. One of the problems with banking is being consistent; the sbank wrapper scripts try to provide a consistent workflow for a GOLD-like banking system with slurm.
SLURM bank is extremely simple and provides only very basic banking functionality. When a user or a group of users runs out of time in an account, the jobs that are running are terminated immediately. SLURM bank does not reserve time to ensure that jobs complete; it is up to the user to work that out, and in doing so users will hopefully become more aware of the time that they have used. There is also no notion of crediting or overdrawing, so if jobs fail due to system failures and the like, users will not be refunded hours automatically. This issue is left to the users and admins to resolve.
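Since no time is reserved for a job, a user can do the arithmetic before submitting. The sketch below is an illustration only and is not one of the sbank scripts: it assumes "hours" means CPU core-hours, uses whole numbers, takes the worst case (the job runs for its full walltime), and ignores other jobs drawing on the same account. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: check whether a job's worst-case cost fits in an account balance.

# job_cpu_hours CORES WALLTIME_HOURS
# Prints the worst-case CPU hours the job could consume.
job_cpu_hours() {
    echo $(( $1 * $2 ))
}

# can_complete BALANCE_HOURS CORES WALLTIME_HOURS
# Succeeds (exit status 0) if the worst-case cost fits in the balance.
can_complete() {
    cost=$(job_cpu_hours "$2" "$3")
    [ "$cost" -le "$1" ]
}

# Example: 1000 hours left, job asks for 64 cores for 12 hours.
if can_complete 1000 64 12; then
    echo "job fits: needs $(job_cpu_hours 64 12) of 1000 CPU hours"
else
    echo "job may be terminated: needs $(job_cpu_hours 64 12) CPU hours"
fi
```

The balance itself has to come from the accounting system; the numbers here are placeholders.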
Needed components
These are now available with slurm-bank:
- A consistent way of adding/removing projects and users to said projects.
- A way of adding/removing hours from a project (account).
- A way of telling users how much time they have left and how much time they have used.
- A way of letting users know if they can complete a job or not (easily).
You can do everything that these wrapper scripts are doing with plain slurm commands! We stress that these scripts are just fixing the workflows in managing accounts and users as well as providing some helper tools to let users manage their own resources.
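As a rough illustration of what "plain slurm commands" can look like, the sketch below prints a sacctmgr command that sets a CPU-minute limit on an account and an sshare command that lists usage for it. It is an assumption about one possible mapping, not a description of what the sbank scripts run internally. The cluster name, account name and helper functions are hypothetical, and the limit shown is absolute: a real deposit would have to add to the existing limit. Limits are only enforced if the site has enabled limit enforcement in its slurm configuration.

```shell
#!/bin/sh
# Sketch only: print plain Slurm commands comparable to a deposit and a
# usage check. Nothing is executed against the accounting database.

CLUSTER="mycluster"   # hypothetical cluster name

# hours_to_minutes HOURS: Slurm group limits are expressed in minutes.
hours_to_minutes() {
    echo $(( $1 * 60 ))
}

# deposit_cmd ACCOUNT HOURS
# Prints a command that sets the account's CPU-minute limit to HOURS.
# Note: this replaces the limit; it does not add to a previous value.
deposit_cmd() {
    echo "sacctmgr -i modify account where name=$1 cluster=$CLUSTER set GrpTRESMins=cpu=$(hours_to_minutes "$2")"
}

# usage_cmd ACCOUNT
# Prints a command that lists usage for the account and all of its users.
usage_cmd() {
    echo "sshare -A $1 -a -l"
}

deposit_cmd project1 100
usage_cmd project1
```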
The SLURM software itself is free and open-source, under a GPLv2 Licence, so anyone could freely and easily replicate the work undertaken here themselves.
Also note that the software itself consists of fully readable scripts (no compiled code) and its behaviour is fully documented, so it is trivial for any user on our systems to view the documentation and scripts and re-implement them themselves.
Documentation
See the walkthrough for an example of how to implement and use slurm-bank.
Shell scripts
Experimental scripts
Proposed scripts
- sbank-report -- this might be better provided with the sbank-balance script.
Repo history
git shortlog --since "Mon May 16 17:12:35 2011"