About

SLURM Bank is a collection of wrapper scripts that give slurm GOLD-like capabilities for managing resources.

With the scripts we are able to provide a simple banking system where we can deposit hours to an account. Users are associated with these accounts which they use to run jobs. If users do not have an account or if they do not have hours in their account then they cannot run jobs.

What the user Charles will want to know

Charles can run sbank balance statement -u to see his own balance. Apart from this one command, Charles does not need to know much more about the slurm bank commands.

If Charles is a PI or team leader on a project, he may want to see a more detailed balance sheet. He can do this by running sbank balance statement. This will show the usage for all members of all of his projects.

What the admin David needs to know

The admin David will need to know far more than the user. David will need to decide on some policies and allocations, create projects with sbank-project, and add people to projects with sbank-project.

Once the projects have been created David will need to deposit hours to projects using sbank-deposit.

David can also check the balances of all accounts and users with sbank balance statement -A.

Motivation

Our motivation (at TCHPC) was to replace our current resource management and allocation system, which comprises three pieces of software (slurm, GOLD and maui), with just a single piece of software: Slurm.

Having all of the banking functionality wrapped up in SLURM has benefits for the systems administrators:

  • Having just slurm without maui means there are fewer things to go wrong
  • Overall performance for scheduling and launching jobs is much better with just slurm
  • GOLD is overly complicated and we don't need many of the features from GOLD

And also benefits for end users of clusters:

  • Fewer commands/systems to learn
  • Faster job submission and turn-around
  • Greater overall system stability

Experience with the migration

Moving from GOLD, Maui and SLURM to just running SLURM required us to gather some information:

  • List of Principal Investigators who own accounts/projects
  • List of projects
  • List of users

We also required a set of associations

  • PI -> projects -> users

The mapping we ended up with in slurmdbd was kept simple

  • projects -> users

Which gave us a hierarchy like this

  • root
    • project1
      • user1
      • user2
    • project2
      • user1
      • user3
      • user4
    • project3
      • user3

and so on...
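
As an illustration, a hierarchy like the one above could be created in slurmdbd with plain sacctmgr commands. The following is only a minimal sketch and not the sbank implementation: it prints the commands instead of running them, the project and user names are the placeholders from the list above, and the helper name print_project_commands is hypothetical.

```shell
#!/bin/bash
# Dry-run sketch: print the sacctmgr commands that would build the
# example hierarchy. Nothing is executed against slurmdbd.

# Hypothetical helper: takes a project name followed by its users.
print_project_commands() {
    local project="$1"
    shift
    # Each project is an account directly under root.
    echo "sacctmgr -i add account name=${project} parent=root"
    local user
    for user in "$@"; do
        # A user may be associated with more than one account.
        echo "sacctmgr -i add user name=${user} account=${project}"
    done
}

print_project_commands project1 user1 user2
print_project_commands project2 user1 user3 user4
print_project_commands project3 user3
```

Because user1 and user3 appear under more than one project, each gets one association per project, which matches the projects -> users mapping described above.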

Design

The design document contains ideas (todo and done). So far we have discovered that slurm already has much of the functionality required for "banking". Many of the sbank commands are just wrapper scripts around existing slurm commands. One of the problems with banking is being consistent, so the sbank wrapper scripts try to provide a workflow for a GOLD-like banking system with slurm.

SLURM bank is extremely simple and only very basic banking functionality is provided. When a user or a group of users runs out of time in an account, the jobs that are running will be immediately terminated. SLURM bank has no reservation of time to ensure that jobs complete; it is up to the user to figure that out, and by doing so users will hopefully be more aware of the time that they have used. We also do not have the notion of crediting or overdrawing, so if jobs fail due to system failures etc., users will not be automatically refunded hours. This issue is left to the users and admins to resolve.
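
Since no time is reserved for a job, a user has to check for themselves whether a job fits in the remaining balance. The following is a minimal sketch of that arithmetic, assuming that hours are counted as CPU core hours (nodes x cores per node x wall-clock hours). The helper names job_cpu_hours and job_fits_balance are hypothetical and are not part of sbank.

```shell
#!/bin/bash
# Sketch: check whether a job's worst-case cost fits in the remaining balance.
# Assumption: the balance is counted in CPU core hours.

# Hypothetical helper: cost in CPU hours = nodes * cores per node * wall hours.
job_cpu_hours() {
    local nodes="$1" cores_per_node="$2" wall_hours="$3"
    echo $(( nodes * cores_per_node * wall_hours ))
}

# Hypothetical helper: succeeds (exit status 0) if the job fits in the balance.
# Arguments: balance in hours, then nodes, cores per node, wall hours.
job_fits_balance() {
    local balance_hours="$1"
    shift
    local cost
    cost="$(job_cpu_hours "$@")"
    [ "$cost" -le "$balance_hours" ]
}

# Example: 2 nodes x 8 cores x 10 hours = 160 CPU hours against 200 hours.
if job_fits_balance 200 2 8 10; then
    echo "job fits: $(job_cpu_hours 2 8 10) of 200 hours"
else
    echo "job does not fit"
fi
```

The estimate uses the requested wall-clock time, so it is an upper bound: a job that finishes early consumes less than the figure computed here.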

Needed components

All of the following is now available with slurm-bank:

  • A consistent way of adding/removing projects and users to said projects.
  • A way of adding/removing hours from a project (account).
  • A way of telling users how much time they have left and how much time they have used.
  • A way of letting users know if they can complete a job or not (easily).

You can do everything that these wrapper scripts are doing with plain slurm commands! We stress that these scripts are just fixing the workflows in managing accounts and users as well as providing some helper tools to let users manage their own resources.
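
To give an idea of what such a plain slurm command can look like, a deposit of hours could be expressed as a CPU-minute cap on an account. This is a sketch under assumptions, not a description of what sbank-deposit actually does: it assumes the limit used is GrpCPUMins (newer Slurm releases spell this GrpTRESMins=cpu=...), it only prints the command, and the helper names are hypothetical.

```shell
#!/bin/bash
# Sketch: express a deposit of hours as a CPU-minute cap on an account.
# Assumption: the cap is the GrpCPUMins limit; nothing is executed.

# Hypothetical helper: new limit = current limit + deposited hours * 60.
new_cpu_minute_limit() {
    local current_minutes="$1" deposit_hours="$2"
    echo $(( current_minutes + deposit_hours * 60 ))
}

# Hypothetical helper: print (not run) the command that would apply the limit.
print_deposit_command() {
    local account="$1" current_minutes="$2" deposit_hours="$3"
    local limit
    limit="$(new_cpu_minute_limit "$current_minutes" "$deposit_hours")"
    echo "sacctmgr -i modify account where name=${account} set GrpCPUMins=${limit}"
}

# Example: deposit 1000 hours into an account that has no limit set yet.
print_deposit_command project1 0 1000
```

In this sketch a deposit adds to the existing limit rather than replacing it, so repeated deposits accumulate.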

The SLURM software itself is free and open-source, under a GPLv2 Licence, so anyone could freely and easily replicate the work undertaken here themselves.

Also, note that the software itself consists of fully readable scripts (no compiled code) and its behaviour is fully documented. It is trivial for any user on our systems to view the documentation and scripts and re-implement them themselves.

Documentation

See the walkthrough for an example of how to implement and use slurm-bank.

Shell scripts

Experimental scripts

Proposed scripts

  • sbank-report -- this might be better provided with the sbank-balance script.

Repo history

git shortlog --since "Mon May 16 17:12:35 2011"