1 Requirements
2 Access
3 Getting started with job submission scripts
- 3.1 Running Simple Batch Jobs
- 3.2 An example Slurm job script
4 Cancelling jobs
5 Interactive jobs
- 5.1 Starting an interactive job
- 5.2 Keeping interactive jobs alive
6 Monitoring
7 Other Documentation
8 Notes

Requirements

Access must be granted by IT. Ask your supervisor to log an IT support request on your behalf.
You must have followed the instructions at https://svimrit.atlassian.net/wiki/spaces/SI/pages/344883201 for RStudio Server Pro.

Access

Open OnDemand access: https://ood.svi.edu.au
SSH access: slurm-login.svi.edu.au

SSH access to the SLURM login node is only available from inside the internal network. To connect remotely, follow https://svimrit.atlassian.net/wiki/spaces/SI/pages/728170527

Getting started with job submission scripts

A submission script is a shell script that lists the processing tasks to be carried out, including the commands, runtime libraries, and input and/or output files they need. If you know the resources your tasks will consume, you can also add #SBATCH directives to the script. Some of the common ones are:

Short Format        Long Format             Default         Description
------------        -----------             -------         -----------
-J job_name         --job-name=job_name     N/A             Up to 15 printable, non-whitespace characters
-p partition        --partition=partition   general         Always specify your partition (i.e. general, gpu, all)
-n count            --ntasks=count          1               Number of tasks to be created for the job
-c count            --cpus-per-task=count   1               Number of CPUs allocated per task
N/A                 --mem-per-cpu=size      N/A             Memory size per CPU
N/A                 --mem=size              1000MB          Total memory size per node
N/A                 --gres=gpu:1            N/A             Generic consumable resources, e.g. GPU
-t HH:MM:SS         --time=HH:MM:SS         N/A             Maximum wall-clock time for your job

These are just some of the options. For a complete set see https://slurm.schedmd.com/sbatch.html or download this cheatsheet https://slurm.schedmd.com/pdfs/summary.pdf

Running Simple Batch Jobs

Submitting a job to SLURM is performed by running the sbatch command and specifying a job script.

sbatch job.script

You can supply options (e.g. --ntasks=xx) to the sbatch command. If an option is also defined in the job.script file, the command-line argument overrides it.

sbatch [options] job.script
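As a minimal sketch of such an override, the command below resubmits the same script with different resources. The values 4 and 00:30:00 are placeholders chosen for illustration, not site recommendations.

```shell
# Submit job.script with 4 tasks and a 30-minute time limit.
# These command-line values take precedence over any --ntasks or
# --time directives written inside job.script.
sbatch --ntasks=4 --time=00:30:00 job.script
```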

An example Slurm job script

The example job script describes a serial job with only one process (--ntasks=1). It needs only one CPU core to run the command touch ~/helloworld.text and is allocated 100M of RAM per CPU, so 100M in total. Once the job has finished, you can check that it worked by running ls ~. You should see a file named helloworld.text (feel free to delete it now).
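A script matching this description might look like the sketch below. The --ntasks, --cpus-per-task and --mem-per-cpu values and the touch command come from the description above. The job name, the general partition and the five-minute time limit are assumptions added for illustration.

```shell
#!/bin/bash
#SBATCH --job-name=helloworld    # assumed name, any short label works
#SBATCH --partition=general      # assumed partition, pick the one you need
#SBATCH --ntasks=1               # serial job: a single process
#SBATCH --cpus-per-task=1        # one CPU core for that process
#SBATCH --mem-per-cpu=100M       # 100M per CPU, so 100M in total
#SBATCH --time=00:05:00          # assumed wall-clock limit of 5 minutes

# The actual work: create an empty file in your home directory.
touch ~/helloworld.text
```

Save it as job.script and submit it with sbatch job.script.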

Cancelling jobs

To cancel one job

To cancel all of your jobs
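Both cases are usually handled with the standard Slurm scancel command, sketched below. The job ID 123456 is a placeholder: use the ID that sbatch printed when you submitted the job, or look it up with squeue.

```shell
# Cancel one job by its job ID (123456 is a placeholder).
scancel 123456

# Cancel all jobs that belong to your user account.
scancel -u "$USER"
```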

Interactive jobs

Starting an interactive job

You can run an interactive job like this:

Here we ask for a single core on one interactive node for one hour with the default amount of memory. The command prompt will appear as soon as the job starts.
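A request matching this description could be written with srun roughly as follows. This is a sketch: the exact flags used on this cluster are an assumption, as is the general partition. No memory option is given, so the default applies.

```shell
# One task (one core) on one node for one hour, default memory.
# --pty attaches your terminal to the job; bash -i starts an
# interactive shell on the allocated compute node.
srun --partition=general --nodes=1 --ntasks=1 --time=01:00:00 --pty bash -i
```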

This is how it looks once the interactive job starts:

Exit the bash shell to end the job. If you exceed the time or memory limits the job will also abort.

Interactive jobs follow the same policies as normal batch jobs; there are no extra restrictions. Be aware that you might be sharing the node with other users, so play nice.

Keeping interactive jobs alive

Interactive jobs die when you disconnect from the login node, whether by choice or because of connection problems. To keep a job alive you can use a terminal multiplexer like tmux.

tmux allows you to run processes as usual in your standard bash shell.

Start tmux on the login node before you request an interactive Slurm session with srun, and then do all your work inside it. If you are disconnected, simply reconnect to the login node and attach to the tmux session again by typing:

or in case you have multiple sessions running:
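With standard tmux, the whole workflow is typically handled with the commands sketched below. The session name work is a placeholder used for illustration only.

```shell
# On the login node, before running srun: start a named session.
tmux new -s work

# After a disconnect: reattach when only one session exists.
tmux attach

# With several sessions: list them, then attach to one by name.
tmux ls
tmux attach -t work
```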

As long as the tmux session is not closed or terminated (e.g. by a server restart) your session should continue.

To detach from a tmux session without closing it, press CTRL-B (that is, the Ctrl key and “b” simultaneously, which is the standard tmux prefix) and then “d” (without the quotation marks). To close a session, just end the bash session with CTRL-D or by typing exit. You can get a list of all tmux commands by pressing CTRL-B and then ? (question mark). See also this page for a short tutorial of tmux. Otherwise, working inside a tmux session is almost the same as working in a normal bash session.

Monitoring

You can see an overview of cluster usage at https://grafana.svi.edu.au/. Select Sign in with Microsoft to log in. The first time you sign in you won’t have access to anything. Please log a ticket requesting to be added to a group after your first login.

Notes

Try not to allocate more resources than you will actually use so that there are more resources available for everyone else.

SVI IT

Getting started with SLURM at SVI