A submission script is a shell script that lists the processing tasks to be carried out, including the commands to run, the runtime libraries they need, and their input and/or output files. If you know the resources that your tasks will consume, you can also add some of the common #SBATCH directives to the script, e.g.:
Short Format  Long Format            Default  Description
------------  -----------            -------  -----------
-J jobname    --job-name=job_name    N/A      Up to 15 printable, non-whitespace characters
-p partition  --partition=partition  general  Always specify your partition (i.e. general, gpu, all)
-n count      --ntasks=count         1        Controls the number of tasks to be created for the job
-c count      --cpus-per-task=count  1        Controls the number of CPUs allocated per task
N/A           --mem-per-cpu=size     N/A      Memory size per CPU
N/A           --mem=size             1000MB   Total memory size
N/A           --gres=gpu:1           N/A      Generic consumable resources, e.g. GPU
-t HH:MM:SS   --time=HH:MM:SS        N/A      Specify the maximum wallclock time for your job
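As an illustration, a minimal submission script using some of these directives could look like the sketch below. The job name, partition, and time limit are assumed values chosen for the example, not site requirements; the task count, CPU count, memory, and command match the description that follows.

```shell
#!/bin/bash
# Assumed job name (at most 15 printable, non-whitespace characters)
#SBATCH --job-name=helloworld
# Assumed partition; pick the one that applies to you
#SBATCH --partition=general
# One task (a serial job) using one CPU core
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
# 100M of RAM per CPU; with one CPU that is 100M in total
#SBATCH --mem-per-cpu=100M
# Assumed wallclock limit of five minutes
#SBATCH --time=00:05:00

# The actual work: create an empty file in your home directory
touch ~/helloworld.text
```

Saved as, say, helloworld.sh (a hypothetical filename), it would be submitted with sbatch helloworld.sh.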
This script describes the job: it is a serial job with only one process (--ntasks=1). It needs only one CPU core to run the command touch ~/helloworld.text and has been allocated 100M of RAM per CPU, so in this case 100M in total. You can check that it worked by running ls ~. You should see a file named helloworld.text (feel free to delete it now).
Here we ask for a single core on one interactive node for one hour with the default amount of memory. The command prompt will appear as soon as the job starts.
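A request like the one described above can be sketched with srun as follows. The partition name and the exact set of flags are assumptions made for this example, since your site may use different defaults; the guard only keeps the snippet harmless on machines without Slurm.

```shell
# Assumed values for illustration; adjust them to your cluster.
partition=general
walltime=01:00:00

if command -v srun >/dev/null 2>&1; then
    # One task on one CPU core for one hour with the default memory;
    # --pty attaches your terminal to an interactive bash shell.
    srun --partition="$partition" --ntasks=1 --cpus-per-task=1 \
         --time="$walltime" --pty bash -i
else
    echo "srun not found; run this from a cluster login node" >&2
fi
```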
This is how it looks once the interactive job starts:
srun: job 12345 queued and waiting for resources
srun: job 12345 has been allocated resources
Exit the bash shell to end the job. If you exceed the time or memory limits, the job will also abort.
Interactive jobs have the same policies as normal batch jobs; there are no extra restrictions. You should be aware that you might be sharing the node with other users, so play nice.
Keeping interactive jobs alive
Interactive jobs die when you disconnect from the login node, whether by choice or because of internet connection problems. To keep a job alive you can use a terminal multiplexer like tmux.
tmux allows you to run processes as usual in your standard bash shell.
You start tmux on the login node before you request an interactive Slurm session with srun, and then do all your work inside it. In case of a disconnect, you simply reconnect to the login node and attach to the tmux session again by typing:
tmux attach
or, in case you have multiple sessions running:
tmux attach -t SESSION_NUMBER
As long as the tmux session is not closed or terminated (e.g. by a server restart) your session should continue.
To detach from a tmux session without closing it, press CTRL-B (that is, the Ctrl key and “b” simultaneously, which is the standard tmux prefix) and then “d” (without the quotation marks). To close a session, just close the bash session with either CTRL-D or by typing exit. You can get a list of all tmux commands by pressing CTRL-B and then ? (question mark). See also this page for a short tutorial of tmux. Otherwise, working inside a tmux session is almost the same as working in a normal bash session.
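The attach-or-create pattern described in this section can be wrapped in a few lines of shell. This is only a sketch: the session name slurm-work is a hypothetical choice, and it assumes tmux is installed on the login node.

```shell
# Hypothetical session name; any name without spaces works.
session=slurm-work

if ! command -v tmux >/dev/null 2>&1; then
    echo "tmux not found on this machine" >&2
elif tmux has-session -t "$session" 2>/dev/null; then
    # A session with this name survived the disconnect: re-attach to it.
    tmux attach -t "$session"
else
    # No such session yet: start one, then launch srun from inside it.
    tmux new-session -s "$session"
fi
```

Running the same snippet after every login means you either pick up where you left off or start a fresh session, without having to remember which case applies.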
You can see an overview of cluster usage at https://grafana.svi.edu.au/. You need to select Sign in with Microsoft. The first time you sign in, you won’t have access to anything. Please log a ticket requesting to be added to a group after your first login.
There is a lot of documentation online, so remember Google is your friend. It may be worth getting your group to collaborate on an internal document that collects the commands your team actually uses. This will be great for onboarding new team members.