Getting project information

First you need to figure out which project your jobs should run under. Every project has a computer time allocation, measured in core-hours per month, which you can check with the projinfo program. Run projinfo on the login node and study the output:

[emilia@hebbe ~]$ projinfo
Running as user: emilia
Project                 Used[h]        Allocated[h]       Queue
SNIC001-23-456         51734.61               55000       hebbe
   f38anlo             49104.36
   emilia               2308.91
   emil                  321.34
SNIC009-87-654             7.12              100000       hebbe
   fantomen                7.12

In this example we are logged in as the user emilia, who is a member of two projects, SNIC001-23-456 and SNIC009-87-654, with monthly allocations of 55,000 and 100,000 core-hours respectively. The Used[h] column shows the usage since the beginning of the month, both as a total for each project and broken down per project member. Since most of this month's allocation in SNIC001-23-456 has already been used (mostly by the user f38anlo), we decide to submit our jobs to SNIC009-87-654, which has almost no usage so far.
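The comparison above (allocated minus used) can be automated. The sketch below is a hypothetical helper, remaining_hours, that assumes projinfo keeps the layout shown above: project rows have four fields (project, used, allocated, queue) while per-user rows have only two.

```shell
#!/usr/bin/env bash
# Print "<project> <remaining core-hours>" for each project in projinfo-style
# output read from stdin. Assumes the layout shown above: project rows have
# 4 fields (project, used, allocated, queue); per-user rows have only 2 fields.
remaining_hours() {
  awk 'NF == 4 && $2 ~ /^[0-9.]+$/ && $3 ~ /^[0-9.]+$/ {
         printf "%s %.2f\n", $1, $3 - $2
       }'
}

# Usage on the login node:
#   projinfo | remaining_hours
```

For the output above this would report about 3265 core-hours left in SNIC001-23-456 and almost the whole allocation left in SNIC009-87-654.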

Writing a job script

We are now ready to put together a toy example of a batch job. First we need to create a plain text file containing a valid shell script that runs the calculation we want to perform on the cluster.

To work with text files on the cluster you can use any of the installed text editors, e.g. gedit, vi, emacs or nano. Pick one and learn how to use it; in this example we will use nano.

First we create the file that will specify our batch job; let's simply call it jobscript:

[emilia@hebbe ~]$ nano jobscript

In the job script file we need to specify all information needed by the scheduler to execute the job. Below are minimal examples for Hebbe, Vera, and Alvis.

Hebbe example:

#!/usr/bin/env bash
#SBATCH -A SNIC009-87-654 -p hebbe
#SBATCH -n 10
#SBATCH -t 0-00:10:00

echo "Hello cluster computing world!"
sleep 60

Vera example:

#!/usr/bin/env bash
#SBATCH -A SNIC009-87-654 -p vera
#SBATCH -n 64
#SBATCH -t 0-00:10:00

echo "Hello cluster computing world!"
sleep 60

Alvis example (note that at least one GPU must be requested):

#!/usr/bin/env bash
#SBATCH -A SNIC009-87-654 -p alvis
#SBATCH -N 2 --gpus-per-node=T4:2  # We're launching 2 nodes with 2 Nvidia T4 GPUs each
#SBATCH -t 0-00:10:00

echo "Hello cluster computing world!"
sleep 60

#Here you should typically call your GPU-hungry application

Write it in the editor (or copy and paste it), save it (Ctrl-O in nano) and exit (Ctrl-X). More importantly, you need to understand what the job file content means.

The first line is the shebang, which tells the system to run the script with bash.

The lines that start with #SBATCH are directives read by the scheduler. In the order above, they specify:

  1. The project the job is accounted to (-A), here SNIC009-87-654.
  2. The queue, or partition, the job should be scheduled to (-p). Here we use the default partition on Hebbe and Vera, which has the same name as the cluster.
  3. The job size. -n sets the number of tasks (cores): on Hebbe you only get 1 core by default, so here we request 10 cores with -n 10 (out of 20 on a single node), while on Vera we ask for a full node with 64 tasks, -n 64. On Alvis, -N 2 requests 2 compute nodes.
  4. The job walltime (-t), i.e. the maximum time the job is allowed to run, in the format days-hours:minutes:seconds. If the job does not finish within this time, the scheduler kills it. Here we request 0 days and 10 minutes.
  5. (*) On Alvis, a valid job script must request at least one GPU. Use --gpus-per-node to specify it.
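Together with the core count, the walltime bounds how many core-hours a job can draw from the allocation reported by projinfo. A minimal sketch, assuming a CPU job is charged cores × elapsed hours; walltime_to_seconds and max_core_hours are hypothetical helper names, and only the full D-HH:MM:SS form used above is parsed (SLURM accepts other forms too).

```shell
#!/usr/bin/env bash
# Convert a walltime in D-HH:MM:SS form (as in "-t 0-00:10:00") to seconds.
walltime_to_seconds() {
  local days rest h m s
  days=${1%%-*}                   # part before the dash
  rest=${1#*-}                    # HH:MM:SS
  IFS=: read -r h m s <<< "$rest"
  # 10# forces base 10, so values like "08" are not parsed as octal
  echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Upper bound on core-hours: cores x walltime in hours (assumed charging model).
max_core_hours() {
  local cores=$1 seconds
  seconds=$(walltime_to_seconds "$2")
  awk -v c="$cores" -v s="$seconds" 'BEGIN { printf "%.2f\n", c * s / 3600 }'
}
```

For the Hebbe example, max_core_hours 10 0-00:10:00 prints 1.67; for the Vera example with 64 cores it prints 10.67. The job is usually cheaper, since it stops after its actual run time.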

Everything after the scheduler information is the actual script that will be executed on the compute nodes. In this example we simply write some output and wait for 60 seconds.
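Instead of using an editor, you can also generate a job script from the shell, which is handy when you need many variants. A minimal sketch: write_jobscript is a hypothetical helper that writes the same directives as the Hebbe example above, filled in from its arguments; nothing is checked against the cluster.

```shell
#!/usr/bin/env bash
# Write a minimal batch job script to a file.
# Arguments: file, account, partition, number of tasks, walltime (D-HH:MM:SS).
write_jobscript() {
  local file=$1 account=$2 partition=$3 ntasks=$4 walltime=$5
  # Unquoted EOF: ${...} below are expanded now, when the file is written.
  cat > "$file" <<EOF
#!/usr/bin/env bash
#SBATCH -A ${account} -p ${partition}
#SBATCH -n ${ntasks}
#SBATCH -t ${walltime}

echo "Hello cluster computing world!"
sleep 60
EOF
}

# Example: reproduce the Hebbe job script from above.
#   write_jobscript jobscript SNIC009-87-654 hebbe 10 0-00:10:00
```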

There are many more flags you can give to sbatch; see its manual page (man sbatch) for the full list.

Submitting and monitoring jobs

So now we've managed to write a job script and store it to a file called jobscript. To run the job you now want to send it to the Scheduler. For this we use the command sbatch:

[emilia@hebbe ~]$ ls
jobscript
[emilia@hebbe ~]$ sbatch jobscript
Submitted batch job 123456

The job is now sent to the scheduler, which puts it in the job queue of the cluster. The number printed when we submitted the job is the job ID, a unique identifier for each job. To monitor the status of our job we query the queue with another command:

[emilia@hebbe ~]$ squeue -u emilia
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
 123456     hebbe  JobTest   emilia   R       0:07      1 hebbe06-2

Note that even more useful information can be obtained using the command jobinfo -u emilia instead of squeue.

Decoding this information, we see that the job is already running: its status flag is R, and the elapsed time shows that it started 7 seconds ago. In general there will not be free compute nodes at the moment you submit a job; the job is then queued (status PD, pending) until the scheduler starts it. At any point you can cancel the job (it is not uncommon to realise a mistake only after submitting). To cancel a job you just need its job ID, which we know from above, and run:

[emilia@hebbe ~]$ scancel 123456
[emilia@hebbe ~]$
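The submit and monitor steps can also be scripted. The sketch below polls squeue until a simple (non-array) job is no longer pending or running; job_state and wait_for_job are hypothetical helper names, and only the standard squeue options -h (no header), -j (job ID) and -o %T (full state name) are used.

```shell
#!/usr/bin/env bash
# Print the state of a job (PENDING, RUNNING, COMPLETED, ...), or NOT_IN_QUEUE
# once squeue no longer knows about it.
job_state() {
  local state
  state=$(squeue -h -j "$1" -o %T 2>/dev/null)
  echo "${state:-NOT_IN_QUEUE}"
}

# Wait until the job reaches a final state, checking every $2 seconds (default 30).
wait_for_job() {
  local interval=${2:-30}
  while true; do
    case $(job_state "$1") in
      PENDING|CONFIGURING|RUNNING|COMPLETING|SUSPENDED) sleep "$interval" ;;
      *) return 0 ;;   # finished, cancelled, failed, or already purged
    esac
  done
}
```

For example, sbatch --parsable prints only the job ID (plus the cluster name on multi-cluster setups), so you can wait for the job and then read its output:

jobid=$(sbatch --parsable jobscript)
wait_for_job "$jobid" 10
cat slurm-${jobid}.out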

Try this out a couple of times on your own:

  1. Submit the job.
  2. Look at the queue and at the job status while it runs on a node (hebbe06-2 in the example above); the job finishes after roughly 60 seconds.

For information on the usage of the login nodes, and on idle, used and queued GPUs, look at:

The following is slightly more advanced, but you may need it at times: once your job has started, you can ssh into its node (ssh hebbe06-2) to monitor the performance and resource usage of the job.

When our job has finished and has disappeared from the queue listing, we are ready to look at the results.

[emilia@hebbe ~]$ ls
jobscript  slurm-123456.out
[emilia@hebbe ~]$ cat slurm-123456.out
Hello cluster computing world!
[emilia@hebbe ~]$

The job has created a new file, slurm-123456.out, containing its output. We could redirect the output to a different file with, e.g., #SBATCH -o output.stdout.
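Output file names can include the job ID, which keeps the output of different runs apart. A sketch of the Hebbe example with a job name and separate output and error files: -J, -o and -e are standard sbatch flags and %j expands to the job ID, while the file names themselves are arbitrary choices.

```shell
#!/usr/bin/env bash
#SBATCH -A SNIC009-87-654 -p hebbe
#SBATCH -n 10
#SBATCH -t 0-00:10:00
#SBATCH -J JobTest            # job name shown by squeue
#SBATCH -o JobTest_%j.stdout  # standard output; %j becomes the job ID
#SBATCH -e JobTest_%j.stderr  # standard error in a separate file

echo "Hello cluster computing world!"
sleep 60
```

With job ID 123456, the greeting would end up in JobTest_123456.stdout instead of slurm-123456.out.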

Our toy example covers only the steps needed to get started. To do real computational work, replace the lines that echo a message and wait for 60 seconds with the command that runs your program.

If you are interested or need to know more, e.g. what exactly the output of the queue listing means, this can be found in the manual pages of each command. The manual pages are available directly on the command-line by running man command, just replace command with e.g. squeue.

Monitoring jobs

To get information on how a job is running on the node(s), you can generate a link to a Grafana status page by running JOBID (replace JOBID with your actual jobid). The status page gives an overview of what the node(s) in your job are doing. Check e.g. GPU and CPU utilisation, disk usage, etc.

Memory and other node features

To request nodes with special features, for example nodes with more memory, or to keep a multi-node job within a single InfiniBand switch, use the --constraint (or equivalently -C) flag with sbatch, e.g.:

#SBATCH -C MEM128           # a 128GB node will be allocated
#SBATCH -C MEM512|MEM1024   # either 512GB or 1024GB RAM will be allocated
#SBATCH -C MEM128&IB-SW08    # only nodes with 128GB RAM connected to InfiniBand switch 8

Note: Not all combinations of constraints are available. Though rarely used, complex logic and node counts can also be given in constraints; see the sbatch manual page (man sbatch) for details on the -C flag.

The available features are:

Resource  Infiniband network  Memory                                      Other
Hebbe     IB-SWxx             MEM64, MEM128, (MEM256), MEM512, MEM1024
Vera      IB-SWxx             MEM96, MEM192, (MEM384), MEM768             25G
Alvis     IB-SWxx             MEM576, MEM768, MEM1536

The 25G feature marks nodes equipped with a 25 Gbit/s connection, giving faster connections to the internet and to centre storage. Note that all nodes have fast InfiniBand for MPI communication.

Note: Unless you know you have special requirements, do not specify any constraints on your job.

Requesting more memory or other features does not cost any more core-hours than other nodes, but you will have to wait longer in the queue for the particular nodes to become available.
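The examples above put the constraint in an #SBATCH line. The same flag can be given on the sbatch command line, where it overrides the corresponding #SBATCH line in the script. There the expression must be quoted, since the shell would otherwise treat | as a pipe (and & as a background operator). A sketch, assuming the job script is called jobscript as above:

```shell
# Request a node with either 512GB or 1024GB RAM for this submission only.
sbatch -C "MEM512|MEM1024" jobscript
```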

GPUs

You can request GPU nodes with the right #SBATCH flags. On Hebbe and Vera, request GPUs through SLURM's generic resource (gres) system. On Alvis, you are strongly recommended to use the #SBATCH --gpus-per-node=xxx flag instead.

On Hebbe, there are K40 GPUs available.

#SBATCH --gres=gpu:K40:1    # allocates 1 K40 GPU (and half the node)

On Vera, you can request NVidia Tesla V100 or T4 GPUs.

#SBATCH --gres=gpu:V100:1   # allocates 1 V100 GPU (and half the node)
#SBATCH --gres=gpu:T4:1     # allocates 1 T4 GPU (and a full node)

On Alvis, you also have a choice between NVidia Tesla A100, V100 and T4 GPUs.

#SBATCH --gpus-per-node=V100:1 # allocates 1 V100 GPU (and 8 cores)
#SBATCH --gpus-per-node=T4:1   # allocates 1 T4 GPU (and 4 cores, but you only pay for 2)
#SBATCH --gpus-per-node=A100:1 # allocates 1 A100 GPU (and 8 cores)

Note that on Alvis a V100 GPU is 4 times as expensive as a T4 GPU, and an A100 8 times as expensive, reflecting the cost of the hardware.
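To compare the options, the ratios above can be turned into T4-equivalent GPU-hours. A minimal sketch; gpu_cost_t4_equiv is a hypothetical helper that only applies the stated relative weights (T4 = 1, V100 = 4, A100 = 8) and does not reproduce how the allocation is actually charged.

```shell
#!/usr/bin/env bash
# Relative GPU cost in T4-equivalent GPU-hours: weight x number of GPUs x hours.
# Weights follow the ratios stated above (T4 = 1, V100 = 4, A100 = 8).
gpu_cost_t4_equiv() {
  local type=$1 count=$2 hours=$3 weight
  case $type in
    T4)   weight=1 ;;
    V100) weight=4 ;;
    A100) weight=8 ;;
    *)    echo "unknown GPU type: $type" >&2; return 1 ;;
  esac
  awk -v w="$weight" -v n="$count" -v h="$hours" 'BEGIN { printf "%g\n", w * n * h }'
}
```

For example, 2 V100 GPUs for 3 hours come to 24 T4-equivalent hours, the same as 8 T4 GPUs for 3 hours, so a job that does not need the faster cards is considerably cheaper on T4s.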

Important configurations via gres flags (currently only applicable to Alvis)

Note the following two gres-flags that can be useful in certain cases:

  1. Nvidia MPS: This feature allows multiple processes (e.g. MPI ranks) to access the GPU(s) simultaneously. Enable it with a gres flag in your job script: #SBATCH --gres=nvidiamps:1. It is useful when the work assigned to a single process is not enough to keep the GPU's full computational capacity busy; with MPS, other processes can use the spare capacity at the same time. MPS reduces the context switches needed when several processes share a GPU and is therefore crucial for performance. For this to work, the MPS server must be the only process communicating with the GPU device, so you should activate exclusive compute mode when using this feature (see below).

  2. Exclusive compute mode: In contrast to the default mode which allows for multiple contexts per device, exclusive mode allows only one context (process) per device. To enable this feature, use #SBATCH --gres=gpuexcl:1. Note that it must be activated when using the Nvidia MPS feature.
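Putting both flags together, an Alvis job script for several MPI ranks sharing one GPU might look like the sketch below. It assumes the two site-specific gres names can be combined in one comma-separated --gres list (SLURM's general syntax for multiple generic resources); my_gpu_app is a hypothetical program. If the combination is rejected, contact support.

```shell
#!/usr/bin/env bash
#SBATCH -A SNIC009-87-654 -p alvis
#SBATCH -N 1 --gpus-per-node=V100:1
#SBATCH -n 4                          # 4 MPI ranks that will share the one GPU
#SBATCH --gres=nvidiamps:1,gpuexcl:1  # Nvidia MPS plus exclusive compute mode
#SBATCH -t 0-01:00:00

mpirun ./my_gpu_app                   # hypothetical MPI + GPU application
```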

GPU utilisation statistics

A management daemon collects GPU utilisation statistics for every job and writes a log of the data after the job ends. The log includes power and memory usage as well as the fraction of the GPU's streaming multiprocessors (SMs) used (a V100 has 80 SMs, a T4 40 and an A100 108).

Running job-arrays

There is often a need to run a series of similar simulations with varying inputs. For this purpose SLURM has a job-array feature. Submitting with the --array flag of sbatch sets the environment variable $SLURM_ARRAY_TASK_ID in each job of the array, which the script can use to decide which simulation to run.

This offers several great advantages:

  1. It's less work for you (no need to generate and modify tons of different job-scripts that are almost identical).
  2. The squeue command is more readable for everyone (the whole array is only 1 entry)
  3. It's easier for support to help you.
  4. The scheduler isn't overloaded. For safety, there is a max-size of the queue, so very large submissions *must* use arrays to avoid hitting this limit.
  5. Convenient to cancel every job in a given array if you discover some mistake.
  6. Email notifications can be customised to be sent only when all jobs have finished.
  7. It's vastly simpler to re-run an aborted simulation (for example when a NODE_FAIL occurs). E.g. if job 841342_3 dies, resubmit only that index:
sbatch --array=3 jobscript

Example: We have input files named data_0.in, data_1.in, ..., data_10.in, which we want to run on 5 cores each. Our job script might then look something like this:

#!/usr/bin/env bash
#SBATCH -A SNIC009-87-654
#SBATCH -p hebbe
#SBATCH -J CrashBenchmark
#SBATCH -n 5
#SBATCH -t 0-04:00:00

module load intel

echo "Running simulation on data_${SLURM_ARRAY_TASK_ID}.in"

mpirun ./my_crash_sim data_${SLURM_ARRAY_TASK_ID}.in
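Before submitting, it can save a wasted queue wait to check that every input the array will ask for actually exists. A minimal sketch; missing_inputs is a hypothetical helper that assumes the data_<N>.in naming of this example and is run in the submission directory.

```shell
#!/usr/bin/env bash
# List any of data_0.in ... data_<last>.in missing from the current directory,
# so a typo is caught before submitting with --array=0-<last>.
missing_inputs() {
  local last=$1 i
  for ((i = 0; i <= last; i++)); do
    [ -f "data_${i}.in" ] || echo "data_${i}.in"
  done
}

# Example: missing_inputs 10   # prints nothing if all 11 files exist
```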

Example 2: We have directories that are not enumerated, e.g. "At", "Bi", "Ce", etc., located in input_data/. We need to run a 1-core simulation in each directory, so we could do:

#!/usr/bin/env bash
#SBATCH -A SNIC009-87-654
#SBATCH -p hebbe
#SBATCH -J CobaltDiffusion
#SBATCH -n 1
#SBATCH -t 0-20:00:00

module load intel

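# Create a sorted list of the subdirectories of input_data/ (sorted so the
# index-to-directory mapping is the same in every array task):
DIRS=($(find input_data/ -mindepth 1 -maxdepth 1 -type d | sort))
# (we could also have specified the list directly: DIRS=(input_data/At input_data/Bi input_data/Ce))

# Fetch one directory from the array based on the task ID (index starts from 0)
CURRENT_DIR=${DIRS[$SLURM_ARRAY_TASK_ID]}

echo "Running simulation $CURRENT_DIR"

# Go to folder
cd "$CURRENT_DIR" || exit 1

./diffusion_sim cobalt_data.inp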

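These scripts can both be submitted with

sbatch --array=0-10 jobscript

where the range must match the available inputs: 0-10 covers data_0.in to data_10.in in the first example, while in the second it should run from 0 to the number of directories minus one.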
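For the directory-based example, the array range can be derived from the directories themselves instead of being counted by hand. A minimal sketch; array_range is a hypothetical helper that counts the immediate subdirectories of its argument, matching a job script that indexes the sorted list of those subdirectories from 0.

```shell
#!/usr/bin/env bash
# Print the --array range covering every subdirectory of $1 (indices 0..N-1).
array_range() {
  local n
  n=$(find "$1" -mindepth 1 -maxdepth 1 -type d | wc -l)
  if [ "$n" -eq 0 ]; then
    echo "no subdirectories in $1" >&2
    return 1
  fi
  echo "0-$((n - 1))"
}

# Usage:
#   sbatch --array="$(array_range input_data)" jobscript
```

With the three directories At, Bi and Ce, this prints 0-2.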

For more examples and details, see the SLURM manual on job arrays:

If you are unsure how to make use of a job array, please contact support for help writing a suitable job script.

Running interactive jobs with srun

You are allowed to use the ThinLinc machines for light to moderate tasks that require interactive input. If you need most of the memory or many cores for an extended duration, you must run on the compute nodes. You can pass most of the same flags to srun as you would to sbatch. You also need to add --pty if you want to send and receive text from your software (this is almost always the case). Starting a basic shell on a node can be done with:

srun -A C3SE2019-1-2 -n 10 -t 00:30:00 --pty bash

You are eventually presented with a shell on the node:

[emilia@hebbe06-2 ~]$

You can also directly start software, such as a jupyter notebook.

  • Useful for debugging a job-script, application problems, longer compilations.
  • Not useful when there is a long queue (you still have to wait like all jobs), but can be used with private partitions.
  • You will be tied to the login node, and when we need to restart the login nodes, all srun jobs are killed. We cannot wait for individual jobs to finish when we have updates that affect all users, so you must save often.

Interactive use and X-forwarding

You can also directly launch graphical applications if they work well with X forwarding. Either SSH into the login node with X forwarding enabled from your own computer (ssh -X), or use it from within a ThinLinc session.

Starting a shell:

srun -A C3SE2019-1-2 -n 10 -t 00:30:00 --x11 --pty bash

or e.g. starting MATLAB

module load MATLAB
srun -A C3SE2019-1-2 -n 10 -t 00:30:00 --x11 --pty matlab