Aims of this seminar


  • C3SE is a Chalmers research infrastructure
  • C3SE = Chalmers Centre for Computational Science and Engineering
  • 6 employees, located in the Origo building (5th floor) and at Industrial and Materials Science
    • Sverker Holmgren (director of eCommons)
    • Thomas Svedberg (director of C3SE) resides at Industrial and Materials Science
    • Mathias Lindberg (technical director)
    • Mikael Öhman
    • Per Fahlberg
    • Soheil Soltani
    • Hugo Strand


  • C3SE is part of SNIC, Swedish National Infrastructure for Computing, a government-funded organization for academic HPC
  • We're funded mostly by SNIC and partly by Chalmers
  • 5 other SNIC centers
    • Lunarc in Lund
    • NSC, National Supercomputing Center, in Linköping
    • PDC, ParallellDatorCentrum, at KTH in Stockholm
    • Uppmax in Uppsala
    • HPC2N, High Performance Computing Center North, in Umeå
  • Much infrastructure is shared between SNIC centers - for example, our file backups go to Umeå
  • A similar software stack is used at the other centers: module system, scheduler.

Compute clusters at C3SE

  • We run Linux-based compute clusters
  • We have two production systems: Alvis and Vera
    • Hebbe is still operational but will be retired in the near future

Mathias at Glenn

Compute clusters at C3SE

  • Our systems run CentOS 7 (64-bit), an open-source rebuild of Red Hat Enterprise Linux
    • CentOS is not Ubuntu!
    • Users do NOT have sudo rights!
    • You can't install software using apt-get!

Our systems: Hebbe

  • 323 compute nodes with 20 cores each = 6460 CPU cores
  • 64 GB RAM per node standard
  • Nodes with 128, 256, 512, 1024 GB memory also available
  • Intel Xeon E5-2650v3 ("Haswell") 10 cores 2.3 GHz (2 per node)
  • 2 compute nodes with a Tesla K40 NVidia GPU each.
  • Interconnect: Mellanox ConnectX-3 Pro FDR Infiniband 56Gbps
  • Slurm queuing system


Our systems: Vera

  • Not part of SNIC, only for C3SE members
  • 96 GB RAM per node standard
  • Nodes with 192, 384, 768 GB memory also available (some are private)
  • 230 compute nodes with 32 cores / 64 threads each.
  • +2 nodes with 2 NVidia V100 accelerator cards each.
  • +2 nodes with 1 NVidia T4 accelerator card each.
  • Intel Xeon Gold 6130 ("Skylake") 16 cores @ 2.10 GHz (2 per node)
  • Login nodes are equipped with graphics cards.

Our systems: Alvis

  • SNIC resource dedicated to AI/ML research
  • consists of SMP nodes accelerated with multiple GPUs
  • Alvis goes in production in three phases:
    • Phase 1A: equipped with Nvidia Tesla V100 GPUs (in production)
    • Phase 1B: equipped with Nvidia Tesla T4 GPUs (in production)
    • Phase 2: will be in production in 2021
  • Node details:
  • Login server: ssh

Getting access

  • SUPR is the platform we use for all of SNIC's resources
  • To get access, you must do the following in SUPR:
    1. Join/apply for a project
    2. Accept the user agreement
    3. Send an account request for the resource you wish to use (only available after joining a project)
    4. Wait ~1 working day.
  • Having a CID is not enough!
  • We will give you a CID if you don't have one.
  • There are no cluster-specific passwords; you log in with your normal CID and password (or with a SSH-key if you choose to set one up).

Available SNIC resources

  • Senior researchers at Swedish universities are eligible to apply for SNIC projects at the other centres; those centres may have more available time or specialized hardware that suits you, e.g. GPUs, large-memory nodes, or support for sensitive data.

  • C3SE is also part of:

Working on an HPC cluster

  • On your workstation, you are the only user - on the cluster, there are many users at the same time
  • You access the cluster through a login node - it's the only machine(s) in the cluster you can access directly
  • There's a queuing system/scheduler which starts and stops jobs and balances demand for resources
  • The scheduler will start your script on the compute nodes
  • The compute nodes are identical (except for memory and GPUs) and share storage systems and network

Working on an HPC cluster

  • Users belong to one or more projects, which have monthly allocations of core-hours (rolling 30-day window)
  • Each project has a Principal Investigator (PI) who applies for an allocation to SNIC
  • The PI decides who can run jobs in their projects
  • All projects are managed through the SNIC system SUPR
  • You need to have a PI and be part of a project to run jobs
  • We count the core-hours your jobs use
  • The core-hour usage influences your job's priority

The compute cluster

The cluster environment

Preparing job

Prepare jobs on login node

  • Install your own software (if needed)
  • Login nodes are shared resources!
  • Transfer input files to the cluster
  • Prepare batch scripts describing how to perform your analysis

Submit job

Submit job to queue

  • Submit job script to queueing system (sbatch)
  • You'll have to specify the number of cores, wall time, and any GPU/large-memory requirements
  • Job is placed in a queue ordered by priority (influenced by usage, project size, job size)

Job starts

Job starts when nodes are available

  • Job starts when requested nodes are available (and it is your turn)
  • Automatic environment variables tell MPI how to run (communication can go over Infiniband)
  • The job performs the actions you detailed in your job script as if you were typing them in yourself

Copy data to TMPDIR

Work on TMPDIR if needed

  • You must work on TMPDIR if you do a lot of file I/O
  • Parallel TMPDIR is available (using --gres ptmpdir:1)
  • TMPDIR is cleaned of data immediately when the job ends, fails, crashes, or runs out of wall time
  • You must copy important results back to your persistent storage
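A minimal sketch of this pattern (the solver is hypothetical; in a real job, $TMPDIR and $SLURM_SUBMIT_DIR are set by Slurm, and the fallbacks below exist only so the sketch can run outside a job):

```shell
# Fallbacks so the sketch also runs outside Slurm; inside a job these are preset.
TMPDIR=${TMPDIR:-$(mktemp -d)}
SLURM_SUBMIT_DIR=${SLURM_SUBMIT_DIR:-$PWD}

echo "dummy input" > input.dat        # stand-in for your real input file
cp input.dat "$TMPDIR"/               # 1. stage input onto node-local disk
cd "$TMPDIR"

# 2. do the I/O-heavy work on the local disk
#    (a real job would run e.g.: ./my_solver input.dat  - hypothetical solver)
tr a-z A-Z < input.dat > output.dat   # stand-in for the solver

cp output.dat "$SLURM_SUBMIT_DIR"/    # 3. copy results back before the job ends
```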

After the job ends

Copy results back from the cluster

  • You can also post-process results directly on our systems
  • Graphical pre/post-processing can be done via Thinlinc

Logging in

  • Use your CID as user name and log in with ssh (from within the Chalmers network):
  • Hebbe: ssh or ssh or (MStud-users only)
  • Vera: ssh or ssh
  • Alvis: ssh
  • Authenticate yourself with your password (or set up ssh keys)
  • If you need a graphical interface, use "X forwarding" by adding the flag -X to ssh
  • If you are logging in from a Windows computer, you can e.g. use the ssh client PuTTY, optionally together with the X-server Xming (though Thinlinc is likely preferable).
  • We permit login from within many Swedish university networks. If you are outside these networks, you can use Chalmers VPN (L2TP recommended) to connect.

Logging in (continued)

  • You can also run a Linux machine in, say, VirtualBox - a good approach if you are a Windows user who both needs to build some Linux experience and wants to use the clusters in a simple way.
  • On Linux or Mac OS X computers, the ssh client is typically already installed.
  • You get a shell - a command line interface. If you forwarded X and have an X-server running on your desktop, you can also launch graphical applications (beware that they might be slow)
  • We also offer graphical login using ThinLinc, where you get a GNOME desktop on the cluster. This is mainly intended for graphics-intensive pre- and post processing. See for more info.

Logging in (Thinlinc)

Gnome desktop through Thinlinc

Using the shell

  • At the prompt ($), simply type the command (optionally followed by arguments to the command). E.g:
$ ls -l
...a list of files...
  • The working directory is normally the "current point of focus" for many commands
  • A few basic shell commands are
    • ls, list files in working directory
    • pwd, print current working directory ("where am I")
    • cd directory_name, change working directory
    • cp src_file dst_file, copy a file
    • rm file, delete a file (there is no undelete!)
    • mv nameA nameB, rename nameA to nameB
    • mkdir dirname, create a directory
  • See also grep, lfs find, less, chgrp, chmod
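A short example session using these commands (file and directory names are arbitrary):

```shell
mkdir demo_dir               # create a directory
cd demo_dir                  # make it the working directory
pwd                          # print where we are
echo "hello" > fileA.txt     # create a small file
cp fileA.txt fileB.txt       # copy it
mv fileB.txt fileC.txt       # rename the copy
ls -l                        # lists fileA.txt and fileC.txt
rm fileA.txt                 # delete it - there is no undelete!
```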


  • man provides documentation for most of the commands available on the system, e.g.
  • man ssh, to show the man-page for the ssh command
  • man -k word, to list available man-pages containing word in the title
  • man man, to show the man-page for the man command
  • To navigate within the man-pages (same as less):
    • space - to scroll down one screen page
    • b - to scroll up one screen page
    • q - to quit from the current man page
    • / - search (type in word, enter)
    • n - find next search match (N for reverse)
    • h - to get further help (how to search the man page etc)

Modules

  • Much of the installed software is made available through the module system - load a module to use the software, e.g.:


$ python3 --version
-bash: python3: command not found
$ module load intel/2016b Python/3.5.2
$ python3 --version
Python 3.5.2

Modules (continued)

  • Modules commonly used
    • Compilers, C, C++, and Fortran compilers such as ifort, gcc, clang
    • MPI-implementations, such as Intel-mpi, OpenMPI
    • Math Kernel Libraries, optimized BLAS and FFT (and more) routines such as mkl, acml
  • There is a large number of other modules; Python (+a lot of addons such as numpy, scipy etc), ANSYS, COMSOL, Gaussian, Gromacs, MATLAB, OpenFoam, R, StarCCM, VASP, etc.
  • module load module-name - load a module
  • module list - list currently loaded modules
  • module spider module-name - search for module
  • module avail - show available modules
  • module purge - unloads all current modules
  • module unload module-name - unloads a module
  • module show module-name - show info about the module

Modules (toolchains)

  • The module tree uses a toolchain hierarchy. We primarily compile software under 2 toolchains:
    • intel: Intel compilers + Intel MPI + Intel MKL
    • foss: GCC + OpenMPI + OpenBLAS
  • Some software are not under a toolchain (e.g. MATLAB, etc.)
  • Others require you to load a toolchain first, e.g. module spider Python finds that it is installed with the Intel compiler and Intel MPI, thus:
module load intel/2016b
module load Python/3.5.2

Toolchain hierarchy

Toolchain hierarchy

  • Mixing toolchain versions is a bad idea because this mixes libraries which can be incompatible (leading to crashes or incorrect results).
  • Typically, just pick one of the bottom toolchains, and you get the hierarchy up to GCCcore along with it.

Software - Python

  • We install the fundamental Python packages for HPC, such as numpy and scipy, optimized for our systems
  • We can also install Python packages if there will be several users.
  • We provide pip, singularity, conda, and virtualenv so you can install your own Python packages locally.
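As an example, installing packages of your own with virtualenv might look like this sketch (the environment name demo_env and the package name are made up; on the cluster you would first load a Python module, found via module spider Python):

```shell
# On the cluster, first: module load <toolchain> Python  (see 'module spider Python')
python3 -m venv demo_env       # create a private environment in ./demo_env
. demo_env/bin/activate        # activate: python3 and pip now resolve inside demo_env
command -v python3             # -> .../demo_env/bin/python3
# pip install some_package     # (hypothetical package) installs only into demo_env
deactivate                     # leave the environment
```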

Software - Installing software

  • You are responsible for having the software you need
  • You are also responsible for having any required licenses
  • We're happy to help you installing software - ask us if you're unsure of what compiler or maths library to use, for example.
  • We can also install software centrally, if there will be multiple users, or if the software requires special permissions. You must supply us with the installation material (if not openly available).
  • You can install software in your allocated disk space (good build tools allow you to specify a --prefix=path_to_local_install)
  • You can link against libraries from the module tree.
  • You can run your own Singularity containers:

Software - Installing binary (pre-compiled) software

  • A common problem is that software requires a newer glibc version; glibc is tied to the OS and can't be upgraded.
    • You can use a Singularity container to work around this for your software.
  • Make sure to use binaries that are compiled and optimized for the hardware.
    • Alvis and Vera support up to AVX512.
    • Hebbe supports up to AVX2.
    • Difference can be huge. Example: Compared to our optimized NumPy builds, the generic x86 version from pip is ~3x slower on Hebbe, and ~9x slower on Vera.
  • AVX512 > AVX2 > AVX > SSE > Generic instructions.
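You can check which of these instruction sets a given machine supports by inspecting /proc/cpuinfo (Linux-specific), for instance from an interactive job on a compute node:

```shell
# Print the SSE/AVX-related CPU flags of this machine, one per line.
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -E '^(sse|avx)' | sort -u \
  || echo "no SSE/AVX flags listed (non-x86 CPU?)"
```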

Storing data

  • We use quota to limit storage use
  • To check your current quota on all your active storage areas, run C3SE_quota
  • Quota limits (Cephyr):
    • User home directory (/cephyr/users/<CID>/)
      • 30GB, 60k files

Storing data

  • If you need to store more data, you can apply for a storage project
  • Home directories
    • $HOME = /cephyr/users/<CID>/Hebbe
    • $HOME = /cephyr/users/<CID>/Vera
    • $HOME = /cephyr/users/<CID>/Alvis
  • The home directory is backed up every night
  • $SLURM_SUBMIT_DIR is defined in jobs, and points to where you submitted your job.
  • Try to avoid lots of small files: sqlite or HDF5 are easy to use!

Storing data - TMPDIR

  • $TMPDIR: local scratch disk on the node(s) of your jobs. Automatically deleted when the job has finished.
  • When should you use $TMPDIR?
    • The only good reason NOT to use $TMPDIR is if your program only loads data in one read operation, processes it, and writes the output.
  • It is crucial that you use $TMPDIR for jobs that perform intensive file I/O
  • If you're unsure what your program does: investigate it, or use $TMPDIR!

Storing data - TMPDIR

  • Using $TMPDIR means the disks on the compute node will be used
  • Using /cephyr/users/<CID>/ means the network-attached disks on the center storage system will be used
  • The latter means there will be both network traffic on the shared network, and I/O activity on the shared storage system
  • Currently the local disks are 1600GB on each node on Hebbe, and 380GB (SSD) on each node on Alvis and Vera.
  • Using sbatch --gres=ptmpdir:1 you get a distributed, parallel $TMPDIR across all nodes in your job. Always recommended for multi-node jobs that use $TMPDIR.

Your projects

  • projinfo lists your projects and current usage. projinfo -D breaks down usage day-by-day (up to 30 days back).
 Project            Used[h]      Allocated[h]     Queue
C3SE2017-1-8       15227.88*            10000     hebbe
    razanica       10807.00*
    kjellm          2176.64*
    robina          2035.88*
    dawu             150.59*
    framby            40.76*
C3SE507-15-6        9035.27             28000       mob
    knutan          5298.46
    robina          3519.03
    kjellm           210.91
    ohmanm             4.84
  • The star (*) means the project is over 100% usage, which lowers the priority of its jobs.

Running jobs

  • On compute clusters jobs must be submitted to a queuing system that starts your jobs on the compute nodes:
    • sbatch <arguments>
  • Jobs must NOT run on the login nodes. Prepare your work on the front-end, and then submit it to the cluster
  • A job is described by a script ( above) that is passed on to the queuing system by the sbatch command
  • Arguments to the queue system can be given in the job script as well as on the command line
  • Maximum wall time is 7 days (we might extend it manually if you're in a panic), but it's for your own good to make the job restartable!
  • When you allocate less than a full node, you are assigned a proportional part of the node's memory and local disk space as well.
  • See

Running jobs on Hebbe

  • Use the flag -C to sbatch if you need:
  • -C MEM64 requests a 64GB node - there are 192 of those
  • -C MEM128 requests a 128GB node - there are 34 of those
  • -C MEM512 requests a 512GB node - there is 1 of those
  • -C MEM1024 requests a 1024GB node - there is 1 of those
  • -C "MEM512|MEM1024" requests either 512GB or 1TB node
  • --gres=gpu:1 requests 1 K40 GPU (and a full node associated with it)
  • Don't specify constraints (-C) unless you know you need them.
  • Your job goes in the normal queue, but waits for a node with the requested configuration to become available
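As a sketch, a job-script header combining these flags might look like the following (the project ID, core count, and wall time are placeholders; again, request constraints only when needed):

```
#SBATCH -A C3SE2017-1-2        # placeholder project ID
#SBATCH -n 20
#SBATCH -t 1-00:00:00
#SBATCH -C MEM128              # only because this hypothetical job needs 128GB
```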

Running jobs on Hebbe

  • On Hebbe you can allocate individual CPU cores, not only nodes (up to the full 20 cores on the node)
  • Your project will be charged only for the core hours you use
  • If you request more than 1 node, you get all cores (and pay the core hours for them)
  • Try to stick to a divisor of 20 in number of tasks when sharing nodes.

Running jobs on Vera

  • Same deal as with Hebbe:
  • -C MEM96 requests a 96GB node - 168 total (some private)
  • -C MEM192 requests a 192GB node - 17 total (all private)
  • -C MEM384 requests a 384GB node - 7 total (5 private, 2 GPU nodes)
  • -C MEM768 requests a 768GB node - 2 total
  • -C 25G requests a node with 25Gbit/s storage and internet connection (nodes without 25G still use the fast Infiniband for access to /cephyr).
  • --gres=gpu:1 requests 1 GPU (half the GPU node is allocated)
  • --gres=gpu:2 requests 2 GPUs (etc.)
  • Don't specify constraints (-C) unless you know you need them.

Running jobs on Vera

  • On Vera you can still only allocate individual cores, which means you will end up with an even number of threads (tasks).
  • If you request more than 1 node, you get all cores and threads (and pay the core hours for them)
  • You can only ask for an even number of threads in total (it will be rounded up to the next even number)
  • If you use the -n parameter, you are requesting tasks.
  • -c sets the number of CPUs per task (as Vera has HyperThreading enabled, this might be relevant for many jobs).
  • The combination of -n and -c affects how mpirun will launch.
  • A full node has 64 threads. You may want to control how mpirun distributes processes.
  • Your project will be charged only for the core hours you use.
  • Try to stick to a power of 2 number of tasks when sharing nodes.
  • Benchmark your code to see if -n X -c 2 or -n 2*X runs faster.
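For instance, here are the two variants above as header fragments filling one Vera node (64 threads) - a sketch; benchmark both, as suggested:

```
## Variant A: 32 MPI ranks with 2 hyperthreads each (32 * 2 = 64 threads)
#SBATCH -n 32 -c 2
## Variant B: 64 MPI ranks with 1 thread each (uncomment instead of A):
## #SBATCH -n 64 -c 1
```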

Running jobs on Alvis

  • Alvis is dedicated to GPU-hungry computations, therefore your job must allocate at least one GPU
  • On Alvis you can allocate individual cores (tasks)
  • Hyperthreading is disabled on Alvis
  • Alvis comes in three phases (I, II, and III), and there is a variety in terms of:
    • number of cores
    • CPU architecture
    • number and type of GPUs
    • memory per GPU
    • memory per node
  • Pay close attention to the above-mentioned items in your job submission script to pick the right hardware
    • for instance, phase Ia comes with NVIDIA V100 GPUs, while phase Ib is equipped with T4 GPUs

Allocating GPUs on Alvis

  • You can specify the number of GPUs and let the scheduler decide the type
    • #SBATCH --gpus-per-node=3
  • You can also specify the type (recommended):
    • #SBATCH --gpus-per-node=V100:3
  • Currently, mixing GPUs of different types is not allowed
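A sketch of an Alvis allocation combining these options (the project ID, GPU count, and wall time are placeholders):

```
#SBATCH -A SNIC2020-1-2        # placeholder project ID
#SBATCH --gpus-per-node=V100:2 # specifying the GPU type is recommended
#SBATCH -t 12:00:00
```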

Vera script example

Note: You can (currently) only allocate a minimum of 1 core = 2 threads on Vera

#SBATCH -A C3SE2018-1-2
## Note! Vera has hyperthreading enabled:
## n * c = 128 threads total = 2 nodes
## This should launch 32 MPI-processes on each node.
#SBATCH -n 64
#SBATCH -c 2
#SBATCH -t 2-00:00:00
#SBATCH --gres=ptmpdir:1

module load ABAQUS intel
cp train_break.inp $TMPDIR

abaqus cpus=$SLURM_NTASKS mp_mode=mpi job=train_break

cp train_break.odb $SLURM_SUBMIT_DIR

Vera script example

#SBATCH -A C3SE2018-1-2 -p hebbe
#SBATCH --gres=gpu:V100:1

unzip -d $TMPDIR/
singularity exec --nv ~/tensorflow-2.1.0.simg --training_input=$TMPDIR/

Hebbe script example

  • Submitted with sbatch --array=0-99
#SBATCH -A SNIC2017-1-2
#SBATCH -n 1
#SBATCH -t 15:00:00
#SBATCH --mail-type=end

module load MATLAB
cp wind_load_$SLURM_ARRAY_TASK_ID.mat $TMPDIR/wind_load.mat
cp wind_turbine.m $TMPDIR
cd $TMPDIR
RunMatlab.sh -f wind_turbine.m
  • Environment variables like $SLURM_ARRAY_TASK_ID can also be accessed from within all programming languages, e.g:
array_id = getenv('SLURM_ARRAY_TASK_ID'); % matlab
array_id = os.getenv('SLURM_ARRAY_TASK_ID') # python

Hebbe script example

  • Submitted with sbatch --array=0-50:5
#SBATCH -A C3SE2017-1-2
#SBATCH -n 40 -t 2-00:00:00

module load intel/2017a
# The array index is used directly as the temperature:
temperature=$SLURM_ARRAY_TASK_ID
dir=$SLURM_SUBMIT_DIR/run_$temperature
# Set up a new folder, copy the input file there
mkdir $dir; cd $dir
cp $HOME/
# Set the temperature in the input file:
sed -i "s/TEMPERATURE_PLACEHOLDER/$temperature/"

mpirun $HOME/software/my_md_tool -f

Here, the array index is used directly as input. If it turns out that 50 degrees was insufficient, then we could do another run:

sbatch --array=55-80:5

Hebbe script example

Submitted with: sbatch -N 3 -J residual_stress

#SBATCH -A C3SE507-15-6 -p mob
#SBATCH --ntasks-per-node=20
#SBATCH -t 6-00:00:00
#SBATCH --gres=ptmpdir:1

module load intel/2017a PETSc
cp $ $TMPDIR
while sleep 1h; do
done &

mpirun $HOME/bin/oofem -p -f "$"
rsync -a *.vtu $SLURM_SUBMIT_DIR/oofem/$SLURM_JOB_NAME/

Alvis script example

#SBATCH -A C3SE2020-2-3
#SBATCH -n 4
#SBATCH -t 2-00:00:00
#SBATCH --gpus-per-node=T4:2

#If you want to use parallel TMPDIR as well:
#SBATCH --gres=ptmpdir:1

module load foo

mpirun -n 4 ./bar

Interactive use

You are allowed to use the Thinlinc machines for light/moderate tasks that require interactive input. If you need all 20 cores, or generate load for an extended duration, you must run on the compute nodes:

srun -A SNIC2017-1-2 -n 2 -t 00:30:00 --pty bash -is

You are eventually presented with a shell on the node:

  • Useful for debugging a job-script, application problems, long compilations.
  • Not useful when there is a long queue (you still have to wait), but can be used with private partitions.

Interactive use and X-forwarding

  • SLURM supports X11 forwarding via the flag --x11, but this requires that you ssh to hebbe with X forwarding enabled (which requires an X-server running locally). You can start your graphical software from bash, or even directly:
srun -A SNIC2017-1-2 -n 2 -t 00:30:00 --x11 --pty bash -is
  • When in Thinlinc, if you wish to run a graphical interactive job, you first have to modify the $DISPLAY variable to point at localhost:
srun -A SNIC2017-1-2 -n 2 -t 00:30:00 --x11 --pty bash -is

(echo $DISPLAY should read something like localhost:XX.0)

Interactive use and X-forwarding

  • X11 forwarding in SLURM is still a bit experimental.
  • If you get the error xauth: error in locking authority file ..., then you need to remove the lock files: rm ~/Xauthority-c ~/Xauthority-l and try again.
  • If you get the error srun: error: run_command: xauth poll timeout @ 100 msec just try again. This is a known bug in SLURM that causes the problems above and will hopefully be fixed in the next release.

Job command overview

  • sbatch: submit batch jobs
  • srun: submit interactive jobs
  • jobinfo, squeue: view the job-queue and the state of jobs in queue
  • scontrol show job <jobid>: show details about job, including reasons why it's pending
  • sprio: show all your pending jobs and their priority
  • scancel: cancel a running or pending job
  • sinfo: show status for the partitions (queues): how many nodes are free, how many are down, busy, etc.
  • sacct: show scheduling information about past jobs
  • projinfo: show the projects you belong to, including monthly allocation and usage
  • For details, refer to the -h flag, man pages, or google!

Job monitoring

  • Why am I queued? jobinfo -u $USER:
    • Priority: Waiting for other queued jobs with higher priority.
    • Resources: Waiting for sufficient resources to be free.
    • AssocGrpCPURunMinutesLimit: We limit how much you can have running at once (<= 100% of 30-day allocation * 0.5^x where x is the number of stars in projinfo).
  • You can log on to the nodes that your job got allocated by using ssh (from the login node) as long as your job is running. There you can check what your job is doing, using normal Linux commands - ps, top, etc.
    • top will show you how much CPU your process is using, how much memory, and more. Tip: press 'H' to make top show all threads separately, for multithreaded programs
    • iotop can show you how much your processes are reading and writing on disk
  • Performance benchmarking with Allinea Forge, Intel VTune
  • Debugging with Allinea Map, gdb, Address Sanitizer, or Valgrind

Job monitoring

  • Running top on your job's nodes:

20 processes with high CPU utilization, job looks good!

System monitoring

  • sinfo -Rl command shows how many nodes are down for repair.
  • The health status page gives an overview of what the node(s) in your job are doing
  • Check e.g. memory usage, user, system, and wait CPU utilization, disk usage, etc
  • See summary of CPU and memory utilization (only available after job completes): seff JOBID
  • System status information for each resource is available through the C3SE homepage:
  • Current health status:

System monitoring

Ideal job :)

  • The ideal job, high CPU utilization and no disk I/O

System monitoring

Bad job :(

  • Looks like something tried to use 2 nodes incorrectly.
  • One node swapped to death, while the other was just idling.

System monitoring

Probably lots of slow I/O

  • Node waiting a lot, not great. Perhaps inefficient I/O use.

Things to keep in mind

  • Never run (big or long) jobs on the login node! If you do, we will kill the processes. If you keep doing it, we'll throw you out and block you from logging in for a while! Prepare your job, do tests and check that everything's OK before submitting the job, but don't run the job there!
  • Keep an eye on what's going on - use normal Linux tools on the login node and on the allocated nodes to check CPU, memory and network usage, etc. Especially for new jobscripts/codes!
  • Think about what you do - if you by mistake copy very large files back and forth you can slow the storage servers or network to a crawl

Getting support

  • We provide support to our users, but not for any and all problems
  • We can help you with software installation issues, and recommend compiler flags etc. for optimal performance
  • We can install software system-wide if there are many users who need it - but not for one user (unless the installation is simple)
  • We don't support your application software or help debugging your code/model or prepare your input files

Getting support

  • C3SE staff are available in our offices, to help with those things that are hard to put into a support request email (book a time in advance please)
  • Rooms O5105B, O5110 and O5111 in the Origo building - Fysikgården 1, one floor up, ring the bell to the right
  • We also offer advanced support for things like performance optimization, advanced help with software development tools or debuggers, workflow automation through scripting, etc.

Getting support - support requests

  • If you run into trouble, first figure out what seems to go wrong. Use the following as a checklist:
  • something wrong with your job script or input file?
  • does your simulation diverge?
  • is there a bug in the program?
  • any error messages? Look in your manuals, and use Google!
  • check the node health: Did you over-allocate memory until linux killed the program?
  • Try to isolate the problem - does it go away if you run a smaller job? does it go away if you use your home directory instead of the local disk on the node?
  • Try to create a test case - the smallest and simplest possible case that reproduces the problem

Getting support - error reports

  • In order to help you, we need as much and as good information as possible:
    • What's the job-ID of the failing job?
    • What working directory and what job-script?
    • What software are you using?
    • What's happening - especially error messages?
    • Did this work before, or has it never worked?
    • Do you have a minimal example?
    • No need to attach files; just point us to a directory on the system.
    • Where are the files you've used - scripts, logs etc?
    • Look at our Getting support page

In summary

  • Our web page is
  • Take a look at the tutorial on our web page - it's under Getting Started
  • Read up how to use the file system
  • Read up on the module system and available software
  • Learn a bit of Linux if you don't already know it - no need to be a guru, but you should feel comfortable working in it
  • Play around with the system, and ask us if you have questions
  • Support cases through