Aims of this seminar
- Introduce C3SE and SNIC
- Present our systems
- Present the basic workflow for our HPC systems
- Present various useful tools and methods
- Describe how C3SE can help you
- This presentation is available on C3SE's web page:
C3SE and SNIC
- C3SE is a Chalmers research infrastructure
- C3SE = Chalmers Centre for Computational Science and Engineering
- 5 employees, located in Origo building, 5th floor, and Applied Mechanics
- Thomas Svedberg (director) resides at Applied Mechanics
- Mathias Lindberg (technical director)
- Mikael Öhman
- Oscar Tiderman
- Karl Larsson
C3SE and SNIC
- C3SE is part of SNIC, Swedish National Infrastructure for Computing, a government-funded organization for academic HPC
- We're funded mostly by SNIC and partly by Chalmers
- 5 other SNIC centers
- Lunarc in Lund
- NSC, National Supercomputing Center, in Linköping
- PDC, ParallellDatorCentrum, at KTH in Stockholm
- Uppmax in Uppsala
- HPC2N, High Performance Computing Center North, in Umeå
- Much infrastructure is shared between SNIC centers - for example, our file backups go to Umeå
- A similar software stack is used at the other centers: module system, scheduler.
Compute clusters at C3SE
- We run Linux-based compute clusters
- We have one production system: Hebbe
Compute clusters at C3SE
- Our systems run CentOS, which is an open-source version of Red Hat Enterprise Linux
- Hebbe runs CentOS 6.9, 64bit
- Glenn runs CentOS 6.9, 64bit
- None of our systems run Ubuntu!
- So, no sudo rights for users!
- You can't install software using apt-get!
Our systems: Glenn
- Only provided as a bonus resource for local projects since 2017
- 335 compute nodes with 16 cores each = 5360 CPU cores
- AMD "Bulldozer" Interlagos CPUs that require optimized software for full performance
- Uses the C3SE center storage system
- Uses the Slurm queuing system
Our systems: Hebbe
- 312 compute nodes with 20 cores each = 6240 CPU cores
- 64 GB RAM per node standard
- Nodes with 128, 256, 512, 1024 GB memory also available
- Intel Xeon E5-2650v3 ("Haswell") 10 cores 2.3 GHz (2 per node)
- 2 compute nodes with a Tesla K40 NVidia GPU each.
- Interconnect: Mellanox ConnectX-3 Pro FDR Infiniband 56Gbps
- Slurm queuing system
- https://supr.snic.se is the platform we use for all of SNIC's resources
- To get access, you must do the following in SUPR:
- Join/apply for a project
- Accept the user agreement
- Send an account request for the cluster you wish to use (only available after joining a project)
- Wait ~1 working day.
- Having a CID is not enough!
- We will give you a CID if you don't have one.
Available SNIC resources
Senior researchers at Swedish universities are eligible to apply for SNIC projects at other centres, which may have more available time or specialized hardware that suits you, e.g. GPUs, large memory nodes, or support for sensitive data.
C3SE is also part of:
Working on an HPC cluster
- On your workstation, you are the only user - on the cluster, there are many users at the same time
- You access the cluster through a login node - the login nodes are the only machines in the cluster you can access directly
- There's a queuing system/scheduler which starts and stops jobs and balances demand for resources
- The scheduler will start your script on the compute nodes
- The compute nodes are identical (except for memory and GPUs) and share storage systems and network
Working on an HPC cluster
- Users belong to one or more projects, which have monthly allocations of core-hours (rolling 30-day window)
- Each project has a Principal Investigator (PI) who applies for an allocation to SNIC
- The PI decides who can run jobs in their projects
- Most projects are managed through the SNIC system SUPR
- You need to have a PI and be part of a project to run jobs
- We count the core-hours your jobs use
- The core-hour usage influences your job's priority
The compute cluster
- Install your own software (if needed)
- Login nodes are shared resources!
- Transfer input files to the cluster
- Prepare batch scripts that describe how to perform your analysis
- Submit job script to queueing system (sbatch)
- You'll have to specify number of cores, wall-time, GPU/large memory
- Job is placed in a queue ordered by priority (influenced by usage, project size, job size)
- Job starts when requested nodes are available (and it is your turn)
- Automatic environment variables inform MPI how to run (which can run over InfiniBand)
- Performs the actions you detailed in your job script as if you were typing them in yourself
Copy data to TMPDIR
- You must work on TMPDIR if you do a lot of file I/O
- Parallel TMPDIR is available (using --gres=ptmpdir:1)
- You must copy important results back to centre storage
- TMPDIR is cleaned of data immediately when the job ends, fails, crashes or runs out of wall time
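The copy-in, run, copy-out pattern above can be sketched as a small shell function. This is a minimal sketch, not a C3SE tool: the name stage_and_run, the file names, and the tr command standing in for a real solver are all made up for illustration, and TMPDIR falls back to /tmp so the sketch also runs outside a job. Note that copying back after the program exits does not help if the job is killed for exceeding its wall time.

```shell
#!/bin/bash
# stage_and_run INPUT OUTPUT CMD...: copy INPUT to scratch, run CMD there,
# then copy OUTPUT back to the submit directory (hypothetical helper).
stage_and_run() {
    local input=$1 output=$2
    shift 2
    local origin=${SLURM_SUBMIT_DIR:-$PWD}   # set by Slurm inside a batch job
    local scratch
    scratch=$(mktemp -d "${TMPDIR:-/tmp}/stage.XXXXXX") || return 1
    cp "$origin/$input" "$scratch/" || return 1
    ( cd "$scratch" && "$@" )
    local status=$?
    # Copy the result back even if the program failed, so partial output survives.
    [ -e "$scratch/$output" ] && cp "$scratch/$output" "$origin/"
    rm -rf "$scratch"
    return $status
}

# Demo with a stand-in "solver" that upper-cases its input file.
demo=$(mktemp -d)
echo "hello" > "$demo/input.txt"
export SLURM_SUBMIT_DIR="$demo"              # pretend the job was submitted from here
stage_and_run input.txt output.txt sh -c 'tr a-z A-Z < input.txt > output.txt'
cat "$demo/output.txt"
```

In a real job script the stand-in command would be replaced by your own program, and the function would be called once per input file.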
After the job ends
- Transfer data back out of the cluster after the job has finished
- You can also post-process results directly on our systems
- Graphical pre/post-processing can be done via ThinLinc
- Use your CID as user name and log in with ssh (from within the Chalmers network):
- Authenticate yourself with your password (or set up ssh keys)
- Optionally use "X forwarding" by adding the flag
- If you are logging in from a Windows computer, you can e.g. use the ssh client PuTTY, optionally together with the X-server Xming (ThinLinc is likely preferable).
- If you are outside the Chalmers network, you may need to use Chalmers VPN (L2TP recommended) to connect.
Logging in (continued)
- You can also run a Linux machine in, say, VirtualBox. This is a good approach if you are a Windows user who both needs to build some Linux experience and wants to make using the clusters simple.
- On Linux or Mac OS X computers, the ssh client is typically already installed.
- You get a shell - a command line interface. If you forwarded X and have an X-server running on your desktop you can also launch graphical applications (beware that it might be slower)
- We also offer graphical login using ThinLinc, where you get a GNOME desktop on a Glenn or Hebbe server. This is mainly intended for graphics-intensive pre- and post-processing. See https://www.c3se.chalmers.se/documentation/remote_graphics for more info.
Logging in (ThinLinc)
Using the shell
- At the prompt ($), simply type the command (optionally followed by arguments to the command). E.g:
$ ls -l ...a list of files...
- The working directory is normally the "current point of focus" for many commands
- A few basic shell commands are
ls, list files in working directory
pwd, print current working directory ("where am I")
cd directory_name, change working directory
cp src_file dst_file, copy a file
rm file, delete a file (there is no undelete!)
mv nameA nameB, rename nameA to nameB
mkdir dirname, create a directory
- See also grep, lfs find, less, chgrp, chmod
- man provides documentation for most of the commands available on the system, e.g.
- man ssh, to show the man-page for the ssh command
- man -k word, to list available man-pages containing word in the title
- man man, to show the man-page for the man command
- To navigate within the man-pages (same as less):
space - to scroll down one screen page
b - to scroll up one screen page
q - to quit from the current man page
/ - search (type in word, enter)
n - find next search match (N for reverse)
h - to get further help (how to search the man page etc)
- Several applications are not available by default, but are available via modules https://www.c3se.chalmers.se/documentation/software
- To load one or more modules, use the command
module load module-name [module-name ...]
$ python3 --version
-bash: python3: command not found
$ module load intel/2016b Python/3.5.2
$ python3 --version
Python 3.5.2
- Modules commonly used
- Compilers, C, C++, and Fortran compilers such as ifort, gcc, clang
- MPI-implementations, such as Intel-mpi, OpenMPI
- Math Kernel Libraries, optimized BLAS and FFT (and more) routines such as mkl, acml
- There is a large number of other modules: Python (plus a lot of add-ons such as numpy, scipy etc), ANSYS, COMSOL, Gaussian, Gromacs, MATLAB, OpenFOAM, R, STAR-CCM+, VASP, etc.
- module load module-name - load a module
- module list - list currently loaded modules
- module spider module-name - search for module
- module avail - show available modules
- module purge - unloads all current modules
- module unload module-name - unloads a module
- module show module-name - show info about the module
- The module tree uses a toolchain hierarchy. We primarily compile software under 2 toolchains:
- intel: Intel compilers + Intel MPI + Intel MKL
- foss: GCC + OpenMPI + OpenBLAS
- Some software is not under a toolchain (e.g. MATLAB, etc.)
- Others require you to load a toolchain first, e.g. with module spider Python we find that it is installed with icpc and Intel MPI, thus:
module load intel/2016b
module load Python/3.5.2
Software - Python
- We install the fundamental Python packages for HPC, such as numpy and scipy, optimized for our systems
- All packages are installed in the main Python installation, so module load intel/2016b Python/2.7.12 is all you need.
- We can also install Python packages if there will be several users.
- We provide pip, so you can install your own Python packages locally.
Software - Installing software
- You are responsible for having the software you need
- You are also responsible for having any required licenses
- We're happy to help you install software - ask us if you're unsure of what compiler or maths library to use, for example.
- We can also install software centrally, if there will be multiple users, or if the software requires special permissions. You must supply us with the installation material (if not openly available).
- You can install software in your allocated disk space (nice build tools allow you to specify a
- You can link against libraries from the module tree.
- You can build and run your own Singularity containers: https://www.c3se.chalmers.se/documentation/software/#singularity
- Home directory on Glenn
- Home directory on Hebbe
- Home directory on both Glenn and Hebbe is backed up!
- Important environment variables;
$SLURM_SUBMIT_DIR, directory where you submitted your job (available to batch jobs).
$TMPDIR, local scratch disk on the node(s) of your jobs. Automatically deleted when the job has finished.
$SNIC_NOBACKUP, main storage space. Set up automatically.
- Keep scripts and input files in $SNIC_BACKUP and put output in
- Try to avoid lots of small files: sqlite or HDF5 are easy to use!
- We use quota to limit storage use
- To check your current quota, run C3SE_quota
- Current default quota limits:
- Soft quota: 25GB, 60000 files (4 weeks grace time)
- Hard quota: 50GB, 120000 files
- Soft quota: 200GB, 500000 files (4 weeks grace time)
- Hard quota: 1TB, 2500000 files
- If you need to store more data, you can apply for a storage project. Please contact support for more info.
$HOME of both Glenn and Hebbe
- Going over a hard limit or past the grace time will suspend all running jobs, hold all queued jobs, and prevent you from submitting new jobs.
- Note that C3SE_quota only updates hourly.
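Because the quota limits apply to the number of files as well as to the size, it can help to count files in a directory yourself before a job fills it. The following is a minimal sketch using only standard tools; the helper names count_files and quota_status are made up for illustration, the limit of 60000 is taken from the soft quota listed above, and C3SE_quota remains the authoritative check.

```shell
#!/bin/bash
# count_files DIR: number of regular files below DIR (hypothetical helper).
count_files() {
    local n
    n=$(find "$1" -type f | wc -l)
    echo $(( n ))            # arithmetic expansion strips any padding from wc
}

# quota_status COUNT LIMIT: report how close COUNT is to LIMIT.
quota_status() {
    local count=$1 limit=$2
    if [ "$count" -gt "$limit" ]; then
        echo "over: $count of $limit files"
    else
        echo "ok: $count of $limit files ($(( 100 * count / limit ))%)"
    fi
}

# Demo on a throwaway directory; point count_files at your own directory instead.
demo=$(mktemp -d)
touch "$demo/a" "$demo/b" "$demo/c"
quota_status "$(count_files "$demo")" 60000
rm -rf "$demo"
```

Counting files this way walks the whole directory tree, so run it sparingly on large trees on shared storage.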
Storing data - TMPDIR
$TMPDIR: local scratch disk on the node(s) of your jobs. Automatically deleted when the job has finished.
- When should you use $TMPDIR?
- The only really good reason NOT to use $TMPDIR is if your program only loads data in one read operation, processes it, and writes the output.
- In most other cases, you should use $TMPDIR
- It is crucial that you use $TMPDIR for jobs that perform a lot of intense file I/O
- If you're unsure what your program does: investigate it, or use $TMPDIR
Storing data - TMPDIR
- $TMPDIR means the high-speed disks on the compute node will be used
- $SNIC_NOBACKUP means the network-attached disks on the center storage system will be used
- The latter means there will be both network traffic on the shared network, and I/O activity on the shared storage system
- The local disks are currently 400GB per node on Glenn and 1600GB per node on Hebbe
- With sbatch --gres=ptmpdir:1 you get a distributed, parallel $TMPDIR across all nodes in your job. Always recommended for multi-node jobs that use $TMPDIR.
- projinfo lists your projects and current usage.
- projinfo -D breaks down usage day-by-day (up to 30 days back).
Project            Used[h]   Allocated[h]   Queue
   User
-------------------------------------------------------
C3SE2017-1-8      15227.88*       10000     hebbe
   razanica       10807.00*
   kjellm          2176.64*
   robina          2035.88*
   dawu             150.59*
   framby            40.76*
-------------------------------------------------------
C3SE507-15-6       9035.27        28000     mob
   knutan          5298.46
   robina          3519.03
   kjellm           210.91
   ohmanm             4.84
- On your local machine you just run your program
- On clusters you submit batch jobs to a queuing system that starts your jobs on the compute nodes (separate machines)
- Glenn & Hebbe:
sbatch <arguments> script.sh
- The login nodes are NOT for running jobs. Prepare your work there, and then submit it to the cluster
- A job is described by a script (script.sh above) that is passed on to the queuing system by the sbatch command
- Arguments to the queue system can be given in the script as well as on the command line
Running jobs on Glenn
- Very similar to Hebbe,
- Use the flag -C to sbatch if you need more than 32GB RAM per node
-C BIGMEM requests a 64GB node - there are 135 of those
-C HUGEMEM requests a 128GB node - there are 12 of those
- Your job goes in the normal queue, but waits for a node with the requested memory to become available
- On Glenn we only allocate full nodes
- You can't allocate individual CPU cores, but only nodes
- Even if you only need one core, you get 16 - use them wisely!
- Thus, a one hour job will always be charged 16 core hours
- Maximum time limit is 7 days (we can extend it manually if it's a panic)
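The full-node charging rule above can be worked through with a little shell arithmetic. This is a minimal sketch, assuming whole hours and the 16 cores per node stated above; the function name glenn_core_hours is made up for illustration and is not a C3SE tool.

```shell
#!/bin/bash
# glenn_core_hours CORES HOURS: core-hours charged when only whole
# 16-core nodes can be allocated (hypothetical helper, integer hours only).
glenn_core_hours() {
    local cores=$1 hours=$2
    local nodes=$(( (cores + 15) / 16 ))   # round up to whole nodes
    echo $(( nodes * 16 * hours ))
}

glenn_core_hours 1 1     # one core for one hour is still charged as 16
glenn_core_hours 17 2    # 17 cores need 2 nodes: 2 * 16 * 2 = 64
```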
Running jobs on Hebbe
- See our documentation page for Hebbe:
- Use the flag -C to sbatch if you need more than 64GB RAM per node
-C MEM64 requests a 64GB node - there are 192 of those
-C MEM128 requests a 128GB node - there are 34 of those
-C MEM512 requests a 512GB node - there is 1 of those
-C MEM1024 requests a 1024GB node - there is 1 of those
-C "MEM512|MEM1024" requests either a 512GB or a 1TB node
-C GPU requests a node with a K40 Nvidia GPU - there are 2 of those
- Don't specify -C if you don't need it.
- Your job goes in the normal queue, but waits for a node with the requested configuration to become available
Running jobs on Hebbe
- On Hebbe you can allocate individual CPU cores, not only nodes (up to the full 20 cores on the node)
- Your project will be charged only for the core hours you use
- If you request more than 1 node, you get all cores (and pay the core hours for them)
- When you allocate less than a full node, you are assigned a proportional part of the node RAM and local disk space as well.
- Maximum time limit is 7 days (we might extend it manually if you're in a panic), but it's for your own good to make the job restartable!
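The charging and memory rules above can be sketched as shell arithmetic. This is a rough sketch under stated assumptions: asking for more than 20 cores is taken to mean more than one node, the memory split is taken to be strictly linear in the nominal node memory, and the helper names hebbe_charged_cores and hebbe_memory_share_mb are made up for illustration. The memory actually usable by a job will be somewhat lower than the nominal figure.

```shell
#!/bin/bash
CORES_PER_NODE=20

# hebbe_charged_cores CORES: cores you are charged for. Up to one node you
# pay for what you ask for; beyond that, whole nodes are allocated.
hebbe_charged_cores() {
    local cores=$1
    if [ "$cores" -le "$CORES_PER_NODE" ]; then
        echo "$cores"
    else
        echo $(( (cores + CORES_PER_NODE - 1) / CORES_PER_NODE * CORES_PER_NODE ))
    fi
}

# hebbe_memory_share_mb CORES [NODE_GB]: proportional share of node RAM in MB,
# assuming a linear split of the nominal node memory (default 64 GB).
hebbe_memory_share_mb() {
    local cores=$1 node_gb=${2:-64}
    echo $(( cores * node_gb * 1024 / CORES_PER_NODE ))
}

hebbe_charged_cores 5        # 5
hebbe_charged_cores 30       # 40, i.e. two full nodes
hebbe_memory_share_mb 5      # 16384, a quarter of a 64 GB node
```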
Hebbe script example
#!/bin/bash
#SBATCH -A SNIC2017-1-2
#SBATCH -n 40
#SBATCH -t 2-00:00:00
#SBATCH --gres=ptmpdir:1
module load ABAQUS intel
cp train_break.inp $TMPDIR
cd $TMPDIR
abaqus cpus=$SLURM_NTASKS mp_mode=mpi job=train_break
cp train_break.odb $SLURM_SUBMIT_DIR
Hebbe script example
- Submitted with
sbatch --array=0-99 wind_turbine.sh
#!/bin/bash
#SBATCH -A SNIC2017-1-2
#SBATCH -n 1
#SBATCH -t 15:00:00
#SBATCH --mail-user=email@example.com --mail-type=end
module load MATLAB
cp wind_load_$SLURM_ARRAY_TASK_ID.mat $TMPDIR/wind_load.mat
cp wind_turbine.m $TMPDIR
cd $TMPDIR
RunMatlab.sh -f wind_turbine.m
cp out.mat $SLURM_SUBMIT_DIR/out_$SLURM_ARRAY_TASK_ID.mat
- Environment variables like $SLURM_ARRAY_TASK_ID can also be accessed from within all programming languages, e.g:
array_id = getenv('SLURM_ARRAY_TASK_ID'); % matlab
array_id = os.getenv('SLURM_ARRAY_TASK_ID') # python
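In a shell script the array index is available directly as a variable. The sketch below builds the per-task input file name used in the script above; the function name input_for_task and the fallback value 0 (so the function can be tried outside a job) are assumptions made for illustration.

```shell
#!/bin/bash
# input_for_task: name of the input file for the current array task.
input_for_task() {
    local id=${SLURM_ARRAY_TASK_ID:-0}   # fallback 0 is an assumption for dry runs
    echo "wind_load_${id}.mat"
}

input_for_task                          # uses the fallback outside an array job
SLURM_ARRAY_TASK_ID=7 input_for_task    # what array task 7 would see
```

Using a fallback like this makes it possible to test the script on a small case before submitting the whole array.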
Hebbe script example
- Submitted with
sbatch --array=0-50:5 diffusion.sh
#!/bin/bash
#SBATCH -A C3SE2017-1-2
#SBATCH -n 40 -t 2-00:00:00
module load intel/2017a
# Set up new folder, copy the input file there
temperature=$SLURM_ARRAY_TASK_ID
dir=temp_$temperature
mkdir $dir; cd $dir
cp $SNIC_NOBACKUP/base_input.in input.in
# Set the temperature in the input file (double quotes so the shell expands $temperature):
sed -i "s/TEMPERATURE_PLACEHOLDER/$temperature/" input.in
mpirun $SNIC_NOBACKUP/software/my_md_tool -f input.in
Here, the array index is used directly as input. If it turns out that 50 degrees was insufficient, then we could do another run:
sbatch --array=55-80:5 diffusion.sh
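To see which indices a stepped array range hands out, and how the placeholder substitution behaves, the logic can be tried locally without the scheduler. This is a minimal sketch: seq stands in for the index expansion done by sbatch, only the first three temperatures are generated, and the one-line template file is made up for illustration. The sed expression is in double quotes so that the shell expands $temperature, and the output is redirected instead of using -i so the sketch behaves the same with GNU and BSD sed.

```shell
#!/bin/bash
# Indices that --array=0-50:5 would assign, one per array task.
echo "task ids: $(seq 0 5 50 | tr '\n' ' ')"

# Stand-in template with the placeholder (hypothetical content).
workdir=$(mktemp -d)
echo "temperature = TEMPERATURE_PLACEHOLDER" > "$workdir/base_input.in"

# Emulate the first three array tasks, each in its own directory.
for temperature in $(seq 0 5 10); do
    dir="$workdir/temp_$temperature"
    mkdir -p "$dir"
    sed "s/TEMPERATURE_PLACEHOLDER/$temperature/" "$workdir/base_input.in" > "$dir/input.in"
done

cat "$workdir/temp_5/input.in"
```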
Hebbe script example
sbatch -N 3 -J residual_stress run_oofem.sh
#!/bin/bash
#SBATCH -A C3SE507-15-6 -p mob
#SBATCH --ntasks-per-node=20
#SBATCH -t 15:00:00
#SBATCH --gres=ptmpdir:1
module load intel/2017a PETSc
cp $SLURM_JOB_NAME.in $TMPDIR
cd $TMPDIR
mkdir $SLURM_SUBMIT_DIR/$SLURM_JOB_NAME
# Copy results back every hour while the solver runs
while sleep 1h; do
    rsync *.vtu $SLURM_SUBMIT_DIR/$SLURM_JOB_NAME
done &
LOOPPID=$!
mpirun $HOME/bin/oofem -p -f "$SLURM_JOB_NAME.in"
kill $LOOPPID
# Final sync of the remaining result files
rsync *.vtu $SLURM_SUBMIT_DIR/$SLURM_JOB_NAME/
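The background loop in this script is a general pattern: copy partial results at intervals while the main program runs, stop the loop when it finishes, then do one final copy. The following is a scaled-down sketch that can be run anywhere. The two temporary directories stand in for $TMPDIR and centre storage, cp stands in for rsync, the interval is one second instead of one hour, and the "solver" is a made-up loop that writes two files.

```shell
#!/bin/bash
# sync_results SRC DST: copy any .vtu files produced so far (hypothetical helper).
sync_results() {
    local f
    for f in "$1"/*.vtu; do
        [ -e "$f" ] && cp "$f" "$2/"
    done
    return 0
}

scratch=$(mktemp -d)     # stands in for $TMPDIR
results=$(mktemp -d)     # stands in for a directory on centre storage
interval=1               # seconds here; the job script uses 1h

# Background loop: wake up every $interval seconds and copy what exists so far.
while sleep "$interval"; do sync_results "$scratch" "$results"; done &
loop_pid=$!

# Stand-in for the real solver: writes two result files, one per second.
for step in 1 2; do
    echo "step $step" > "$scratch/out_$step.vtu"
    sleep 1
done

# Stop the loop, then do a final sync so the last files are not missed.
kill "$loop_pid"
wait "$loop_pid" 2>/dev/null || true
sync_results "$scratch" "$results"
ls "$results"
```

The final sync matters because the loop may be killed between two copies; without it the most recent files would be lost when the scratch directory is cleaned.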
Hebbe interactive use
srun -A SNIC2017-1-2 -n 2 -t 00:30:00 --pty bash -is
you are eventually presented with a shell on the node:
- Useful for debugging a job-script, application problems, long compilations.
- Not useful when there is a long queue (you still have to wait), but can be used with private partitions.
- Supports X11 forwarding via the flag --x11, but this requires that you ssh to Hebbe with X forwarding (requires an X-server running locally). You start your graphical software via bash, or even directly:
srun -A SNIC2017-1-2 -n 2 -t 00:30:00 --x11 xeyes
- When in ThinLinc, if you wish to run a graphical interactive job, you first have to modify the $DISPLAY variable by prepending localhost:
DISPLAY=localhost$DISPLAY srun -A SNIC2017-1-2 -n 2 -t 00:30:00 --x11 xeyes
echo $DISPLAY should read something like
Job command overview, Glenn and Hebbe
sbatch: submit batch jobs
srun: submit interactive jobs
squeue: view the job-queue and the state of jobs in queue
scontrol show job <jobid>: show details about job, including reasons why it's pending
sprio: show all your pending jobs and their priority
scancel: cancel a running or pending job
sinfo: show status for the partitions (queues): how many nodes are free, how many are down, busy, etc.
sacct: show scheduling information about past jobs
projinfo: show the projects you belong to, including monthly allocation and usage
- For details, refer to the -h flag, man pages, or google!
- Why am I queued? jobinfo -u $USER:
- Priority: Waiting for other queued jobs with higher priority.
- Resources: Waiting for sufficient resources to be free.
- AssocGrpCPURunMinutesLimit: We limit how much you can have running at once (<= 100% of 30-day allocation).
- JobHeldAdmin or SUSPENDED state: You are probably over disk quota!
- You can log on to the nodes that your job got allocated by using ssh (from the login node) as long as your job is running. There you can check what your job is doing, using normal Linux commands - ps, top, etc.
- top will show you how much CPU your process is using, how much memory, and more. Tip: press 'H' to make top show all threads separately, for multithreaded programs
- iotop can show you how much your processes are reading and writing on disk
- Performance benchmarking with Allinea Forge, Intel VTune
- Debugging with Allinea DDT (part of Forge), gdb
- Running top on your job's nodes:
- System status information for each resource is available through the C3SE homepage:
- Current health status:
- ganglia_url.py JOBID
- The health status page gives an overview of what the node(s) in your job are doing
- Check e.g. memory usage, user, system, and wait CPU utilization, disk writes, etc
- See summary of CPU and memory utilization (only available after job completes):
- Job is keeping all cores busy (looks good)
- Idle node (doing nothing)
- Perhaps you did not use all nodes you asked for?
- Node overusing memory, causing swapping
- Swapping usually means it will run thousands of times slower.
- A job like this you should cancel immediately and ask for more memory next time (
- Node waiting a lot, not great.
- Might be a lot of MPI or disk I/O
Things to keep in mind
- Never run (big or long) jobs on the login node! If you do, we will kill the processes. If you keep doing it, we'll throw you out and block you from logging in for a while! Prepare your job, do tests and check that everything's OK before submitting the job, but don't run the job there!
- Keep an eye on what's going on - use normal Linux tools on the login node and on the allocated nodes to check CPU, memory and network usage, etc. Especially for new jobscripts/codes!
- Think about what you do - if you by mistake copy very large files back and forth you can slow the storage servers or network to a crawl
- We provide support to our users, but not for any and all problems
- We can help you with software installation issues, and recommend compiler flags etc. for optimal performance
- We can install software system-wide if there are many users who need it - but not for one user (unless the installation is simple)
- We don't support your application software or help debugging your model or input files
- C3SE staff are available in our offices, to help with those things that are hard to put into a support request email
- Rooms O5105B, O5110 and O5111 Origo building - Fysikgården 1, one floor up, ring the bell to the right
- We also offer advanced support for things like workflow optimization, advanced help with software development tools or debuggers, workflow automation through scripting, etc.
Getting support - support requests
- If you run into trouble, first figure out what seems to go wrong - does your simulation diverge? Is there a bug in the program? Is there something wrong with your script?
- Do you get any error messages? Look in your manuals, and use Google!
- Check the node health: Did you over-allocate memory until Linux killed the program?
- Try to isolate the problem - does it go away if you run a smaller job, does it go away if you use your home directory instead of the local disk on the node?
- Try to create a test case - the smallest and simplest case you can create that reproduces the problem
Getting support - error reports
- Ban the phrase "didn't work", that's already implied by you contacting the support.
- In order to help you, we need as much and as good information as possible:
- What's the job-ID of the failing job?
- What working directory and what job-script?
- What software are you using?
- What's happening - especially error messages?
- Did this work before, or has it never worked?
- Do you have a minimal example?
- No need to attach files; just point us to a directory on the system.
- Where are the files you've used - scripts, logs etc?
- Look at our Getting support page
- Our web page is https://www.c3se.chalmers.se
- Take a look at the tutorial on our web page - it's under Getting Started
- Read up how to use the file system
- Read up on the module system and available software
- Learn a bit of Linux if you don't already know it - no need to be a guru, but you should feel comfortable working in it
- Play around with the system, and ask us if you have questions
- Support cases through https://supr.snic.se/support