Introduction slides for Vera
Aims of this seminar¶
- Introducing Chalmers e-Commons/C3SE and NAISS
- Our HPC systems and the basic workflow for using them
- Various useful tools and methods
- How we can help you
- This presentation is available on C3SE's web page
Chalmers e-Commons¶
- C3SE is part of Chalmers e-Commons managing the HPC resources at Chalmers
- Located in the south section of the Origo building on the 6th floor. Map
- The infrastructure team consists of:
- Sverker Holmgren (director of e-Commons)
- Thomas Svedberg
- Mikael Öhman
- Viktor Rehnberg
- Chia-Jung Hsu
- Yunqi Shao
- Dejan Vitlacil
- Leonard Nielsen
- Sahar M. K. Al-Zaidawi
NAISS¶
- We are part of NAISS, National Academic Infrastructure for Supercomputing in Sweden, a government-funded organisation for academic HPC
- NAISS replaced SNIC starting 2023.
- Funded by NAISS and Chalmers
- 5 other NAISS centres
- Lunarc in Lund
- NSC, National Supercomputing Centre, in Linköping
- PDC, ParallellDatorCentrum, at KTH in Stockholm
- Uppmax in Uppsala
- HPC2N, High Performance Computing Centre North, in Umeå
- Much of the infrastructure is shared between centres. For example, our file backups are saved in Umeå and C3SE runs SUPR.
- A similar software stack is used at the other centres: module system, scheduler, containers.
Compute clusters at C3SE¶
- We primarily run Linux-based compute clusters
- We have two production systems: Alvis and Vera
Compute clusters at C3SE¶
- We run Rocky Linux 9, which is a clone of Red Hat Enterprise Linux. Note that:
- Rocky is not Ubuntu!
- Users do NOT have `sudo` rights!
- You can not install software using `apt-get`!
Our systems: Vera¶
- Not part of NAISS, only for C3SE members.
- We are upgrading the system with Zen4 nodes.
- Vera hardware
- 768 GB RAM per node standard
- Nodes with 512, 768, 1024, 1536, 2048 GB memory also available (some are private)
- 84 AMD EPYC 9354 ("Zen4") 32 cores @ 3.25 GHz (2 per node)
- +2 nodes with 4 NVidia H100 accelerator cards each.
- 69 Intel(R) Xeon(R) Gold 6338 ("Icelake") 32 cores @ 2.00GHz (2 per node)
- +4 nodes with 4 Nvidia A40 cards each
- +3 nodes with 4 Nvidia A100 cards each
- Zen4 login nodes are equipped with NVIDIA L40S graphics cards.
Vera hardware details¶
- The main partition has:

| # GPUs | GPU | FP16 TFLOP/s | FP32 TFLOP/s | FP64 TFLOP/s | Capability |
|---|---|---|---|---|---|
| 16 | A40 | 37.4 | 37.4 | 0.58 | 8.6 |
| 12 | A100 | 77.9 | 19.5 | 9.7 | 8.0 |
| 8 | H100 | 248 | 62 | 30 | 9.0 |

| # nodes | CPU type | FP32 TFLOP/s | FP64 TFLOP/s |  |
|---|---|---|---|---|
| 84 | Zen4 | ~12 | ~6 | (full 64 cores) |
| 63 | Icelake | ~8 | ~4 | (full 64 cores) |
- Theoretical numbers!
Vera zen4 expansion¶
- The default node type has been changed to AMD Zen4. It has much larger memory (768 GB) compared to the old Skylake nodes (96 GB).
- Intel toolchains older than 2024a are not installed on the Zen4 nodes.
- Rocky Linux 9 is running on the Zen4 nodes. The Icelake nodes are being upgraded over the coming weeks.
- Icelake nodes are still available and are requested with the same constraint `-C ICELAKE`.
- Use Icelake nodes if you want Intel CPUs. Skylake nodes are still around for a few weeks, but they are being retired soon.
- If you use everything from modules or containers, everything should just work.
Vera zen4 expansion (continued)¶
- Software built on Skylake nodes has to be rebuilt for the Zen4 nodes.
- Software built on Skylake nodes should work on the Icelake nodes.
- Load `buildenv/default-foss-2024a` or similar modules to build your own software (a minimal sketch follows after this list). `-march=native` is a flag you may want to include if you are not using the `CFLAGS`, `FFLAGS`, etc. from the environment variables.
- You can use AMD clang/flang by loading a compatible AOCC module, for example `AOCC/4.2.0-GCCcore-13.3.0` for the foss-2024a toolchain.
- Clang is also available in Clang modules, for example `Clang/16.0.6-GCCcore-13.3.0`.
- If you want to use the Icelake nodes, you should request an Icelake compute node for building your software (via the portal or srun).
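A minimal sketch of building a small program on a Zen4 node with the buildenv module named above (the source file and output name are hypothetical):

module purge
module load buildenv/default-foss-2024a
gcc -O2 -march=native -o mysolver mysolver.c   # -march=native tunes for the CPU you build on

Note that a binary built with -march=native on Zen4 may not run on the Icelake nodes, so build separately per node type.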
Vera zen4 expansion (continued)¶
- BLIS is the default BLAS implementation on the Zen4 nodes, while OpenBLAS is the default on the Icelake nodes.
- You can select your BLAS implementation by setting the `FLEXIBLAS` environment variable, for example `FLEXIBLAS=OpenBLAS <your executable>` (remember to load the proper modules).
- VASP users can refer to the VASP page.
Our systems: Alvis¶
- NAISS resource dedicated to AI/ML research
- Consists of nodes accelerated with multiple Nvidia GPUs each
- Alvis went in production in three phases:
- Phase 1A: 44 V100 (Skylake CPUs)
- Phase 1B: 160 T4 (Skylake CPUs)
- Phase 2: 336 A100, 348 A40 (Icelake CPUs)
- Login nodes:
  - `alvis1` has 4 T4 GPUs for testing and development (Skylake CPUs)
  - `alvis2` is primarily a data transfer node (Icelake CPUs)
- Node details: https://www.c3se.chalmers.se/about/Alvis/
- You should apply for a project if you are working on AI/ML.
Our systems: Cephyr¶
- Funded by NAISS and Chalmers.
- Center storage running CephFS
- Total storage area of 2PiB
- Also used for Swedish Science Cloud and dCache @ Swestore
- Details: https://www.c3se.chalmers.se/about/Cephyr/
Our systems: Mimer¶
- Funded by WASP via NAISS as part of Alvis.
- Center storage running WekaIO
- Very fast storage for Alvis.
- 634 TB flash storage
- 6860 TB bulk
- Details: https://www.c3se.chalmers.se/about/Mimer/
Available NAISS resources¶
- Senior researchers at Swedish universities are eligible to apply for NAISS projects at other centres; they may have more time or specialised hardware that suits you, e.g. GPUs, large memory nodes, or support for sensitive data.
- We are also part of:
  - Swedish Science Cloud (SSC)
  - Swestore
- You can find information about NAISS resources on https://supr.naiss.se and https://www.naiss.se
- Contact the support if you are unsure about what you should apply for.
Available local resources¶
- All Chalmers departments are allocated a chunk of computing time on Vera
- Each department chooses how they want to manage their allocations
- Not all departments have chosen to request it
For a list of PIs with allocations, you can check this list.
Getting access¶
- https://supr.naiss.se is the platform we use for all resources
- To get access, you must do the following in SUPR:
- Join/apply for a project
- Accept the user agreement
- Send an account request for the resource you wish to use (only available after joining a project)
- Wait ~1 working day (we manually create your account)
- Having a CID is not sufficient.
- We will reactivate CIDs or grant a new CID if you don't have one.
- There are no cluster-specific passwords; you log in with your normal CID and password (or with an SSH key if you choose to set one up).
- https://www.c3se.chalmers.se/documentation/first_time_users/
Working on an HPC cluster¶
- On your workstation, you are the only user - on the cluster, there are many users at the same time
- You access the cluster through a login node - the login nodes are the only machines in the cluster you can access directly
- There's a queuing system/scheduler which starts and stops jobs and balances demand for resources
- The scheduler will start your script on the compute nodes
- The compute nodes are mostly identical (except for memory and GPUs) and share storage systems and network
Working on an HPC cluster¶
- Users belong to one or more projects, which have monthly allocations of core-hours (rolling 30-day window)
- Each project has a Principal Investigator (PI) who applies for an allocation
- The PI decides who can use their projects.
- All projects are managed through SUPR
- You need to have a PI and be part of a project to run jobs
- We count the core-hours your jobs use
- The core-hour usage for the past 720 hours influences your job's priority.
Vera compute cluster¶
Connecting¶
- Vera can be accessed in three ways:
- SSH
- Open OnDemand portal
- ThinLinc
- Vera users need to set up Chalmers VPN (L2TP recommended) to connect from outside Chalmers or GU.
- This applies to all services: SSH, file transfer, Open OnDemand portal
Connecting via SSH¶
- Use your CID as user name and log in with ssh:
  - Vera: `ssh CID@vera1.c3se.chalmers.se` or `ssh CID@vera2.c3se.chalmers.se` (accessible within the Chalmers and GU networks)
- Authenticate with your password, or set up aliases and SSH keys (strongly recommended for all); a sketch follows below.
- On Linux or Mac the ssh client is typically already installed. On Windows we recommend WSL or PuTTY.
- WSL is a good approach if you want to build some Linux experience.
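A minimal sketch of setting up key-based login from a Linux/Mac/WSL terminal (replace CID with your own account name):

ssh-keygen -t ed25519                    # generate a key pair; choose a passphrase
ssh-copy-id CID@vera1.c3se.chalmers.se   # install the public key on the cluster
ssh CID@vera1.c3se.chalmers.se           # later logins can use the key instead of the password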
Connecting via Open OnDemand portal¶
- Open OnDemand portal https://vera.c3se.chalmers.se
- Browse files, check disk and file quota, check project usage
- Launch interactive apps on compute nodes
- Desktop
- Jupyter notebooks
- MATLAB proxy
- RStudio
- VSCode
- Launch apps on log-in nodes
- TensorBoard
- Desktop
- Environments can be customized by copying the examples under `/apps/portal/` to your home dir `~/portal/`.
- See our documentation for more.
Connecting via ThinLinc¶
- We also offer graphical login using ThinLinc, where you get a linux desktop on the cluster. This is mainly intended for graphics-intensive pre- and post processing. See Remote graphics for more info.
- Using the ThinLinc client is recommended.
- You can also use the web client.
- This is still a shared login node.
Desktop¶
- Similar environment in ThinLinc and OnDemand.
Using the shell¶
- At the prompt ($), simply type the command, optionally followed by arguments to the command, e.g. `ls -l`
- The working directory is normally the "current point of focus" for many commands
- A few basic shell commands (a short example session follows this list):
  - `ls`, list files in the working directory
  - `pwd`, print the current working directory ("where am I")
  - `cd directory_name`, change working directory
  - `cp src_file dst_file`, copy a file
  - `rm file`, delete a file (there is no undelete!)
  - `mv nameA nameB`, rename nameA to nameB
  - `mkdir dirname`, create a directory
  - See also `grep`, `find`, `less`, `chgrp`, `chmod`
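A minimal example session (directory and file names are hypothetical):

pwd                     # where am I?
mkdir results           # create a new directory
cp input.txt results/   # copy a file into it
cd results              # make it the working directory
ls                      # list its contents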
man-pages¶
- man provides documentation for most of the commands available on the system, e.g.
- man ssh, to show the man-page for the ssh command
- man -k word, to list available man-pages containing word in the title
- man man, to show the man-page for the man command
- To navigate within the man-pages (similar to the less command):
  - space - scroll down one screen page
  - b - scroll up one screen page
  - q - quit the current man page
  - / - search (type in a word, press enter)
  - n - find the next search match (N for reverse)
  - h - get further help (how to search the man page etc.)
Filesystem and Data Storage¶
- Home directories
$HOME = /cephyr/users/<CID>/Vera
$HOME = /cephyr/users/<CID>/Alvis
- The home directory is backed up every night
- We use quota to limit storage use
- Run `C3SE_quota` in a shell to check your current quota on all your active storage areas
- You can also check quota in the Open OnDemand portal
- Quota limits (Cephyr):
  - User home directory (`/cephyr/users/<CID>/`): 30 GB, 60k files
  - Use `where-are-my-files` to find what counts towards your file quota on Cephyr.
  - Use `dust` or `dust -f` to check disk space usage and file counts.
- See also Filesystem
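A quick check from a shell on a login node, combining the commands above (the dust path argument is just an example):

C3SE_quota           # quota and current usage for all your active storage areas
where-are-my-files   # what counts towards your file quota on Cephyr
dust -f ~            # per-directory file counts under your home directory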
Filesystem and Data Storage¶
- If you need to store more data, you can apply for a storage project
- Try to avoid lots of small files: sqlite or HDF5 are easy to use!
- Data deletion policy for storage projects.
- See NAISS UA for user data deletion.
Preparing job¶
- Prepare your jobs on login nodes
- Login nodes are shared resources!
- Install your own software (if needed)
- Transfer input files to the cluster
- Prepare batch scripts for performing your analysis
Software - Modules¶
- Almost all software is available only after loading corresponding modules https://www.c3se.chalmers.se/documentation/module_system/
- To load one or more modules, use the command `module load module-name [module-name ...]`
- Loading a module expands your current `PATH`, `LD_LIBRARY_PATH`, `PYTHONPATH`, etc., making the software available.
- Example:
$ mathematica --version
-bash: mathematica: command not found
$ module load Mathematica/13.0.0
$ mathematica --version
13.0
- Don't load modules in your `~/.bashrc`. You will break things like the desktop. Load modules in each job script to make them self-contained; otherwise it's impossible for us to offer support.
Software - Modules (continued)¶
- Toolchains
- Compilers: C, C++, and Fortran compilers such as ifort, gcc, clang
- MPI implementations, such as Intel MPI, OpenMPI
- Math kernel libraries: optimised BLAS and FFT (and more), e.g. MKL
- Large number of software modules; Python (+a lot of addons such as NumPy, SciPy etc), ANSYS, COMSOL, Gaussian, Gromacs, MATLAB, OpenFoam, R, StarCCM, etc.
- `module load module-name` - load a module
- `module list` - list currently loaded modules
- `module keyword string` - search for a keyword string in modules (e.g. extensions)
- `module spider module-name` - search for a module
- `module purge` - unload all currently loaded modules
- `module show module-name` - show the module contents
- `module avail` - show available modules (for the currently loaded toolchain only)
- `module unload module-name` - unload a module
Software - Modules (continued)¶
- A lot of software available in modules.
- Commercial software and libraries; MATLAB, Mathematica, Schrodinger, CUDA, Nsight Compute and much more.
- Tools, compilers, MPI, math libraries, etc.
- A major version update of all software is done twice yearly
- Overview of recent toolchains
- Mixing toolchain versions will not work
- It is recommended to pin the version you are using and upgrade when needed
- Popular top level applications such as TensorFlow and PyTorch may be updated within a single toolchain version.
Software - Python¶
- We install the fundamental Python packages for HPC, such as NumPy, SciPy, PyTorch, optimised for our systems
- We can also install Python packages if there will be several users.
- We provide `virtualenv`, `apptainer`, and `conda` (least preferable) so you can install your own Python packages locally: https://www.c3se.chalmers.se/documentation/module_system/python/
- Avoid using the old OS-installed Python.
- Avoid installing Python packages directly into your home directory with `pip install --user`. They will leak into containers and other environments, and will quickly eat up your quota. Use a virtualenv instead (a minimal sketch follows below).
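A minimal sketch of a virtualenv on top of the central modules (the module version, project path, and package name are examples; adjust to your own):

module load Python/3.12.3-GCCcore-13.3.0                  # find versions with "module spider Python"
python -m venv /cephyr/NOBACKUP/groups/<project>/myenv    # keep the venv in project storage, not $HOME
source /cephyr/NOBACKUP/groups/<project>/myenv/bin/activate
pip install mypackage                                      # installs into the venv, not into ~/.local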
Software - Installing software¶
- You are ultimately responsible for having the software you need
- You are also responsible for having any required licenses
- We're happy to help you install software - ask us if you're unsure of what compiler or maths library to use, for example.
- We can also install software centrally, if there will be multiple users, or if the software requires special permissions. You must supply us with the installation material (if not openly available).
- If the software already has configurations in EasyBuild then installations can be very quick.
- You can run your own containers.
Software - Building software¶
- Use modules for build tools:
  - buildenv modules, e.g. `buildenv/default-foss-2023a-CUDA-12.1.1`, provide a build environment with GCC, OpenMPI, OpenBLAS/BLIS, CUDA
  - many important tools: `CMake`, `Autotools`, `git`, ...
  - and much more: `Python`, `Perl`, `Rust`, ...
- buildenv modules, e.g.
- You can link against libraries from the module tree. Modules set `LIBRARY_PATH` and other environment variables which good build systems can often pick up automatically.
- Poor build tools can often be "nudged" to find the libraries with configuration flags like `--with-python=$EBROOTPYTHON`
- You can only install software in your allocated disk spaces (nice build tools allow you to specify `--prefix=path_to_local_install`); a minimal sketch follows below.
  - Many "installation instructions" online falsely suggest you should use `sudo` to perform steps. They are wrong.
- Need a common dependency? You can request that we install it as another module.
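A minimal sketch of a classic configure/make install into your own disk space (the package name, path, and flags are hypothetical):

module load buildenv/default-foss-2024a
tar xf mytool-1.0.tar.gz
cd mytool-1.0
./configure --prefix=$HOME/sw/mytool   # install under your own directory instead of system paths
make -j 8
make install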
Software - Installing binary (pre-compiled) software¶
- A common problem is that software requires a newer glibc version. This is tied to the OS and can't be upgraded.
- You can use an Apptainer (Singularity) container to wrap this for your software.
- Make sure to use binaries that are compiled optimised for the hardware.
- Alvis and Vera (Zen4/Icelake) support up to AVX512.
- Difference can be huge. Example: Compared to our optimised NumPy builds, a generic x86 version is up to ~9x slower on Vera.
- Support for hardware like the Infiniband network and GPUDirect can also be critical for performance.
- AVX-512 (2016) > AVX2 (2013) > AVX (2011) > SSE (1999-2006) > generic instructions.
- `-march=native` optimizes your code for the CPU model you are building on.
- Build your software separately for Zen4 and Icelake.
- Some Intel toolchains are not installed on the Zen4 nodes.
Software - Building containers¶
- Simply run `apptainer build my.sif my.def` from a given definition file, e.g.:
Bootstrap: docker
From: continuumio/miniconda3:4.12.0
%files
requirements.txt
%post
/opt/conda/bin/conda install -y --file requirements.txt
- You can boostrap much faster from existing containers (even your own) if you want to add things:
Bootstrap: localimage
From: path/to/existing/container.sif
%post
/opt/conda/bin/conda install -y matplotlib
- The final image is a small, portable, single file.
- More things can be added to the definition file, e.g. `%environment`.
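Running software from the finished image is a single command; a minimal sketch, assuming requirements.txt pulled in NumPy:

apptainer exec my.sif python -c "import numpy; print(numpy.__version__)"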
Submitting job¶
- Submit job script to queuing system (sbatch)
- You'll have to specify project account, number of cores, wall-time, GPU/large memory
- Job is placed in a queue ordered by priority (influenced by usage, project size, job size)
Your projects¶
- `projinfo` lists your projects and current usage.
- `projinfo -D` breaks down usage day-by-day (up to 30 days back).
Project Used[h] Allocated[h] Queue
User
-------------------------------------------------------
C3SE2017-1-8 15227.88* 10000 vera
razanica 10807.00*
kjellm 2176.64*
robina 2035.88* <-- star means we are over 100% usage
dawu 150.59* which means this project has lowered priority
framby 40.76*
-------------------------------------------------------
C3SE507-15-6 9035.27 28000 mob
knutan 5298.46
robina 3519.03
kjellm 210.91
ohmanm 4.84
Viewing available nodes¶
- The `jobinfo -p vera` command shows the current state of nodes in the main partition:
Node type usage on main partition:
TYPE ALLOCATED IDLE OFFLINE TOTAL
ICELAKE,MEM1024 1 3 0 4
ICELAKE,MEM512 41 16 0 57
ZEN4,MEM1536 0 2 0 2
ZEN4,MEM768 1 97 0 98
(...the nodes below are retiring)
SKYLAKE,MEM192 15 2 0 17
SKYLAKE,MEM384 0 6 0 6
SKYLAKE,MEM768 2 0 0 2
SKYLAKE,MEM96,25G 20 0 0 20
SKYLAKE,MEM96 170 5 4 179
Total GPU usage:
TYPE ALLOCATED IDLE OFFLINE TOTAL
A40 5 7 4 16
A100 8 4 0 12
H100 0 0 8 8
(...the nodes below are retiring)
V100 4 4 0 8
Submitting jobs¶
- On compute clusters, jobs must be submitted to a queuing system that starts your jobs on the compute nodes: `sbatch <arguments> script.sh`
- Simulations must NOT run on the login nodes. Prepare your work on the front-end, and then submit it to the cluster
- A job is described by a script (script.sh above) that is passed on to the queuing system by the sbatch command
- Arguments to the queue system can be given in `script.sh` as well as on the command line
- Maximum wall time is 7 days (we might extend it manually on rare occasions)
- Anything long running should use checkpointing of some sort to save partial results.
- When you allocate less than a full node, you are assigned a proportional part of the node's memory and local disk space as well.
- See https://www.c3se.chalmers.se/documentation/submitting_jobs/
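A minimal sketch of the submit-and-check cycle (the job ID is just an example):

sbatch script.sh       # prints e.g. "Submitted batch job 1234567"
jobinfo -u $USER       # is it pending or running?
scancel 1234567        # cancel the job if you change your mind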
Job script example¶
#!/bin/bash
#SBATCH -A C3SE2024-11-05
#SBATCH -p vera
#SBATCH -n 4
#SBATCH -t 2-00:00:00
echo "Hello world"
- Implicitly assigned to Zen4 nodes
- 4 cores (out of 64 cores on a Zen4 node)
- 1/16 of the memory and 1/16 of the local disk space of the assigned node will be allocated
Selecting resources on Vera¶
- You can explicitly request either node type with `-C ZEN4` or `-C ICELAKE`.
- If you don't specify any constraint, you will automatically be assigned `-C ZEN4`.
- Icelake nodes have at least 512 GB memory and Zen4 nodes have at least 768 GB memory
  - `-C MEM1024` requests a 1024 GB node - 9 (Icelake) in total (3 private)
  - `-C MEM1536` requests a 1536 GB node - 2 (Zen4) in total (1 private)
  - `-C MEM2048` requests a 2048 GB node - 3 (Icelake) in total (all private)
- GPUs are requested with `--gpus-per-node`; the node type follows from the GPU type
  - `--gpus-per-node=A40:4` requests 4 A40 (Icelake)
  - `--gpus-per-node=A100:2` requests 2 A100 (Icelake)
  - `--gpus-per-node=H100:1` requests 1 H100 (Zen4)
- Don't specify constraints (`-C`) unless you know you need them.
- `-C SKYLAKE` still works if you want to finish your work, but it will be removed in a few weeks.
Job cost on Vera¶
- On Vera, jobs are charged based on the number of physical cores they allocate, plus an additional cost for each GPU:

| Type | VRAM | Additional cost |
|---|---|---|
| A40 | 48 GB | 16 |
| A100 | 40 GB | 48 |
| H100 | 96 GB | 160 |
- Example: a job using a single A100 (one fourth of an Icelake node) for 10 hours costs (64/4 + 48) * 10 = 640 core-hours.
core hours - Note: 16, 32, and 64 bit floating point performance differ greatly between these specialized GPUs. Pick the one most efficient for your application.
- The additional GPU cost is based on its price relative to a CPU node.
- You don't pay any extra for selecting a node with more memory; but you are typically competing for less available hardware.
- GPUs are cheap compared to CPUs in regard to their performance
Job starts¶
- Job starts when requested nodes are available (and it is your turn)
- Automatic environment variables inform MPI how to run (which can run over Infiniband)
- Performs the actions you detailed in your job script as if you were typing them in yourself
Vera script example¶
#!/bin/bash
#SBATCH -A C3SE2024-11-05
#SBATCH -p vera
#SBATCH -n 32
#SBATCH -t 2-00:00:00
module purge
module load SciPy-bundle/2024.05-gfbf-2024a
source /cephyr/NOBACKUP/groups/naiss2024-xx-xxx/myenv/bin/activate
python myscript.py --input=input.txt
- Outputs (if any) and the log file (slurm-*.out) are written to the same directory
Vera script example (using icelake CPU)¶
#!/bin/bash
#SBATCH -A C3SE2024-11-05
#SBATCH -p vera
#SBATCH -n 64
#SBATCH -C ICELAKE
#SBATCH -t 2-00:00:00
module purge
module load SciPy-bundle/2024.05-gfbf-2024a
source /cephyr/NOBACKUP/groups/naiss2024-xx-xxx/myenv/bin/activate
python myscript.py --input=input.txt
Vera script example (using GPU)¶
#!/bin/bash
#SBATCH -A C3SE2024-11-05
#SBATCH -t 2-00:00:00
#SBATCH --gpus-per-node=A40:2
apptainer exec --nv tensorflow-2.1.0.sif python cat_recognizer.py
More on containers
Using TMPDIR in jobs¶
- You must work on TMPDIR if you do a lot of file I/O
- Parallel TMPDIR is available (using `--gres ptmpdir:1`)
- TMPDIR is cleaned of data immediately when the job ends, fails, crashes, or runs out of wall time
- You must copy important results back to your persistent storage
Data read/write and TMPDIR¶
- `$SLURM_SUBMIT_DIR` is defined in jobs and points to where you submitted your job.
- `$TMPDIR`: local scratch disk on the node(s) of your job. Automatically deleted when the job has finished.
- When should you use `$TMPDIR`?
  - The only good reason NOT to use `$TMPDIR` is if your program only loads data in one read operation, processes it, and writes the output.
- It is crucial that you use `$TMPDIR` for jobs that perform intensive file I/O.
- If you're unsure what your program does: investigate it, or use `$TMPDIR`!
- Using `/cephyr/...` or `/mimer/...` means the network-attached permanent storage is used.
- With `sbatch --gres=ptmpdir:1` you get a distributed, parallel `$TMPDIR` across all nodes in your job. Always recommended for multi-node jobs that use `$TMPDIR`.
Vera script example¶
#!/bin/bash
#SBATCH -A C3SE2024-11-05 -p vera
#SBATCH -C ZEN4
#SBATCH -n 64
#SBATCH -t 2-00:00:00
#SBATCH --gres=ptmpdir:1
module load ABAQUS/2023-hotfix-2324 intel/2023a
cp train_break.inp $TMPDIR
cd $TMPDIR
abaqus cpus=$SLURM_NTASKS mp_mode=mpi job=train_break
cp train_break.odb $SLURM_SUBMIT_DIR
Vera script example (using GPU)¶
#!/bin/bash
#SBATCH -A C3SE2024-11-05 -p vera
#SBATCH -t 2-00:00:00
#SBATCH --gpus-per-node=H100:1
unzip many_tiny_files_dataset.zip -d $TMPDIR/
apptainer exec --nv ~/tensorflow-2.1.0.sif python trainer.py --training_input=$TMPDIR/
Vera script example (job array)¶
- Submitted with `sbatch --array=0-99 wind_turbine.sh`:
#!/bin/bash
#SBATCH -A C3SE2024-11-05
#SBATCH -n 1
#SBATCH -C "ICELAKE|ZEN4"
#SBATCH -t 15:00:00
#SBATCH --mail-user=zapp.brannigan@chalmers.se --mail-type=end
module load MATLAB
cp wind_load_$SLURM_ARRAY_TASK_ID.mat $TMPDIR/wind_load.mat
cp wind_turbine.m $TMPDIR
cd $TMPDIR
RunMatlab.sh -f wind_turbine.m
cp out.mat $SLURM_SUBMIT_DIR/out_$SLURM_ARRAY_TASK_ID.mat
- Environment variables like `$SLURM_ARRAY_TASK_ID` can also be accessed from within all programming languages, e.g. `os.environ` in Python or `getenv` in MATLAB and C.
Vera script example (job array)¶
- Submitted with `sbatch --array=0-50:5 diffusion.sh`:
#!/bin/bash
#SBATCH -A C3SE2024-11-05
#SBATCH -C ICELAKE
#SBATCH -n 128 -t 2-00:00:00
module load intel/2023a
## Set up new folder, copy the input file there
temperature=$SLURM_ARRAY_TASK_ID
dir=temp_$temperature
mkdir $dir; cd $dir
cp $HOME/base_input.in input.in
## Set the temperature in the input file:
sed -i "s/TEMPERATURE_PLACEHOLDER/$temperature/" input.in
mpirun $HOME/software/my_md_tool -f input.in
Here, the array index is used directly as the temperature input. If it turns out that 50 degrees was insufficient, we could submit another run with a higher array range.
Vera script example (more slurm environment variables)¶
Submitted with: sbatch run_oofem.sh
#!/bin/bash
#SBATCH -A C3SE507-15-6 -p mob
#SBATCH --ntasks-per-node=32 -N 3
#SBATCH -J residual_stress
#SBATCH -t 6-00:00:00
#SBATCH --gres=ptmpdir:1
module load PETSc
cp $SLURM_JOB_NAME.in $TMPDIR
cd $TMPDIR
mkdir $SLURM_SUBMIT_DIR/$SLURM_JOB_NAME
while sleep 1h; do
rsync -a *.vtu $SLURM_SUBMIT_DIR/$SLURM_JOB_NAME
done &
LOOPPID=$!
mpirun $HOME/bin/oofem -p -f "$SLURM_JOB_NAME.in"
kill $LOOPPID
rsync -a *.vtu $SLURM_SUBMIT_DIR/$SLURM_JOB_NAME/
After the job ends¶
- You can also do light post-processing directly on our systems
- Graphical pre/post-processing can be done via ThinLinc or the Vera portal.
Interactive use¶
You are allowed to use the ThinLinc machines for light/moderate tasks that require interactive input. If you need all cores, or generate load for an extended duration, you must run on the compute nodes, e.g. with an interactive srun job (a minimal sketch follows):
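A minimal sketch of requesting an interactive shell with srun (the account, core count, and time limit are examples):

srun -A C3SE2024-11-05 -p vera -n 4 -t 01:00:00 --pty bash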
Once the job starts, you are presented with a shell on the allocated node.
- Useful for debugging a job-script, application problems, extremely long compilations.
- Not useful when there is a long queue (you still have to wait), but can be used with private partitions.
- `srun` interactive jobs will be aborted if the login node needs to be rebooted or if you lose internet connectivity. Prefer using the portal.
Job command overview¶
- `sbatch`: submit batch jobs
- `srun`: submit interactive jobs
- `jobinfo`, `squeue`: view the job queue and the state of jobs in the queue
- `scontrol show job <jobid>`: show details about a job, including reasons why it's pending
- `sprio`: show all your pending jobs and their priority
- `scancel`: cancel a running or pending job
- `sinfo`: show status of the partitions (queues): how many nodes are free, down, busy, etc.
- `sacct`: show scheduling information about past jobs
- `projinfo`: show the projects you belong to, including monthly allocation and usage
- For details, refer to the -h flag, the man pages, or Google.
Job monitoring¶
- Why am I queued? `jobinfo -u $USER` shows the reason:
  - Priority: waiting for other queued jobs with a higher priority.
  - Resources: waiting for sufficient resources to become free.
  - AssocGrpBillingRunMinutes: we limit how much you can have running at once (<= 100% of the 30-day allocation * 0.5^x, where x is the number of stars in `projinfo`).
- You can log on to the nodes allocated to your job using ssh (from the login node) as long as the job is running. There you can check what your job is doing using normal Linux commands - ps, top, etc. (a short sketch follows after this list).
- top will show you how much CPU your process is using, how much memory, and more. Tip: press 'H' to make top show all threads separately, for multithreaded programs
- iotop can show you how much your processes are reading and writing on disk
- Performance benchmarking with e.g. Nvidia Nsight compute
- Debugging with gdb, Address Sanitizer, or Valgrind
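A minimal sketch of checking on a running job from the login node (the node name is just an example):

squeue -u $USER -o "%i %N"   # list your jobs and the node(s) they run on
ssh vera05-3                 # log on to one of the nodes running your job (example name)
top                          # press 'H' to show individual threads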
Job monitoring¶
- Running top on your job's nodes:
System monitoring¶
- `job_stats.py JOBID` is essential.
- Check e.g. memory usage; user, system, and wait CPU utilisation; disk usage; etc.
- The `sinfo -Rl` command shows how many nodes are down for repair.
System monitoring¶
- The ideal job, high CPU utilisation and no disk I/O
System monitoring¶
- Looks like something tried to use 2 nodes incorrectly.
- Requested 64 cores but only used half of them. One node was just idling.
System monitoring¶
- Extremely inefficient I/O. All cores spend most of their time waiting for each other to finish writing (system CPU usage).
Profiling¶
- With the right tools you can easily dive into where your code's bottlenecks are; we recommend:
- TensorFlow: TensorBoard
- PyTorch: `torch.profiler` (possibly with TensorBoard)
- Python: Scalene
- Compiled CPU or GPU code: NVIDIA Nsight Systems
- MATLAB: the built-in profiler
- The tools can be used interactively on compute nodes via the Open OnDemand portal! (A short sketch of launching them from a shell follows below.)
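A minimal sketch of launching two of the profilers above from a shell on a compute node (the script and output names are hypothetical, and the corresponding modules or environments must be loaded first):

scalene myscript.py                   # line-level CPU/memory profile of a Python script
nsys profile -o report ./my_gpu_app   # Nsight Systems timeline for a compiled/GPU application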
Things to keep in mind¶
- Never run (big or long) jobs on the login node! If you do, we will kill the processes. If you keep doing this, we will block your access temporarily. Prepare your job, do tests and check that everything's OK before submitting the job, but don't run the job there!
- The Open OnDemand portals allow interactive desktop and web apps directly on the compute nodes. Use this for heavy interactive work.
- If your home dir runs out of quota, or you put too much experimental stuff in your `.bashrc` file, expect things like the desktop session to break. Many support tickets are answered by simply clearing these out.
- Keep an eye on what's going on - use normal Linux tools on the login node and on the allocated nodes to check CPU, memory and network usage, etc., especially for new job scripts/codes! Do check `job_stats.py`!
- Think about what you do - if you by mistake copy very large files back and forth you can slow the storage servers or network to a crawl
Getting support¶
- We ask all students to pass "Introduction to computer clusters", but any user who wishes can attend this online self-learning course.
- Chalmers PhD students can also attend "Introduction to digital resources in research" (a GTS course) to get an overview of research tools and resources.
- Students should first speak to their supervisor for support
- We provide support to our users, but not for any and all problems
- We can help you with software installation issues, and recommend compiler flags etc. for optimal performance
- We can install software system-wide if there are many users who need it - but not for one user (unless the installation is simple)
- We don't support your application software or help debugging your code/model or prepare your input files
Getting support¶
- Staff are available in our offices, to help with those things that are hard to put into a support request email (book a time in advance please)
- Origo building - Fysikgården 4, one floor up, ring the bell
- We also offer advanced support for things like performance optimisation, advanced help with software development tools or debuggers, workflow automation through scripting, etc.
Getting support - support requests¶
- If you run into trouble, first figure out what seems to go wrong. Use the following as a checklist:
  - make sure you simply aren't over disk quota with `C3SE_quota`
  - something wrong with your job script or input file?
  - does your simulation diverge?
  - is there a bug in the program?
  - any error messages? Look in your manuals, and use Google!
  - check the node health: did you over-allocate memory until Linux killed the program?
- Try to isolate the problem - does it go away if you run a smaller job? Does it go away if you use your home directory instead of the local disk on the node?
- Try to create a test case - the smallest and simplest possible case that reproduces the problem
Getting support - error reports¶
- In order to help you, we need as much and as good information as possible:
- What's the job-ID of the failing job?
- What working directory and what job-script?
- What software are you using?
- What's happening - especially error messages?
- Did this work before, or has it never worked?
- Do you have a minimal example?
- No need to attach files; just point us to a directory on the system.
- Where are the files you've used - scripts, logs etc?
- Look at our Getting support page
In summary¶
- Our web page is https://www.c3se.chalmers.se
- Read up how to use the file system
- Read up on the module system and available software
- Learn a bit of Linux if you don't already know it - no need to be a guru, but you should feel comfortable working in it
- Play around with the system, and ask us if you have questions
- Please use the SUPR support form - it provides additional automatic project and user information that we need. Please always prefer this to sending emails directly.
Outlook¶
- We have more GPUs now, but they are often not utilized that much.
- Skylake nodes will be retired soon.