Building your own software¶
Modern compilers and development tools are available through the module system. It is highly recommended to always load a toolchain module, even if you are just using GCC, as the system compiler is very dated.
Intel compiler suite¶
The intel compiler toolchain includes:
- icpc/icpx - C++ compiler
- icc/icx - C compiler
- ifort/ifx - Fortran compiler
- imkl - Intel Math Kernel Library (BLAS, LAPACK, FFT, etc.)
- impi - Intel MPI
Exactly how to instruct a build system to use these compilers varies from software to software.
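Many build systems pick up the standard compiler variables. A minimal sketch for a CMake-based project (the toolchain version here is an assumption; check module avail intel for what is installed):
module load intel/2023a
CC=icx CXX=icpx FC=ifx cmake ../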
In addition, some tools are available:
- VTune - Visual profiling tool
- Advisor - Code optimisation tool
- Inspector - Memory and thread error detection tool
- AMD μProf - Visual profiling tool
all of which you can find in the menu when logging in over remote graphics.
GCC¶
The foss compiler toolchain includes:
- g++ - C++ compiler
- gcc - C compiler
- gfortran - Fortran compiler
- FlexiBLAS - A BLAS and LAPACK wrapper library
- OpenMPI - MPI library
On icelake nodes, OpenBLAS is the default backend of FlexiBLAS, while on ZEN4 nodes the default backend is AOCL-BLAS (a fork of BLIS).
Once you load any foss toolchain, you can find the corresponding backend module in the module list.
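For example (a sketch; the toolchain version matches the examples elsewhere on this page):
module load foss/2024a
module list   # the active BLAS backend module, e.g. OpenBLAS or AOCL-BLAS, appears here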
Using a compiler with a build system¶
Using the buildenv modules will set a number of useful environment variables, such as CC and CXX, along with flags that enable higher optimizations; most build systems will pick these up. Otherwise, there is a risk that the very old system compilers are used.
For example:
module load buildenv/default-foss-2024a-CUDA-12.6.0
module load CMake/3.29.3-GCCcore-13.3.0
module load HDF5/1.14.5-gompi-2024a
cd my_software/
mkdir build
cd build
cmake ../
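After configuring, build as usual with make. To check that the buildenv settings were picked up, you can inspect the exported variables (a sketch; the exact set of variables depends on the module):
echo $CC $CXX $FC   # compiler commands exported by buildenv
make -j 4           # build with the generated Makefiles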
However, some software unfortunately relies on custom-made build tools and instructions, which makes things more difficult and may require custom solutions.
Compiling with BLAS¶
BLAS (Basic Linear Algebra Subprograms) is a specification that prescribes a set of low-level routines for performing common linear algebra operations. There are multiple implementations of these routines, for example OpenBLAS, BLIS, and Intel MKL. FlexiBLAS, instead of being a real implementation, is a wrapper library with runtime-exchangeable backends.
To compile code that uses BLAS, you have to link against the necessary libraries from the implementation you are using. For example, something like
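# a sketch: link directly against OpenBLAS (file names are illustrative)
gcc mycode.c -lopenblas -o mycode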
will make the code use the OpenBLAS implementation, while something like
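# a sketch: link against Intel MKL through its single dynamic library
gcc mycode.c -lmkl_rt -o mycode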
will use the Intel MKL implementation.
Instead of linking to the libraries directly, FlexiBLAS provides an interface allowing you to switch among BLAS implementations at runtime.
Once you load a FlexiBLAS module (which is also a part of the foss toolchain), you can check the supported backends with flexiblas list:
$ module load FlexiBLAS/3.4.4-GCC-13.3.0 # or foss/2024a
$ flexiblas list
System-wide:
System-wide (config directory):
AOCL_MT
library = libflexiblas_aocl_mt.so
comment =
IMKL
library = libflexiblas_imkl_gnu_thread.so
comment =
IMKL_SEQ
library = libflexiblas_imkl_sequential.so
comment =
NETLIB
library = libflexiblas_netlib.so
comment =
BLIS
library = libflexiblas_blis.so
comment =
OPENBLAS
library = libflexiblas_openblas.so
comment =
User config:
Host config:
Enviroment config:
The output above shows that this FlexiBLAS module supports the OPENBLAS, IMKL, and BLIS backends, among others.
When using FlexiBLAS, you do not link your code against the actual implementation; instead, you link against the FlexiBLAS library when compiling your code:
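# a sketch: link against the FlexiBLAS wrapper instead of a specific backend
gcc mycode.c -lflexiblas -o mycode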
To change the backend at runtime, you load the required module and set the FLEXIBLAS environment variable. For example:
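# a sketch: select the IMKL backend from the list above
# (the imkl module is loaded without a version here; pick one with module avail imkl)
module load imkl
export FLEXIBLAS=IMKL
./mycode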
More information can be found on the FlexiBLAS GitHub page.
Threading in BLAS¶
By default, OpenBLAS launches as many threads as there are cores on the node. In contrast, BLIS uses one thread by default. This behaviour can be changed by setting the OMP_NUM_THREADS environment variable (or BLIS_NUM_THREADS if you only want to change the behaviour of BLIS).
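For example, a sketch limiting the BLAS libraries to 8 threads:
export OMP_NUM_THREADS=8    # affects OpenBLAS, BLIS, and other OpenMP-threaded code
export BLIS_NUM_THREADS=8   # alternatively, affects only BLIS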
On Vera and Alvis, OMP_NUM_THREADS adopts the number of cores per task (-c) if it is specified. If it is not specified, the number of threads depends on the implementation. If your computation relies heavily on threading, you will have to specify the number you prefer. This is also crucial when using numpy, which makes heavy use of BLAS underneath. For example, if you allocate 64 cores on a Vera ZEN4 node (with AOCL-BLAS as the default BLAS implementation) and launch only one process on the node, 63 cores will be idle unless you specify -c 64 to allow AOCL-BLAS to launch 64 threads.
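A sketch of the example above, requesting one task with 64 cores per task (account and partition options omitted):
srun -n 1 -c 64 ./mycode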
Additional libraries¶
We install many libraries which can greatly simplify building your own software. Loading these modules will set the CPATH and LIBRARY_PATH environment variables, which are usually picked up by popular build systems. However, many build systems fail to respect these general rules and may require some tweaking to build correctly.
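For example (a sketch reusing the HDF5 module from the earlier example):
module load HDF5/1.14.5-gompi-2024a
echo $CPATH          # header search paths picked up by many compilers
echo $LIBRARY_PATH   # library search paths used when linking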
Not every library is installed for every toolchain version. If you are missing a dependency for your software, you can request an installation, or install it locally.
Building CUDA code¶
If you compile code for GPUs using, for example, nvcc, you must be aware that you need to build not only for the type of GPU on the system you are compiling on, but also for the other GPU types in the resource. To do this, add the flags -gencode=arch=compute_XX,code=sm_XX to your nvcc commands for each compute capability XX you want to support (see the sketch after the list below):
- V100: 70
- T4: 75
- A100: 80
- A40: 86
- H100: 90
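For example, a sketch targeting all of the GPU types above (the source file name is a placeholder):
nvcc -gencode=arch=compute_70,code=sm_70 \
     -gencode=arch=compute_75,code=sm_75 \
     -gencode=arch=compute_80,code=sm_80 \
     -gencode=arch=compute_86,code=sm_86 \
     -gencode=arch=compute_90,code=sm_90 \
     mycode.cu -o mycode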
See the CUDA best practices guide for more information.