Building your own software¶
Modern compilers and development tools are available through the module system. It is highly recommended to always load a toolchain module, even if you are just using GCC, as the system compiler is very dated.
Intel compiler suite¶
The intel compiler toolchain includes:
- icpc/icpx - C++ compiler
- icc/icx - C compiler
- ifort/ifx - Fortran compiler
- imkl - Intel Math Kernel Library (BLAS, LAPACK, FFT, etc.)
- impi - Intel MPI
Exactly how to instruct a build system to use these compilers varies from software to software.
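Many build systems pick up the standard compiler variables. A minimal sketch for a CMake-based project (the toolchain version here is an assumption; check module avail intel for what is installed):
module load intel/2023a
CC=icx CXX=icpx FC=ifx cmake ../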
In addition, some tools are available:
- VTune - Visual profiling tool
- Advisor - Code optimisation tool
- Inspector - Memory and thread error detection tool
- AMD μProf - Visual profiling tool
all of which you can find in the menu when logging in over remote graphics.
GCC¶
The foss compiler toolchain includes:
- g++ - C++ compiler
- gcc - C compiler
- gfortran - Fortran compiler
- FlexiBLAS - A BLAS and LAPACK wrapper library
- OpenMPI - MPI library
On icelake nodes, OpenBLAS is the default backend of FlexiBLAS, while on ZEN4 nodes the default backend is AOCL-BLAS (a fork of BLIS).
Once you load any foss toolchain, you can find the corresponding backend module in the module list.
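For example (a sketch; the toolchain version matches the examples elsewhere on this page):
module load foss/2024a
module list   # the active BLAS backend module, e.g. OpenBLAS or AOCL-BLAS, appears here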
Using a compiler with a build system¶
Using the buildenv modules will set a number of useful environment variables, such as CC and CXX, along with flags that enable higher optimizations; most build systems will pick these up. Otherwise, there is a risk that the very old system compilers are used.
For example:
module load buildenv/default-foss-2024a-CUDA-12.6.0
module load CMake/3.29.3-GCCcore-13.3.0
module load HDF5/1.14.5-gompi-2024a
cd my_software/
mkdir build
cd build
cmake ../
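After configuring, build as usual with make. To check that the buildenv settings were picked up, you can inspect the exported variables (a sketch; the exact set of variables depends on the module):
echo $CC $CXX $FC   # compiler commands exported by buildenv
make -j 4           # build with the generated Makefiles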
However, some software unfortunately relies on custom-made build tools and instructions, which makes things more difficult and may require custom solutions.
Compiling with BLAS¶
BLAS (Basic Linear Algebra Subprograms) is a specification that prescribes a set of low-level routines for performing common linear algebra operations. There are multiple implementations of these routines, for example OpenBLAS, BLIS, and Intel MKL. FlexiBLAS, instead of being a real implementation, is a wrapper library with runtime-exchangeable backends.
To compile code that uses BLAS, you have to link against the necessary libraries from the implementation you are using. For example, something like
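# a sketch: link directly against OpenBLAS (file names are illustrative)
gcc mycode.c -lopenblas -o mycode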
will make the code use the OpenBLAS implementation, while something like
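# a sketch: link against Intel MKL through its single dynamic library
gcc mycode.c -lmkl_rt -o mycode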
will use the Intel MKL implementation.
Instead of linking to the libraries directly, FlexiBLAS provides an interface allowing you to switch among BLAS implementations at runtime.
Once you load a FlexiBLAS module (which is also a part of the foss toolchain), you can check the supported backends with flexiblas list:
$ module load FlexiBLAS/3.4.4-GCC-13.3.0 # or foss/2024a
$ flexiblas list
System-wide:
System-wide (config directory):
AOCL_MT
library = libflexiblas_aocl_mt.so
comment =
IMKL
library = libflexiblas_imkl_gnu_thread.so
comment =
IMKL_SEQ
library = libflexiblas_imkl_sequential.so
comment =
NETLIB
library = libflexiblas_netlib.so
comment =
BLIS
library = libflexiblas_blis.so
comment =
OPENBLAS
library = libflexiblas_openblas.so
comment =
User config:
Host config:
Enviroment config:
The output above shows that this FlexiBLAS module supports the OPENBLAS, IMKL, and BLIS backends, among others.
When using FlexiBLAS, you do not link your code against the actual implementation; instead, you link against the FlexiBLAS library when compiling your code:
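# a sketch: link against the FlexiBLAS wrapper instead of a specific backend
gcc mycode.c -lflexiblas -o mycode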
To change the backend at runtime, you load the required module and set the FLEXIBLAS environment variable. For example:
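# a sketch: select the IMKL backend from the list above
# (the imkl module is loaded without a version here; pick one with module avail imkl)
module load imkl
export FLEXIBLAS=IMKL
./mycode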
More information can be found on the FlexiBLAS GitHub page.
Threading in BLAS¶
By default, OpenBLAS launches as many threads as there are cores on the node. In contrast, BLIS uses one thread by default. This behaviour can be changed by setting the OMP_NUM_THREADS environment variable (or BLIS_NUM_THREADS if you only want to change the behaviour of BLIS).
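For example, a sketch limiting the BLAS libraries to 8 threads:
export OMP_NUM_THREADS=8    # affects OpenBLAS, BLIS, and other OpenMP-threaded code
export BLIS_NUM_THREADS=8   # alternatively, affects only BLIS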
On Vera and Alvis, OMP_NUM_THREADS adopts the number of cores per task (-c) if it is specified. If it is not specified, the number of threads depends on the implementation. If your computation relies heavily on threading, you will have to specify the number you prefer. This is also crucial when using numpy, which makes heavy use of BLAS underneath. For example, if you allocate 64 cores on a Vera ZEN4 node (with AOCL-BLAS as the default BLAS implementation) and launch only one process on the node, 63 cores will be idle unless you specify -c 64 to allow AOCL-BLAS to launch 64 threads.
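A sketch of the example above, requesting one task with 64 cores per task (account and partition options omitted):
srun -n 1 -c 64 ./mycode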
Additional libraries¶
We install many libraries which can greatly simplify building your own software. Loading these modules will set the CPATH and LIBRARY_PATH environment variables, which are usually picked up by popular build systems. However, many build systems fail to respect these general rules and may require some tweaking to build correctly.
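For example (a sketch reusing the HDF5 module from the earlier example):
module load HDF5/1.14.5-gompi-2024a
echo $CPATH          # header search paths picked up by many compilers
echo $LIBRARY_PATH   # library search paths used when linking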
Not every library is installed for every toolchain version. If you are missing a dependency for your software, you can request an installation, or install it locally.
Building CUDA code¶
If you compile code for GPUs using, for example, nvcc, you must be aware that you need to build not only for the type of GPU on the system you are compiling on, but also for the other GPU types in the resource. To do this, add the flags -gencode=arch=compute_XX,code=sm_XX to your nvcc commands for each compute capability XX you want to support (see the sketch after the list below):
- V100: 70
- T4: 75
- A100: 80
- A40: 86
- H100: 90
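For example, a sketch targeting all of the GPU types above (the source file name is a placeholder):
nvcc -gencode=arch=compute_70,code=sm_70 \
     -gencode=arch=compute_75,code=sm_75 \
     -gencode=arch=compute_80,code=sm_80 \
     -gencode=arch=compute_86,code=sm_86 \
     -gencode=arch=compute_90,code=sm_90 \
     mycode.cu -o mycode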
See the CUDA best practices guide for more information.