Major Vera expansion with AMD Zen 4 nodes¶
Vera has been expanded with 102 new compute nodes and 2 new login nodes, all running AMD Zen4 CPUs. This expansion replaces and extends the capacity of the old Skylake nodes, which are now being decommissioned.
Hardware description¶
New hardware in the vera partition:

#nodes | CPU | #cores/node | RAM (GB) | TMPDIR (GB) | GPUs |
---|---|---|---|---|---|
83 | Zen4 | 64 | 768 | 845 | |
1 | Zen4 | 64 | 1536 | 845 | |
2 | Zen4 | 64 | 1536 | 845 | 4xH100 |
Some additional nodes are part of private partitions and are not listed above. The complete hardware listing can be found on the Vera hardware page.
Note that all nodes have much more RAM than the old Skylake-based nodes!
Skylake nodes will be decommissioned¶
The oldest nodes of Vera have run out of their extended service contract, and these old Intel Skylake machines will be decommissioned over the coming weeks. This includes the V100 and T4 GPU nodes in Vera. It also means that all nodes in Vera will have 64 cores and at least 512 GB of RAM. Please do not queue up more work on these old Skylake nodes.
During the transition, the old Skylake login nodes will be kept available under the names vera3 and vera4.
Starting from 2025-01-30 the login nodes will change to:
- New Zen4: vera1.c3se.chalmers.se (https://vera1.c3se.chalmers.se:300)
- New Zen4: vera2.c3se.chalmers.se (https://vera2.c3se.chalmers.se:300)
- Old Skylake: vera3.c3se.chalmers.se (https://vera3.c3se.chalmers.se:300)
- Old Skylake: vera4.c3se.chalmers.se (https://vera4.c3se.chalmers.se:300)
Updated operating system¶
For improved hardware and driver support, the operating system will be updated to Rocky Linux 9. The new Zen4 nodes will run Rocky 9 from the start, and the Intel Icelake nodes will be updated in the coming weeks. This means a fully new module tree for all nodes (once updated).
Due to hardware and OS compatibility constraints, we will not build old toolchains.
We will build all the most commonly and recently used software. If you are missing your favourite software or library, please contact our support.
Containers are unaffected.
Selecting CPU architecture¶
For GPU jobs, the corresponding CPU type is implied (see hardware), so there is no need to specify anything beyond the GPU model, as usual, e.g.
#SBATCH --gpus-per-node=A40:1 # will run on Icelake
#SBATCH --gpus-per-node=H100:1 # will run on Zen 4
For CPU jobs, you can explicitly select the Zen4 or Icelake architecture using

#SBATCH -C ZEN4

or

#SBATCH -C ICELAKE

If you do not specify any constraint, your job will automatically run on ZEN4.
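Put together, a minimal CPU job script header could look like the sketch below (the project name, core count, and wall time are placeholders, not values from this announcement):

#!/bin/bash
#SBATCH -A YOUR_PROJECT      # placeholder project/account name
#SBATCH -n 64                # number of cores (example value)
#SBATCH -t 01:00:00          # wall time (example value)
#SBATCH -C ZEN4              # pin the job to the new Zen4 nodes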
Use jobinfo to see the current queue and to check whether some node types are less congested.
Compiling software¶
All the modules we provide are built and optimized for each architecture (Zen4 and Icelake). If you wish to build your own software optimized for Zen4, GCC and Clang both offer good performance. In addition, you can use AMD's fork of Clang, AOCC. Clang and AOCC can also offer a better debugging experience than GCC. The intel compilers are not advised.
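Once your compiler of choice is available (see the buildenv note below), a Zen4-optimized build could be as simple as the lines that follow (file names are placeholders; -march=znver4 requires a recent enough GCC or Clang, otherwise -march=native on a Zen4 node gives the same result):

gcc -O3 -march=znver4 -o myprog myprog.c     # explicit Zen4 tuning (recent GCC/Clang)
clang -O3 -march=native -o myprog myprog.c   # tune for the CPU of the node you compile on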
You should load the 'buildenv' module in order to compile your software, along with your preferred compiler; a rough sketch follows below. Note that AOCC also uses the name 'clang' for its compiler executable.
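The module names and versions in this sketch are assumptions, patterned on the intel buildenv example further below; check ml spider buildenv to see what is actually installed:

ml buildenv/default-foss-2024a   # assumed name of a GCC-based build environment
ml AOCC                          # optionally add AMD's Clang fork (assumed module name)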
Also, be aware that by default FlexiBLAS will use the BLIS backend on Zen4, as this generally offers superior performance to that of OpenBLAS or MKL. If you wish to use another backend, you can use the command flexiblas list to see the name of the library files for each backend. Then load the backend that you wish to use, and run your software with the FLEXIBLAS environment variable set to that backend name. This will switch the backend with no need to recompile.
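A short example of how this could look (the backend name is only an illustration; use whichever names flexiblas list reports on the system):

flexiblas list                  # list the available BLAS backends and their library files
FLEXIBLAS=OPENBLAS ./myprog     # run with the OpenBLAS backend instead of the BLIS default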
Building for Icelake¶
As Icelake and Zen4 CPUs differ, software optimized with -march=native on one type will not work on the other.
Zen4 software can be compiled on the login nodes, but for Icelake-optimized software you can:

- Use the Vera portal https://vera.c3se.chalmers.se and get a desktop session on an Icelake node.
- Run an interactive job on an Icelake compute node, e.g.

srun -A YOUR_PROJECT -C ICELAKE -n 16 --pty bash

and compile your software there.
In general, the intel toolchain and MKL offer the best performance on Icelake, e.g. ml buildenv/default-intel-2024a.
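For instance, compiling a small program on an Icelake node could look like the following (the file names are placeholders, and -xHost assumes you compile on the same CPU type you will run on):

ml buildenv/default-intel-2024a
icx -O3 -xHost -qmkl -o myprog myprog.c    # Intel oneAPI C compiler, linking against MKL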