Major Vera expansion with AMD Zen 4 nodes¶
Vera has been expanded with 102 new compute nodes and 2 new login nodes, all running AMD Zen4 CPUs. This expansion replaces and extends the capacity of the old Skylake nodes, which are now being decommissioned.
Hardware description¶
New hardware in the vera partition:

#nodes | CPU | #cores/node | RAM (GB) | TMPDIR (GB) | GPUs |
---|---|---|---|---|---|
83 | Zen4 | 64 | 768 | 845 | |
1 | Zen4 | 64 | 1536 | 845 | |
2 | Zen4 | 64 | 1536 | 845 | 4xH100 |
Some additional nodes are part of private partitions and are not listed above. The complete hardware listing can be found on the Vera hardware page.
Note that all nodes have much more RAM than the old Skylake-based nodes!
Skylake nodes will be decommissioned¶
The oldest nodes of Vera have run out of their extended service contract, and these old Intel Skylake machines will be decommissioned over the coming weeks. This includes the V100 and T4 GPU nodes in Vera. It also means that all nodes in Vera will have 64 cores and at least 512 GB of RAM. Please do not queue up more work on these old Skylake nodes.
During the transition, the old Skylake login nodes will be kept available under the names vera3 and vera4.
Starting from 2025-01-30 the login nodes will change to:
- New Zen4: vera1.c3se.chalmers.se (https://vera1.c3se.chalmers.se:300)
- New Zen4: vera2.c3se.chalmers.se (https://vera2.c3se.chalmers.se:300)
- Old Skylake: vera3.c3se.chalmers.se (https://vera3.c3se.chalmers.se:300)
- Old Skylake: vera4.c3se.chalmers.se (https://vera4.c3se.chalmers.se:300)
Updated operating system¶
For improved hardware and driver support, the operating system will be updated to Rocky Linux 9. The new Zen4 nodes will run Rocky 9 from the start, and the Intel Icelake nodes will be updated in the coming weeks. This means a fully new module tree for all nodes (once updated).
Due to hardware and OS compatibility constraints, we will not build old toolchains.
We will build all the most commonly and recently used software. If you are missing your favourite software or library, please contact our support.
Containers are unaffected.
Selecting CPU architecture¶
For GPU jobs, the corresponding CPU type is implied (see hardware), so there is no need to specify anything beyond the GPU model, as usual, e.g.
#SBATCH --gpus-per-node=A40:1 # will run on Icelake
#SBATCH --gpus-per-node=H100:1 # will run on Zen 4
For CPU jobs, you can explicitly select the Zen4 or Icelake architecture using

#SBATCH -C ZEN4

or

#SBATCH -C ICELAKE

If you do not specify any constraint, your job will automatically run on ZEN4.
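Put together, a minimal CPU job script header could look like the sketch below (the project name, core count, and wall time are placeholders, not values from this announcement):

#!/bin/bash
#SBATCH -A YOUR_PROJECT      # placeholder project/account name
#SBATCH -n 64                # number of cores (example value)
#SBATCH -t 01:00:00          # wall time (example value)
#SBATCH -C ZEN4              # pin the job to the new Zen4 nodes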
Use jobinfo to see the current queue and to check whether some node types are less congested.
Compiling software¶
All the modules we provide are built and optimized for each architecture (Zen4 and Icelake). If you wish to build your own software optimized for Zen4, GCC and Clang both offer good performance. In addition, you can use AMD's fork of Clang, AOCC. Clang and AOCC can also offer a better debugging experience than GCC. The intel compilers are not advised.
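Once your compiler of choice is available (see the buildenv note below), a Zen4-optimized build could be as simple as the lines that follow (file names are placeholders; -march=znver4 requires a recent enough GCC or Clang, otherwise -march=native on a Zen4 node gives the same result):

gcc -O3 -march=znver4 -o myprog myprog.c     # explicit Zen4 tuning (recent GCC/Clang)
clang -O3 -march=native -o myprog myprog.c   # tune for the CPU of the node you compile on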
You should load the 'buildenv' module in order to compile your software, along with your preferred compiler; a rough sketch follows below. Note that AOCC also uses the name 'clang' for its compiler executable.
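The module names and versions in this sketch are assumptions, patterned on the intel buildenv example further below; check ml spider buildenv to see what is actually installed:

ml buildenv/default-foss-2024a   # assumed name of a GCC-based build environment
ml AOCC                          # optionally add AMD's Clang fork (assumed module name)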
Also, be aware that by default FlexiBLAS will use the BLIS backend on Zen4, as this generally offers superior performance to that of OpenBLAS or MKL. If you wish to use another backend, you can use the command flexiblas list to see the name of the library files for each backend. Then load the backend that you wish to use, and run your software with the FLEXIBLAS environment variable set to that backend name. This will switch the backend with no need to recompile.
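A short example of how this could look (the backend name is only an illustration; use whichever names flexiblas list reports on the system):

flexiblas list                  # list the available BLAS backends and their library files
FLEXIBLAS=OPENBLAS ./myprog     # run with the OpenBLAS backend instead of the BLIS default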
Building for Icelake¶
As Icelake and Zen4 CPUs differ, software optimized with -march=native on one type will not work on the other.
Zen4 software can be compiled on the login nodes, but for Icelake-optimized software you can:

- Use the Vera portal https://vera.c3se.chalmers.se and get a desktop session on an Icelake node.
- Run an interactive job on an Icelake compute node, e.g.

srun -A YOUR_PROJECT -C ICELAKE -n 16 --pty bash

and compile your software there.
In general, the intel toolchain and MKL offer the best performance on Icelake, e.g. ml buildenv/default-intel-2024a.
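For instance, compiling a small program on an Icelake node could look like the following (the file names are placeholders, and -xHost assumes you compile on the same CPU type you will run on):

ml buildenv/default-intel-2024a
icx -O3 -xHost -qmkl -o myprog myprog.c    # Intel oneAPI C compiler, linking against MKL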