As of the expansion carried out during 2022, the Vera cluster contains several hardware models. It runs Intel Xeon Gold 6130 (code-named "Skylake") CPUs and the newer Intel Xeon Gold 6338 and Platinum 8358 (code-named "Icelake") CPUs. All nodes have dual CPU sockets. The cluster has NVIDIA T4, A40, V100 and A100 GPUs and an InfiniBand network.
The vera partition has:
|#nodes||CPU||#cores||RAM (GB)||TMPDIR (GB)||GPUS|
Login nodes are Skylake machines with 192 GB of RAM and are equipped with an NVIDIA P2000 for remote graphics.
Several local research groups have also purchased private partitions with additional nodes and GPUs. You can list specific node information from Slurm with:
sinfo -N -p vera -o %n,%m,%G,%b
The Skylake systems have a 25 Gigabit Ethernet network used for logins and a 56 Gbps InfiniBand high-speed/low-latency network for parallel computations and filesystem access. The servers are built by Supermicro and the compute node hardware by Intel; the system is delivered by Southpole. There are also 3 system servers used for accessing and managing the cluster.
The Icelake expansion has a 25 Gigabit Ethernet network for filesystem access and a 100 Gbps InfiniBand high-speed/low-latency network for parallel computations.
GPU cost on Vera
Jobs "cost" core hours based on the number of physical cores they allocate, plus an additional cost for each GPU.
- Example: A job using a full node with a single T4 for 10 hours:
(32 + 6) * 10 = 380 core hours
- Note: 16-, 32-, and 64-bit floating point performance differs greatly between these specialized GPUs. Pick the one most efficient for your application.
- Additional running cost is based on the price compared to a CPU node.
- You don't pay any extra for selecting a node with more memory, but you are typically competing for fewer available nodes.
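The arithmetic in the example above can be sketched as a small shell function. The name job_cost is hypothetical (it is not a C3SE tool), and the T4 surcharge of 6 is read off the example rather than taken from an official price list.

```shell
# Sketch only: core-hour cost = (physical cores + GPU surcharge) * hours.
# job_cost is a hypothetical helper, not part of Slurm or the C3SE tooling.
job_cost() {
    cores=$1       # physical cores allocated
    gpu_extra=$2   # additional cost charged for the GPU(s), in core equivalents
    hours=$3       # wall-clock hours
    echo $(( (cores + gpu_extra) * hours ))
}

# Full 32-core node with a single T4 (surcharge 6, assumed from the example) for 10 hours
job_cost 32 6 10   # prints 380
```

Passing 0 as the second argument gives the cost of a CPU-only job with the same core count.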
Cores, threads and CPUs
NOTE: Starting with the expansion and OS upgrade of Vera, Hyper-Threading will be disabled. You should adjust your scripts for MPI applications by removing any -c 2 flag when running on the updated nodes. See the news page for details.
One thing that differs from the previous systems at C3SE is that Hyper-Threading (HT for short) is enabled on Vera nodes.
Each Vera node has 2 physical processors with 16 (physical) cores each, giving a total of 32 cores per node. With HT enabled (2 threads per core, for a total of 64 threads per node), the following must be taken into consideration:
- If your code is heavily optimised for the Vera hardware, you probably will not benefit from HT and should only use 1 task per core. To do this, add "-c 2" (or "--cpus-per-task=2") to your jobscript or on the command line.
- You will probably want to benchmark using "-n X", "-n X -c 2" and "-n 2X", where X is the number of MPI-processes that will be launched.
- mpirun automatically picks up the relevant information from Slurm, so you probably only want "mpirun ./my.exe" in your jobscript (i.e. no "-n" or "-np" flags).
- Slurm will only allocate you full cores, i.e. you will only get an even number of tasks if you do not use "-c 2".
- Specifying only "-n1" will actually give you 2 tasks/threads to use (one physical core).
- In $TMPDIR you will find task-files in MPICH and LAM format:
  - with all tasks: $TMPDIR/mpichnodes, $TMPDIR/lamnodes
  - with physical cores only: $TMPDIR/mpichnodes.no_HT, $TMPDIR/lamnodes.no_HT
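The three benchmark variants suggested above ("-n X", "-n X -c 2" and "-n 2X") can be generated with a short loop. This is only a sketch for nodes where HT is still enabled: ht_variants is a hypothetical helper and job.sh is a placeholder for your own jobscript. It prints the sbatch commands instead of submitting them, so you can inspect them first.

```shell
# Sketch only: print the three submissions to compare for X MPI processes.
# ht_variants and job.sh are hypothetical names used for illustration.
ht_variants() {
    x=$1   # number of MPI processes (X)
    for flags in "-n $x" "-n $x -c 2" "-n $((2 * x))"; do
        echo "sbatch $flags job.sh"
    done
}

ht_variants 16
```

For X = 16 this lists one submission with 16 tasks sharing 8 physical cores, one with 16 tasks on 16 physical cores, and one with 32 tasks on 16 physical cores.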
For general information on running jobs, see running jobs
If you need support (trouble logging in, how to run your software, etc.), please first:
- Contact the PI of your project and see if they can help
- Talk with your fellow students/colleagues
- Contact C3SE support