ᛆᛚᚡᛁᛋ (Alvis)¶

Alvis logo

The Alvis cluster is a national NAISS resource dedicated for Artificial Intelligence and Machine Learning research. The system is built around Graphical Processing Units (GPUs) accelerator cards, and consists of several types of compute nodes with multiple NVIDIA GPUs. To get access to Alvis see First-time users.

For more information on using Alvis, see documentation on this site, in particular the parts on Machine Learning, Data sets, Containers and HPC and AI software.

Alvis is also available from an Open OnDemand web portal at

https://alvis.c3se.chalmers.se. For more information see the Alvis OnDemand documentation.

Etymology; Alvis is an old Nordic name meaning "all-wise" written as ᛆᛚᚡᛁᛋ in medieval viking runes.

Queue¶

Below shows the current availability of resources in the queue on the login node (link):

Queue information is only accessible from within SUNET networks (use of VPN is necessary if you are outside).

Hardware¶

Overview¶

The main alvis partition has:

#nodes	CPU	#cores	RAM (GB)	TMPDIR (GB)	GPUS	Note
12	Skylake	16	768	407	2xV100
5	Skylake	32	768	407	4xV100
19	Skylake	32	576	407	8xT4
1	Skylake	32	1536	407	8xT4
1	Skylake	32	768	1660	NOGPU
1	Icelake	64	512	852	NOGPU
83	Icelake	64	256	837	4xA40	No IB
54	Icelake	64	256	166	4xA100	Fast Mimer
20	Icelake	64	512	166	4xA100	Fast Mimer
8	Icelake	64	1024	166	4xA100fat	Fast Mimer

A100fat are A100 gpus with 80GB VRAM.
A100 nodes have small $TMPDIR is compensated by the 100GBit InfiniBand connection to Mimer.

Login node alvis1.c3se.chalmers.se:
- 4 x NVIDIA Tesla T4 GPU with 16GB RAM
- 2 x 16 core Intel(R) Xeon(R) Gold 6226R (Skylake) CPU @ 2.90GHz (total 32 cores)
- 768GB DDR4 RAM
- 2x100 GbitE internet connection via SUNET
- 1x100 Gbit InfiniBand to Mimer
Login/data transfer node alvis2.c3se.chalmers.se:
- No GPUs
- 2 x Intel(R) Xeon(R) Gold 6338 CPU (Icelake) @ 2.00GHz (total 64 cores)
- 256GB DDR4 RAM
- 2x100 GbitE internet connection via SUNET
- 2x100 Gbit InfiniBand to Mimer

Nvidia A100 based nodes¶

56 nodes optimised for training jobs

4 x NVIDIA Tesla A100 HGX GPU with 40GB RAM
2 x 32 core Intel(R) Xeon(R) Gold 6338 CPU @ 2GHz (total 64 cores)
256GiB DDR4 RAM

20 nodes optimised for training jobs with a bit more memory needs

4 x NVIDIA Tesla A100 HGX GPU with 40GB RAM
2 x 32 core Intel(R) Xeon(R) Gold 6338 CPU @ 2GHz (total 64 cores)
512GiB DDR4 RAM

8 nodes optimised for heavy training jobs

4 x NVIDIA Tesla A100 HGX GPU with 80GB RAM
2 x 32 core Intel(R) Xeon(R) Gold 6338 CPU @ 2GHz (total 64 cores)
1024GiB DDR4 RAM

Nvidia A40 based nodes¶

85 nodes optimised for inference and smaller training jobs

4 x NVIDIA Tesla A40 GPU with 48GB RAM
2 x 32 core Intel(R) Xeon(R) Gold 6338 CPU @ 2GHz (total 64 cores)
256GiB DDR4 RAM

Nvidia V100 based nodes¶

12 performance GPU compute nodes alvis1-01 to alvis1-12 with the node configuration

2 x NVIDIA Tesla V100 SXM2 GPU with 32GB RAM, connected by nvlink
2 x 8 core Intel(R) Xeon(R) Gold 6244 CPU @ 3.60GHz (total 16 cores)
768GB DDR4 RAM
387GB SSD scratch disk

5 performance GPU compute nodes alvis1-13 to alvis-17 with the node configuration

4 x NVIDIA Tesla V100 SXM2 GPU with 32GB RAM, connected by nvlink
2 x 16 core Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz (total 32 cores)
768GB DDR4 RAM
387GB SSD scratch disk

Nvidia T4 based nodes¶

20 capacity GPU compute nodes alvis2-01 to alvis2-20 with the node configuration

8 x NVIDIA Tesla T4 GPU with 16GB RAM
2 x 16 core Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz (total 32 cores)
576GB DDR4 RAM (1 node with 1536GB)
387GB SSD scratch disk

CPU-only nodes¶

1 compute node without GPUs alvis-cpu1 with the node configuration

2 x 16 core Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz (total 32 cores)
768GB DDR4 RAM
3.4TB SSD scratch disk

4 nodes without GPUs alvis10-11 to alvis10-14 with the node configuration

2 x 32 core Intel(R) Xeon(R) Gold 8358 CPU @ 2.6GHz (total 64 cores)
512GiB DDR4 RAM

Note that this type of node is suitable for the more heavy duty pre- and postprocessing steps that do not require a GPU. To use these nodes specify the constraint -C NOGPU to SLURM.

Dedicated storage¶

In addition to the compute nodes listed above, a fast 740TB dedicate all-flash storage solution is available in Alvis. The solution is backed by ~7PB of bulk storage as a second tier.

More details on the storage solution can be found on this page.

GPU cost on Alvis¶

Depending on which GPU type you choose for your job, an hour of on the GPU will have different costs according to the following table:

Type	VRAM	System memory per GPU	CPU cores per GPU	Cost
T4	16 GB	72 or 192 GB	4	0.35
A40	48 GB	64 GB	16	1
V100	32 GB	192 or 384 GB	8	1.31
A100	40 GB	64 or 128 GB	16	1.84
A100fat	80 GB	256 GB	16	2.2
NOGPU	N/A	N/A	N/A	0.05

Example: using 2xT4 GPUs for 10 hours costs 7 "GPU hours" (2 x 0.35 x 10).
Using the constraints: 2xV100, MEM1536, MEM512 will give more memory on T4, V100, and A100 nodes respectively.
The cost reflects the actual price of the hardware (normalised against an A40 node/GPU).
The cost for the NOGPU nodes is per core-hour.

GPU hardware details¶

Each GPU comes in a different set-up hand has different specs.

#GPUs	GPUs	Capability	CPU	Note
44	V100	7.0	Skylake
160	T4	7.5	Skylake
332	A40	8.6	Icelake	No IB
296	A100	8.0	Icelake	Fast Mimer
32	A100fat	8.0	Icelake	Fast Mimer

Expanding a bit on the table, A40s and A100s are the most numerous. A40 machines are not connceted with InfiniBand and as such are not meant for multi-node jobs. A100 nodes have an InfiniBand connection to Mimer, which typically means better read/write performance.

Data type	A100	A40	V100	T4
FP64	9.7 \| 19.5*	0.58	7.8	0.25
FP32	19.5	37.4	15.7	8.1
TF32	156**	74.8**	N/A	N/A
FP16	312**	149.7**	125	65
BF16	312**	149.7**	N/A	N/A
Int8	624**	299.3**	64	130
Int4	1248**	598.7**	N/A	260

*Performance on Tensor Core.
**Up to a factor of two faster with sparsity.

All numbers in the above table is reported for peak rates in units of trillions of operations per second (i.e. Tflop/s for float types). Note that the performance between different

Shortly about the different datatypes:

FP64 or doubles are not much used in machine learning.
FP32 or singles are the typical precision used for input data, note that mixed precision or in the case of Ampere also tensor floats can give a significant speed-up.
TF32 or TensorFloats have the range of singles but precision of halfs (thus only 19 bit instead of 32). For machine learning workloads the loss in precision is usually worth the speed-up. This data type is not available for V100 and T4 GPUs.
FP16 or halfs are used in mixed precision. For calculations where this is sufficient it can give a big speed-up, especially for V100s and T4s.
BF16 or brain floats are similar to halfs but three bits have been added to the range (exponent) from the precision (mantissa). Not available for V100 and T4 GPUs.
Int8/Int4 can be used in quantization to reduce memory requirements and additionally gives some speed-up but with much lower precision. Can impact model performance and is typically only considered for inference workloads.

More info¶

To get started look through the introduction slides for Alvis, and the general user documentation on this site.