PyTorch🔗
PyTorch is a popular machine learning (ML) framework.
A common use case is to import PyTorch as a module in Python; it is then up to you to write your particular ML application as a Python script using the functionality of the torch module.
We provide precompiled optimised installations of both legacy and recent versions of PyTorch in our tree of software modules, see our introduction to software modules.
Just like with most software, search for all available versions with module spider pytorch.
If you want to run on CUDA-accelerated GPU hardware, make sure to select a version with CUDA.
It is also possible to run PyTorch using containers, of which many versions are already centrally installed.
PyTorch is heavily optimised for GPU hardware, so we recommend using the CUDA version and running it on the compute nodes equipped with GPUs. How to do this is described in our guide to running jobs.
Quick guide🔗
- Use module spider PyTorch-bundle to find the latest modules of PyTorch, torchvision etc.
- Do you need a newer PyTorch version?
- Use GPUs primarily.
- Apply to Alvis to get access to GPUs.
- See PyTorch documentation and/or Alvis intro tutorial for using a GPU.
- Check that you are using a GPU and monitor your GPU usage.
- Use profiling to make good use of GPUs.
- For multi-GPU usage, check out the specifics for Alvis.
- Use the right precision for your use case.
- If GPU utilisation goes down between batches, look at your dataloading pipeline.
- If you are running long jobs, use checkpointing (see the sketch after this list).
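As a minimal sketch of checkpointing (the model, optimizer and file path below are stand-ins for whatever your own training script uses), you can periodically save the training state with torch.save and restore it when the job is restarted:

import torch
from torch import nn, optim

# Stand-ins for the model, optimizer and epoch counter from your own training loop
model = nn.Linear(10, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01)
epoch = 5  # e.g. the epoch you just finished

checkpoint_path = 'checkpoint.pt'  # hypothetical path, adjust to your job

# Save the training state at regular intervals (e.g. every epoch)
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}, checkpoint_path)

# Restore the state when the job is restarted
checkpoint = torch.load(checkpoint_path)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch'] + 1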
Checking for available GPUs🔗
After loading the PyTorch
module of your choice, your environment is now configured to start using PyTorch in Python.
Here is a small test that prints the PyTorch version available in your environment:
[cid@vera1 ~]$ python -c "import torch; print(torch.__version__)"
1.11.0
If you intend to run your calculations on GPU hardware it can be useful to check that PyTorch detects the GPU hardware using the torch.cuda
submodule. Here is an example from a node equipped with an Nvidia Quadro GPU.
[cid@vera1 ~]$ python -c "import torch; print('CUDA enabled:', torch.cuda.is_available())"
CUDA enabled: True
[cid@vera1 ~]$ python -c "import torch.cuda as tc; id = tc.current_device(); print('Device:', tc.get_device_name(id))"
Device: Quadro P2000
To use GPUs, check out the official PyTorch documentation and/or the Alvis intro tutorial.
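In your own scripts, a common pattern (a minimal sketch with a stand-in model) is to pick the GPU if one is available and otherwise fall back to the CPU:

import torch

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Move a stand-in model and a batch of random data to the selected device
model = torch.nn.Linear(10, 1).to(device)
batch = torch.randn(32, 10, device=device)
output = model(batch)
print('Running on:', device)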
PyTorch-bundle🔗
PyTorch-bundle is a module which bundles PyTorch, PyTorch-Ignite, torchvision, torch_tb_profiler, torchtext and torchdata. Other less commonly used PyTorch projects, like torchaudio, can be added on request.
Use e.g. module spider PyTorch-bundle/1.13.1-foss-2022a-CUDA-11.7.0
to see what is included in a particular version of PyTorch-bundle.
Performance and precision🔗
Which GPU you're using and which data type is used in computations can have a huge impact on performance at max utilisation (see GPU hardware details).
The main performance gain in using Ampere GPUs and newer (A40s and A100s in our case) comes from using the tensor cores. We recommend that all PyTorch users check out the following links:
- What Every User Should Know About Mixed Precision Training in PyTorch
- torch.amp
- torch.set_float32_matmul_precision
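As a minimal sketch of what these settings can look like in practice (the model, data and hyperparameters are stand-ins, and this is not the only way to use mixed precision), a training step with torch.amp and TF32 matrix multiplications might look like this:

import torch
from torch import nn, optim

# Allow TF32 tensor cores for float32 matrix multiplications
# (requires PyTorch 1.12 or newer and an Ampere GPU or newer)
torch.set_float32_matmul_precision('high')

device = torch.device('cuda')
model = nn.Linear(1024, 1024).to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid underflow in float16

for _ in range(10):
    inputs = torch.randn(64, 1024, device=device)
    targets = torch.randn(64, 1024, device=device)

    optimizer.zero_grad()
    # Run the forward pass in mixed precision
    with torch.autocast(device_type='cuda', dtype=torch.float16):
        loss = nn.functional.mse_loss(model(inputs), targets)

    # Scale the loss, backpropagate and update the weights
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()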
Dataloading🔗
Machine learning datasets are commonly made up of a multitude of files. In HPC environments this can be less than ideal: reading (and writing) many small files is generally a lot slower than working with a single large file. You can find our general tips at datasets.
Luckily, PyTorch 1.11 and newer can load data directly from archives through TorchData. To use this, load the relevant module; as usual, we first have to locate it.
[vikren@alvis2 ~]$ module spider torchdata
----------------------------------------------------------------------------
torchdata:
----------------------------------------------------------------------------
Versions:
torchdata/0.3.0-foss-2021a-PyTorch-1.11.0-CUDA-11.3.1
torchdata/0.3.0 (E)
torchdata/0.4.0 (E)
torchdata/0.4.1 (E)
torchdata/0.5.1 (E)
Names marked by a trailing (E) are extensions provided by another module.
----------------------------------------------------------------------------
For detailed information about a specific "torchdata" package (including how t
o load the modules) use the module's full name.
Note that names that have a trailing (E) are extensions provided by other modu
les.
For example:
$ module spider torchdata/0.5.1
----------------------------------------------------------------------------
[vikren@alvis2 ~]$ module spider torchdata/0.5.1
----------------------------------------------------------------------------
torchdata: torchdata/0.5.1 (E)
----------------------------------------------------------------------------
This extension is provided by the following modules. To access the extension
you must load one of the following modules. Note that any module names in paren
theses show the module location in the software hierarchy.
PyTorch-bundle/1.13.1-foss-2022a-CUDA-11.7.0
Names marked by a trailing (E) are extensions provided by another module.
[vikren@alvis2 ~]$ module load PyTorch-bundle/1.13.1-foss-2022a-CUDA-11.7.0
Loading from archives is part of a framework called DataPipes, which are utilities for building up a data pipeline that can then be used in a dataloader. Below you can find one example that also uses the torchvision module to load an image dataset.
Note that compressed tar files (e.g. tar.gz) are extremely slow to read from when shuffle is True. You should either uncompress the archive (but don't unpack!) or use zip files instead. Other useful filetypes are HDF5 and NetCDF.
from typing import Tuple

import torch
from PIL import Image
from torch.utils.data import DataLoader, backward_compatibility
from torchvision.transforms import ToTensor
from torchdata.datapipes.iter import FileLister

# Set up type hints for readability
FileName = str
ByteStream = bytes
RawDataPoint = Tuple[FileName, ByteStream]

# Select the archive to load data from
datapipe = FileLister('/mimer/NOBACKUP/Datasets/Lyft-level5/Perception', 'train.tar')

# Load from the selected archives directly
datapipe = datapipe.load_from_tar()

# Filter to only stream jpeg files
def jpeg_filter(raw_data_point: RawDataPoint) -> bool:
    filename = raw_data_point[0]
    return filename.endswith('.jpeg')

datapipe = datapipe.filter(jpeg_filter)

# Enable shuffle option in DataLoader at this point in the pipeline
datapipe = datapipe.shuffle()

# Enable loading data with multiple workers in DataLoader
datapipe = datapipe.sharding_filter()

# Parse ByteStream from jpeg to tensor
to_tensor = ToTensor()

def parse_jpeg(raw_data_point: RawDataPoint) -> torch.Tensor:
    filename, byte_stream = raw_data_point
    return to_tensor(Image.open(byte_stream))

datapipe = datapipe.map(parse_jpeg)

# This object can then be used as a regular iterator (useful for debugging)...
for image_tensor in datapipe:
    break

# ... or more commonly in dataloaders
dataloader = DataLoader(
    dataset=datapipe,
    shuffle=True,
    num_workers=2,
    # worker_init_fn is only needed for torchdata 0.3.0 or older
    worker_init_fn=backward_compatibility.worker_init_fn,
)
Using multiple GPUs🔗
This section assumes that you are using Alvis. If you are a Vera user who relies heavily on GPUs for your machine learning with PyTorch, you should apply for resources on Alvis, where more GPU resources are available.
The two typical ways to scale to multiple GPUs are:
- Data Parallelism
- Model Parallelism
Most multi-GPU jobs will benefit from GPUDirect and InfiniBand across nodes. For multi-node jobs, check at least that:
1. You are on a node with InfiniBand.
2. The data transfer between nodes is making use of InfiniBand, e.g. by running job_stats.py <JOBID> and checking the network graphs.
Data parallelism🔗
For DistributedDataParallel, with or without torchrun, we initialize the relevant environment variables (MASTER_ADDR and MASTER_PORT) for each job.
With data parallelism your model is broadcast to all GPUs, separate batches on the different GPUs compute weight updates in parallel, and these are then combined into a single update as if you had used one large batch. This is useful if you have a large dataset and want larger batches than fit in a single GPU's memory.
You can find examples of data parallelism in our Alvis tutorial.
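As a rough sketch (not a complete job script, and the model and data are stand-ins), a DistributedDataParallel training step typically looks like the following when the script is launched with torchrun or srun so that MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE and LOCAL_RANK are already set:

import os
import torch
import torch.distributed as dist
from torch import nn, optim
from torch.nn.parallel import DistributedDataParallel as DDP

# Join the process group; the default env:// method reads MASTER_ADDR,
# MASTER_PORT, RANK and WORLD_SIZE from the environment
dist.init_process_group(backend='nccl')
local_rank = int(os.environ['LOCAL_RANK'])
torch.cuda.set_device(local_rank)

# Wrap a stand-in model; gradients are averaged across all processes
model = DDP(nn.Linear(10, 1).to(local_rank), device_ids=[local_rank])
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Each process should work on its own shard of the data (use a
# DistributedSampler with a real dataset); random tensors stand in here
inputs = torch.randn(32, 10, device=local_rank)
targets = torch.randn(32, 1, device=local_rank)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(inputs), targets)
loss.backward()
optimizer.step()

dist.destroy_process_group()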
Model parallelism🔗
Model parallelism is about storing parts of the model on different GPUs. This is used if your model is too large to fit on a single GPU; for the GPUs available on Alvis this should rarely be a problem, but in some cases you might reach this limit. Remember that you can see your resource usage for a job with the command job_stats.py.
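As a minimal sketch of manual model parallelism (the two-layer model is a stand-in and assumes two visible GPUs), different parts of the model are placed on different GPUs and the activations are moved between them:

import torch
from torch import nn

class TwoGPUModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Place each part of the model on its own GPU
        self.part1 = nn.Linear(1024, 1024).to('cuda:0')
        self.part2 = nn.Linear(1024, 10).to('cuda:1')

    def forward(self, x):
        # Move the activations to the GPU that holds the next part
        x = self.part1(x.to('cuda:0'))
        return self.part2(x.to('cuda:1'))

model = TwoGPUModel()
output = model(torch.randn(32, 1024))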
FAQ🔗
Can you install a newer PyTorch version?🔗
We're working with the EasyBuild community in preparing new PyTorch versions. You can send a support question in case you want to be kept up to date when we add a new version.
An alternative is to use the containers
at /apps/containers/PyTorch
. We happily build later versions of these on
request.
We typically recommend NGC containers over the plain PyTorch ones. For the meaning of the names see What does NGC mean in the containers?.
We do not recommend installing your own version unless necessary. If you install your own version it is up to you to make sure that the installation has the capabilities that you need:
- Is the software built for the CUDA compute capabilities corresponding to available GPU types?
- If you're making use of CPU computations, is the software built for AVX512?
- If you're doing multi-GPU jobs, is the software built for GPUDirect and InfiniBand?
- ...
What does NGC mean in the containers?🔗
Containers with NGC in the name are from Nvidia's NGC
catalog.
Other containers at /apps/containers/PyTorch/
are official PyTorch containers
from dockerhub. The provided containers are not as patched and verified as the provided modules, but should work well for most cases. They even support communication over InfiniBand when using NCCL for multi-node communication.