PyTorch is a popular machine learning (ML) framework.

A common use case is to import PyTorch as a module in Python. It is then up to you as a user to write your ML application as a Python script that uses the functionality of the torch module.

We provide precompiled, optimised installations of both legacy and recent versions of PyTorch in our tree of software modules (see our introduction to software modules). As with most software, you can list all available versions with module spider pytorch. If you want to run on CUDA-accelerated GPU hardware, make sure to select a version built with CUDA. It is also possible to run PyTorch in containers, and we already provide many versions of these centrally installed.
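Once a module is loaded, one way to double-check that you picked a CUDA-enabled build is to look at torch.version.cuda, which is None for CPU-only builds. The sketch below collects this together with torch.cuda.is_available(); the helper name describe_torch_build is our own, for illustration only:

```python
import torch


def describe_torch_build() -> dict:
    """Summarise the loaded PyTorch build (hypothetical helper, for illustration)."""
    return {
        # Version string of the PyTorch installation found on your path
        "torch_version": torch.__version__,
        # CUDA version PyTorch was compiled against, or None for CPU-only builds
        "built_with_cuda": torch.version.cuda,
        # True only if the build supports CUDA *and* a usable GPU is visible
        "gpu_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_build().items():
        print(f"{key}: {value}")
```

On a node without a GPU, a CUDA build would typically report a CUDA version under built_with_cuda while gpu_available is False.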

PyTorch is heavily optimised for GPU hardware, so we recommend using a CUDA-enabled version and running it on the compute nodes equipped with GPUs. How to do this is described in our guide to running jobs.
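A common pattern, sketched below under the assumption of a tiny placeholder model and random data, is to write device-agnostic code: pick the GPU when PyTorch can see one and fall back to the CPU otherwise, so the same script runs both on a GPU node and on your laptop. The model, data and learning rate are made up for illustration:

```python
import torch
from torch import nn

# Use the GPU if this PyTorch build can see one, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tiny placeholder model and random data, for illustration only
model = nn.Linear(10, 1).to(device)
inputs = torch.randn(32, 10, device=device)
targets = torch.randn(32, 1, device=device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

# One training step: model, inputs and targets must all live on the same device
optimizer.zero_grad()
loss = loss_fn(model(inputs), targets)
loss.backward()
optimizer.step()

print(f"Ran one training step on {device}, loss = {loss.item():.4f}")
```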

After loading the PyTorch module of your choice, your environment is configured for calling PyTorch from Python. Here is a small test that prints the PyTorch version available in your environment:

[cid@vera1 ~]$ python -c "import torch; print(torch.__version__)"

If you intend to run your calculations on GPU hardware, it can be useful to check that PyTorch detects the GPU using the torch.cuda submodule. Here is an example from a node equipped with an NVIDIA Quadro GPU:

[cid@vera1 ~]$ python -c "import torch; print('CUDA enabled:', torch.cuda.is_available())"
CUDA enabled: True
[cid@vera1 ~]$ python -c "import torch.cuda as tc; dev = tc.current_device(); print('Device:', tc.get_device_name(dev))"
Device: Quadro P2000
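A job may have more than one GPU allocated. The following sketch lists every GPU that PyTorch can see (typically only those allocated to your job) using torch.cuda.device_count() and torch.cuda.get_device_properties(); the helper name list_gpus is our own:

```python
import torch


def list_gpus() -> list:
    """Return index, name and total memory (GiB) for each visible GPU."""
    gpus = []
    if not torch.cuda.is_available():
        return gpus  # CPU-only build, or no GPU visible to this process
    for index in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(index)
        gpus.append({
            "index": index,
            "name": props.name,
            "total_memory_gib": props.total_memory / 1024**3,
        })
    return gpus


if __name__ == "__main__":
    for gpu in list_gpus():
        print(f"cuda:{gpu['index']} {gpu['name']} ({gpu['total_memory_gib']:.1f} GiB)")
```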


Machine learning datasets often consist of a large number of files. In HPC environments this is far from ideal: reading (and writing) many small files is generally much slower than reading the same amount of data from a single large file, since every file that is opened also costs a metadata operation on the shared file system.
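If your dataset is a directory tree of many small files, one option is to pack it once into a single uncompressed tar archive. Below is a minimal sketch using only Python's standard tarfile module; the function name pack_directory and the paths in the usage line are hypothetical:

```python
import tarfile
from pathlib import Path


def pack_directory(src_dir: str, archive_path: str) -> int:
    """Pack all files under src_dir into one uncompressed tar archive.

    Returns the number of files added. Mode "w" writes a plain tar without
    compression. Write the archive outside src_dir so it does not include itself.
    """
    src = Path(src_dir)
    count = 0
    with tarfile.open(archive_path, mode="w") as tar:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store paths relative to src_dir so the archive is relocatable
                tar.add(str(path), arcname=path.relative_to(src).as_posix())
                count += 1
    return count
```

Usage, with hypothetical paths: pack_directory("my_dataset", "train.tar"). Packing is a one-off cost; afterwards each training run opens one file instead of thousands.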

Fortunately, from PyTorch 1.11 onwards, data can be loaded directly from archives using TorchData. To use it, find and load the relevant module:

[cid@alvis1 ~]$ module spider torchdata

        torchdata/0.3.0 (E)

Names marked by a trailing (E) are extensions provided by another module.

  For detailed information about a specific "torchdata" package (including how to load the modules) use the module's full name.
  Note that names that have a trailing (E) are extensions provided by other modules.
  For example:

     $ module spider torchdata/0.3.0

[cid@alvis1 ~]$ module load torchdata/0.3.0-foss-2021a-PyTorch-1.11.0-CUDA-11.3.1

Loading from archives is part of a framework called DataPipes: utilities for building a data pipeline that can then be used in a DataLoader. Below is an example that also uses the torchvision module to load an image dataset.

Note that compressed tar files (e.g. tar.gz) are extremely slow to read from when shuffle is True. You should either decompress the archive (but don't unpack it!) or use zip files instead. Other useful file types are HDF5 and NetCDF.
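Decompressing without unpacking turns train.tar.gz into a single train.tar file whose members stay inside the archive; the gunzip command does the same from the shell. A minimal Python sketch with the standard gzip and shutil modules (the function name gunzip_archive is hypothetical):

```python
import gzip
import shutil


def gunzip_archive(gz_path: str, tar_path: str) -> None:
    """Decompress e.g. train.tar.gz into train.tar without extracting its members."""
    with gzip.open(gz_path, "rb") as src, open(tar_path, "wb") as dst:
        # Copy in 16 MiB chunks so the archive never has to fit in memory
        shutil.copyfileobj(src, dst, length=16 * 1024 * 1024)
```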

from io import BufferedIOBase
from typing import Tuple

import torch
from PIL import Image
from torch.utils.data import DataLoader, backward_compatibility
from torchvision.transforms import ToTensor
from torchdata.datapipes.iter import FileLister, FileOpener

# Set-up type hints for readability
FileName = str
ByteStream = BufferedIOBase  # load_from_tar yields file-like binary streams
RawDataPoint = Tuple[FileName, ByteStream]

# Select the archive to load data from
datapipe = FileLister('/mimer/NOBACKUP/Datasets/Lyft-level5/Perception', 'train.tar')

# Open the archive in binary mode; load_from_tar expects (path, stream) pairs
datapipe = FileOpener(datapipe, mode='b')

# Load from the selected archives directly
datapipe = datapipe.load_from_tar()

# Filter to only stream jpeg files
def jpeg_filter(raw_data_point: RawDataPoint) -> bool:
    filename = raw_data_point[0]
    return filename.endswith('.jpeg')

datapipe = datapipe.filter(jpeg_filter)

# Enable shuffle option in DataLoader at this point in pipeline
datapipe = datapipe.shuffle()

# Enable loading data with multiple workers in DataLoader
datapipe = datapipe.sharding_filter()

# Parse ByteStream from jpeg to tensor
to_tensor = ToTensor()
def parse_jpeg(raw_data_point: RawDataPoint) -> torch.Tensor:
    filename, byte_stream = raw_data_point
    return to_tensor(Image.open(byte_stream))

datapipe = datapipe.map(parse_jpeg)

# This object can then be used as a regular iterator (useful for debugging)...
for image_tensor in datapipe:
    print(image_tensor.shape)  # inspect the first sample only
    break

# ... or more commonly in dataloaders
dataloader = DataLoader(
    dataset = datapipe,
    shuffle = True,
    num_workers = 2,
    # worker_init_fn only needed with multiple workers
    worker_init_fn = backward_compatibility.worker_init_fn,
)
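To make the three key steps of the pipeline above concrete (stream members from a tar archive, keep only JPEGs, split the stream between workers), here is a standard-library-only sketch of the same idea. It is our own illustration, not how TorchData implements these steps, and the name iter_tar_jpegs is hypothetical. In a real job you could wrap such a generator in a torch.utils.data.IterableDataset and take worker_id and num_workers from torch.utils.data.get_worker_info().

```python
import tarfile
from typing import Iterator, Tuple


def iter_tar_jpegs(tar_path: str, worker_id: int = 0,
                   num_workers: int = 1) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, bytes) for .jpeg members, sharded round-robin by worker."""
    index = 0
    # "r|" reads an uncompressed tar as a forward-only stream (no random access)
    with tarfile.open(tar_path, mode="r|") as tar:
        for member in tar:
            if not (member.isfile() and member.name.endswith(".jpeg")):
                continue  # same role as jpeg_filter in the pipeline
            # Round-robin sharding: each worker keeps a disjoint subset
            if index % num_workers == worker_id:
                yield member.name, tar.extractfile(member).read()
            index += 1
```

With two workers, worker 0 receives the 1st, 3rd, 5th ... JPEG and worker 1 the rest, so no sample is loaded twice.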

Using multiple GPUs

This section assumes that you are using Alvis. If you are a Vera user who relies heavily on GPUs for machine learning with PyTorch, you should apply for resources on Alvis, where more GPU resources are available.

There are two typical ways to scale to multiple GPUs:

- Data parallelism
- Model parallelism

With data parallelism, a copy of your model is broadcast to every GPU, and each GPU computes the weight updates (gradients) for its own separate batch in parallel. These are then combined into a single update, as if you had used one large batch. This is useful if you have a large dataset and want larger batches than fit in a single GPU's memory.
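Why does combining the per-GPU results behave "as if you had had a single large batch"? For a loss that is a mean over samples, and shards of equal size, the average of the per-GPU gradients equals the full-batch gradient. PyTorch's DistributedDataParallel averages gradients across processes in this spirit. The toy below checks the arithmetic in plain Python for a single weight; the data, weight and the four "GPUs" are made up for illustration:

```python
from statistics import fmean


def mse_grad(w: float, xs: list, ys: list) -> float:
    """Gradient of mean((w*x - y)**2) with respect to the scalar weight w."""
    return fmean(2 * (w * x - y) * x for x, y in zip(xs, ys))


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
w = 0.5

# One large batch of 8 samples on a single device
full_grad = mse_grad(w, xs, ys)

# The same batch split evenly across 4 hypothetical GPUs (2 samples each)
num_gpus = 4
shard = len(xs) // num_gpus
local_grads = [
    mse_grad(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
    for i in range(num_gpus)
]

# "All-reduce" step: average the per-GPU gradients into one update
averaged_grad = fmean(local_grads)
print(full_grad, averaged_grad)
```

A consequence is that the effective batch size is the per-GPU batch size times the number of GPUs, which you may want to take into account when choosing the learning rate.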

Model parallelism means placing different parts of the model on different GPUs. It is needed when your model is too large to fit in the memory of a single GPU. With the GPUs available on Alvis this should rarely be a problem, but you might occasionally reach this limit. Remember that you can see your resource usage for a job with the command
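As a minimal sketch of the idea, assuming a toy two-layer network, the model below places its first stage on one device and its second stage on another, copying activations across at the boundary. It uses cuda:0 and cuda:1 when the job has at least two GPUs and otherwise runs both stages on the CPU so it can be tried anywhere. The class name TwoStageModel and the layer sizes are hypothetical. In this naive form only one GPU is busy at a time, so it helps with memory rather than speed.

```python
import torch
from torch import nn


class TwoStageModel(nn.Module):
    """Toy model split across two devices (naive model parallelism)."""

    def __init__(self, dev0: torch.device, dev1: torch.device):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        # First part of the network lives on dev0, second part on dev1
        self.stage1 = nn.Sequential(nn.Linear(64, 256), nn.ReLU()).to(dev0)
        self.stage2 = nn.Linear(256, 10).to(dev1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.stage1(x.to(self.dev0))
        # Activations are copied between devices at the stage boundary
        return self.stage2(x.to(self.dev1))


# Use two GPUs if the job has them, otherwise run both stages on the CPU
if torch.cuda.device_count() >= 2:
    dev0, dev1 = torch.device("cuda:0"), torch.device("cuda:1")
else:
    dev0 = dev1 = torch.device("cpu")

model = TwoStageModel(dev0, dev1)
out = model(torch.randn(8, 64))
out.sum().backward()  # autograd propagates gradients back across devices
print(out.shape, out.device)
```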

You can find examples on Data Parallelism in our Alvis tutorial.