PyTorch

PyTorch is a popular machine learning (ML) framework; see https://pytorch.org/.

A common use case is to import PyTorch as a module in Python. It is then up to you as a user to write your particular ML application as a Python script using the torch Python module functionality.
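As a minimal sketch of such a script, the following fits a single weight to the line y = 2x with gradient descent (the data and learning rate are made up for illustration):

```python
import torch

# Toy data: y = 2x
x = torch.tensor([[1.0], [2.0], [3.0]])
y = 2.0 * x

# One trainable weight, optimised with plain SGD
w = torch.zeros(1, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

for _ in range(100):
    opt.zero_grad()
    loss = ((x * w - y) ** 2).mean()
    loss.backward()  # autograd computes d(loss)/dw
    opt.step()

print(round(w.item(), 2))  # w converges to roughly 2.0
```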

At C3SE we provide precompiled optimised installations of both legacy and recent versions of PyTorch in our tree of software modules, see our introduction to software modules. It is also possible to run PyTorch using containers. For generic guidance on using containers, see https://github.com/c3se/containers/blob/master/README.md and https://www.c3se.chalmers.se/documentation/applications/containers/.

In the software module tree we provide PyTorch versions both with CUDA GPU acceleration and versions using only the CPU. Which one you want to use depends on which part of our clusters you will run your jobs on. However, PyTorch is heavily optimised for GPU hardware so we recommend using the CUDA version and to run it on the compute nodes equipped with GPUs. How to do this is described in our guide to running jobs.
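Once a module is loaded, you can check which flavour you got: in a CUDA-enabled build, torch.version.cuda holds the CUDA toolkit version string, while a CPU-only build reports None.

```python
import torch

# torch.version.cuda is None for a CPU-only build,
# and the CUDA toolkit version string (e.g. '10.1') otherwise
has_cuda_build = torch.version.cuda is not None
print(torch.__version__)
print('CUDA build:', has_cuda_build)
```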

To list the available versions you can use the module spider pytorch command:

[cid@vera1 ~]$ module spider pytorch
...
     Versions:
        PyTorch/1.1.0-Python-3.7.2
        PyTorch/1.2.0-Python-3.7.2
        PyTorch/1.3.1-Python-3.7.4
        PyTorch/1.4.0-Python-3.7.4

To use the version PyTorch/1.4.0-Python-3.7.4 (i.e. PyTorch v1.4.0 with Python 3.7 bindings) we inspect that particular module with the module spider command:

[ohmanm@vera1 ~]$ module spider PyTorch/1.4.0-Python-3.7.4

-------------------------------------------------------------------------------------------------
  PyTorch: PyTorch/1.4.0-Python-3.7.4
-------------------------------------------------------------------------------------------------
    Description:
      Tensors and Dynamic neural networks in Python with strong GPU acceleration. PyTorch is a
      deep learning framework that puts Python first.


    You will need to load all module(s) on any one of the lines below before the "PyTorch/1.4.0-Pytho
n-3.7.4" module is available to load.

      GCC/8.3.0  CUDA/10.1.243  OpenMPI/3.1.4
      GCC/8.3.0  OpenMPI/3.1.4
...

Here we see that the PyTorch module depends on a number of other software modules. All these modules have to be loaded before loading the PyTorch/1.4.0-Python-3.7.4 module.

If you want to run on CUDA accelerated GPU hardware, make sure to select the set of modules including the CUDA/10.1.243 package.

[cid@vera1 ~]$ module load GCC/8.3.0 CUDA/10.1.243 OpenMPI/3.1.4 PyTorch/1.4.0-Python-3.7.4

After loading the PyTorch/1.4.0-Python-3.7.4 module, your environment is configured to call PyTorch from Python. Here is a small test that prints the PyTorch version available in your environment:

[cid@vera1 ~]$ python -c "import torch; print(torch.__version__)"
1.4.0

If you intend to run your calculations on GPU hardware, it can be useful to check that PyTorch detects the GPU hardware using the torch.cuda submodule. Here is an example from a node equipped with an Nvidia Quadro GPU.

[cid@vera1 ~]$ python -c "import torch; print('CUDA enabled:', torch.cuda.is_available())"
CUDA enabled: True
[cid@vera1 ~]$ python -c "import torch.cuda as tc; id = tc.current_device(); print('Device:', tc.get_device_name(id))"
Device: Quadro P2000
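If those checks succeed, tensors and models can be placed on the GPU with .to(). A device-agnostic sketch that falls back to the CPU (useful when testing on a login node without a GPU) could look like this:

```python
import torch

# Use the GPU when one is visible, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Both the model and its input must live on the same device
model = torch.nn.Linear(8, 2).to(device)
batch = torch.randn(4, 8, device=device)
out = model(batch)
print(out.shape)  # torch.Size([4, 2])
```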

Dataloading

Machine learning datasets are commonly made up of a multitude of files. In HPC environments this can be less than ideal: reading (and writing) many small files is generally a lot slower than accessing a single large file.

Luckily, PyTorch 1.11 and newer support loading data directly from archives through TorchData. To use this, load the relevant module:

[cid@alvis1 ~]$ module spider torchdata

------------------------------------------------------------------------------------------------------------
  torchdata:
------------------------------------------------------------------------------------------------------------
     Versions:
        torchdata/0.3.0-foss-2021a-PyTorch-1.11.0-CUDA-11.3.1
        torchdata/0.3.0 (E)

Names marked by a trailing (E) are extensions provided by another module.


------------------------------------------------------------------------------------------------------------
  For detailed information about a specific "torchdata" package (including how to load the modules) use the modu
le's full name.
  Note that names that have a trailing (E) are extensions provided by other modules.
  For example:

     $ module spider torchdata/0.3.0
------------------------------------------------------------------------------------------------------------


[cid@alvis1 ~]$ module load torchdata/0.3.0-foss-2021a-PyTorch-1.11.0-CUDA-11.3.1

Loading from archives is part of a framework called datapipes: utilities for building up a data pipeline that can then be used in a dataloader. Below you can find an example that also uses the torchvision module to load an image dataset.

Note that compressed tar files (e.g. tar.gz) are extremely slow to read from when shuffle is True. You should either uncompress the archive (but don't unpack it!) or use zip files instead. Other useful file types are HDF5 and NetCDF.

from typing import Tuple

import torch
from PIL import Image
from torch.utils.data import DataLoader, backward_compatibility
from torchvision.transforms import ToTensor
from torchdata.datapipes.iter import FileLister

# Set-up type hints for readability
FileName = str
ByteStream = bytes
RawDataPoint = Tuple[FileName, ByteStream]

# Select the archive to load data from
datapipe = FileLister('/mimer/NOBACKUP/Datasets/Lyft-level5/Perception', 'train.tar')

# Load from the selected archives directly
datapipe = datapipe.load_from_tar()

# Filter to only stream jpeg files
def jpeg_filter(raw_data_point: RawDataPoint) -> bool:
    filename = raw_data_point[0]
    return filename.endswith('.jpeg')

datapipe = datapipe.filter(jpeg_filter)

# Enable shuffle option in DataLoader at this point in pipeline
datapipe = datapipe.shuffle()

# Enable loading data with multiple workers in DataLoader
datapipe = datapipe.sharding_filter()

# Parse ByteStream from jpeg to tensor
to_tensor = ToTensor()
def parse_jpeg(raw_data_point: RawDataPoint) -> torch.Tensor:
    filename, byte_stream = raw_data_point
    return to_tensor(Image.open(byte_stream))

datapipe = datapipe.map(parse_jpeg)

# This object can then be used as a regular iterator (useful for debugging)...
for image_tensor in datapipe:
    break

# ... or more commonly in dataloaders
dataloader = DataLoader(
    dataset = datapipe,
    shuffle = True,
    num_workers = 2,
    # worker_init_fn only needed with multiple workers
    worker_init_fn = backward_compatibility.worker_init_fn,
)