PyTorch🔗
PyTorch is a popular machine learning (ML) framework.
A common use case is to import PyTorch as a module in Python; it is then up to you to write your particular ML application as a Python script using the functionality of the torch module.
We provide precompiled optimised installations of both legacy and recent versions of PyTorch in our tree of software modules, see our introduction to software modules.
Just like with most software, search for all available versions with module spider pytorch.
If you want to run on CUDA-accelerated GPU hardware, make sure to select a version with CUDA.
It is also possible to run PyTorch using containers, of which many versions are already centrally installed.
PyTorch is heavily optimised for GPU hardware, so we recommend using the CUDA version and running it on the compute nodes equipped with GPUs. How to do this is described in our guide to running jobs.
Quick guide🔗
- Use module spider PyTorch-bundle to find the latest modules of PyTorch, torchvision etc.
- Do you need a newer PyTorch version?
- Use GPUs primarily.
- Apply to Alvis to get access to GPUs.
- See PyTorch documentation and/or Alvis intro tutorial for using a GPU.
- Check that you are using a GPU and monitor your GPU usage.
- Use profiling to make good use of GPUs.
- For multi-GPU usage, check out the specifics for Alvis.
- Use the right precision for your use case.
- If GPU utilisation goes down between batches, look at your dataloading pipeline.
- If you are running long jobs, use checkpointing (see the sketch after this list).
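As a minimal sketch of checkpointing (the model, optimizer and file path below are stand-ins for whatever your own training script uses), you can periodically save the training state with torch.save and restore it when the job is restarted:

import torch
from torch import nn, optim

# Stand-ins for the model, optimizer and epoch counter from your own training loop
model = nn.Linear(10, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01)
epoch = 5  # e.g. the epoch you just finished

checkpoint_path = 'checkpoint.pt'  # hypothetical path, adjust to your job

# Save the training state at regular intervals (e.g. every epoch)
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}, checkpoint_path)

# Restore the state when the job is restarted
checkpoint = torch.load(checkpoint_path)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch'] + 1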
Checking for available GPUs🔗
After loading the PyTorch
module of your choice, your environment is now configured to start using PyTorch in Python.
Here is a small test that prints the PyTorch version available in your environment:
[cid@vera1 ~]$ python -c "import torch; print(torch.__version__)"
1.11.0
If you intend to run your calculations on GPU hardware it can be useful to check that PyTorch detects the GPU hardware using the torch.cuda
submodule. Here is an example from a node equipped with an Nvidia Quadro GPU.
[cid@vera1 ~]$ python -c "import torch; print('CUDA enabled:', torch.cuda.is_available())"
CUDA enabled: True
[cid@vera1 ~]$ python -c "import torch.cuda as tc; id = tc.current_device(); print('Device:', tc.get_device_name(id))"
Device: Quadro P2000
To use GPUs, check out the official PyTorch documentation and/or the Alvis intro tutorial.
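In your own scripts, a common pattern (a minimal sketch with a stand-in model) is to pick the GPU if one is available and otherwise fall back to the CPU:

import torch

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Move a stand-in model and a batch of random data to the selected device
model = torch.nn.Linear(10, 1).to(device)
batch = torch.randn(32, 10, device=device)
output = model(batch)
print('Running on:', device)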
PyTorch-bundle🔗
PyTorch-bundle is a module which bundles PyTorch, PyTorch-Ignite, torchvision, torch_tb_profiler, torchtext and torchdata. Other less commonly used PyTorch projects, like torchaudio, can be added on request.
Use e.g. module spider PyTorch-bundle/1.13.1-foss-2022a-CUDA-11.7.0
to see what is included in a particular version of PyTorch-bundle.
Performance and precision🔗
Which GPU you're using and which data type is used in computations can have a huge impact on performance at max utilisation (see GPU hardware details).
The main performance gain in using Ampere GPUs and newer (A40s and A100s in our case) comes from using the tensor cores. We recommend that all PyTorch users check out the following links:
- What Every User Should Know About Mixed Precision Training in PyTorch
- torch.amp
- torch.set_float32_matmul_precision
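As a minimal sketch of what these settings can look like in practice (the model, data and hyperparameters are stand-ins, and this is not the only way to use mixed precision), a training step with torch.amp and TF32 matrix multiplications might look like this:

import torch
from torch import nn, optim

# Allow TF32 tensor cores for float32 matrix multiplications
# (requires PyTorch 1.12 or newer and an Ampere GPU or newer)
torch.set_float32_matmul_precision('high')

device = torch.device('cuda')
model = nn.Linear(1024, 1024).to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid underflow in float16

for _ in range(10):
    inputs = torch.randn(64, 1024, device=device)
    targets = torch.randn(64, 1024, device=device)

    optimizer.zero_grad()
    # Run the forward pass in mixed precision
    with torch.autocast(device_type='cuda', dtype=torch.float16):
        loss = nn.functional.mse_loss(model(inputs), targets)

    # Scale the loss, backpropagate and update the weights
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()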
Dataloading🔗
Machine learning datasets are commonly made up of a multitude of files. In HPC environments this can be less than ideal: reading (and writing) many small files is generally a lot slower than working with a single large file. You can find our general tips at datasets.
Luckily, PyTorch 1.11 and newer can load data directly from archives through TorchData. To use this, load the relevant module; as usual, we first have to locate it.
[vikren@alvis2 ~]$ module spider torchdata
----------------------------------------------------------------------------
torchdata:
----------------------------------------------------------------------------
Versions:
torchdata/0.3.0-foss-2021a-PyTorch-1.11.0-CUDA-11.3.1
torchdata/0.3.0 (E)
torchdata/0.4.0 (E)
torchdata/0.4.1 (E)
torchdata/0.5.1 (E)
Names marked by a trailing (E) are extensions provided by another module.
----------------------------------------------------------------------------
For detailed information about a specific "torchdata" package (including how t
o load the modules) use the module's full name.
Note that names that have a trailing (E) are extensions provided by other modu
les.
For example:
$ module spider torchdata/0.5.1
----------------------------------------------------------------------------
[vikren@alvis2 ~]$ module spider torchdata/0.5.1
----------------------------------------------------------------------------
torchdata: torchdata/0.5.1 (E)
----------------------------------------------------------------------------
This extension is provided by the following modules. To access the extension
you must load one of the following modules. Note that any module names in paren
theses show the module location in the software hierarchy.
PyTorch-bundle/1.13.1-foss-2022a-CUDA-11.7.0
Names marked by a trailing (E) are extensions provided by another module.
[vikren@alvis2 ~]$ module load PyTorch-bundle/1.13.1-foss-2022a-CUDA-11.7.0
Loading from archives is part of a framework called DataPipes, which are utilities for building up a data pipeline that can then be used in a dataloader. Below you can find one example that also uses the torchvision module to load an image dataset.
Note that compressed tar files (e.g. tar.gz) are extremely slow to read from when shuffle is True. You should either uncompress the archive (but don't unpack!) or use zip files instead. Other useful filetypes are HDF5 and NetCDF.
from typing import Tuple

import torch
from PIL import Image
from torch.utils.data import DataLoader, backward_compatibility
from torchvision.transforms import ToTensor
from torchdata.datapipes.iter import FileLister

# Set up type hints for readability
FileName = str
ByteStream = bytes
RawDataPoint = Tuple[FileName, ByteStream]

# Select the archive to load data from
datapipe = FileLister('/mimer/NOBACKUP/Datasets/Lyft-level5/Perception', 'train.tar')

# Load from the selected archives directly
datapipe = datapipe.load_from_tar()

# Filter to only stream jpeg files
def jpeg_filter(raw_data_point: RawDataPoint) -> bool:
    filename = raw_data_point[0]
    return filename.endswith('.jpeg')

datapipe = datapipe.filter(jpeg_filter)

# Enable shuffle option in DataLoader at this point in the pipeline
datapipe = datapipe.shuffle()

# Enable loading data with multiple workers in DataLoader
datapipe = datapipe.sharding_filter()

# Parse ByteStream from jpeg to tensor
to_tensor = ToTensor()

def parse_jpeg(raw_data_point: RawDataPoint) -> torch.Tensor:
    filename, byte_stream = raw_data_point
    return to_tensor(Image.open(byte_stream))

datapipe = datapipe.map(parse_jpeg)

# This object can then be used as a regular iterator (useful for debugging)...
for image_tensor in datapipe:
    break

# ... or more commonly in dataloaders
dataloader = DataLoader(
    dataset=datapipe,
    shuffle=True,
    num_workers=2,
    # worker_init_fn is only needed for torchdata 0.3.0 or older
    worker_init_fn=backward_compatibility.worker_init_fn,
)
Using multiple GPUs🔗
This section assumes that you are using Alvis. If you are a Vera user who relies heavily on GPUs for your machine learning with PyTorch, you should apply for resources on Alvis, where more GPU resources are available.
The two typical ways to scale to multiple GPUs are:
- Data Parallelism
- Model Parallelism
Most multi-GPU jobs will benefit from GPUDirect and InfiniBand across nodes. For multi-node jobs, check at least that:
1. You are on a node with InfiniBand.
2. The data transfer between nodes is making use of InfiniBand, e.g. by running job_stats.py <JOBID> and checking the network graphs.
Data parallelism🔗
For DistributedDataParallel, with or without torchrun, we initialize the relevant environment variables (MASTER_ADDR and MASTER_PORT) for each job.
With data parallelism your model is broadcast to all GPUs, separate batches on the different GPUs compute weight updates in parallel, and these are then combined into a single update as if you had used one large batch. This is useful if you have a large dataset and want larger batches than fit in a single GPU's memory.
You can find examples of data parallelism in our Alvis tutorial.
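As a rough sketch (not a complete job script, and the model and data are stand-ins), a DistributedDataParallel training step typically looks like the following when the script is launched with torchrun or srun so that MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE and LOCAL_RANK are already set:

import os
import torch
import torch.distributed as dist
from torch import nn, optim
from torch.nn.parallel import DistributedDataParallel as DDP

# Join the process group; the default env:// method reads MASTER_ADDR,
# MASTER_PORT, RANK and WORLD_SIZE from the environment
dist.init_process_group(backend='nccl')
local_rank = int(os.environ['LOCAL_RANK'])
torch.cuda.set_device(local_rank)

# Wrap a stand-in model; gradients are averaged across all processes
model = DDP(nn.Linear(10, 1).to(local_rank), device_ids=[local_rank])
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Each process should work on its own shard of the data (use a
# DistributedSampler with a real dataset); random tensors stand in here
inputs = torch.randn(32, 10, device=local_rank)
targets = torch.randn(32, 1, device=local_rank)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(inputs), targets)
loss.backward()
optimizer.step()

dist.destroy_process_group()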
Model parallelism🔗
Model parallelism is about storing parts of the model on different GPUs. This is used if your model is too large to fit on a single GPU; for the GPUs available on Alvis this should rarely be a problem, but in some cases you might reach this limit. Remember that you can see your resource usage for a job with the command job_stats.py.
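As a minimal sketch of manual model parallelism (the two-layer model is a stand-in and assumes two visible GPUs), different parts of the model are placed on different GPUs and the activations are moved between them:

import torch
from torch import nn

class TwoGPUModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Place each part of the model on its own GPU
        self.part1 = nn.Linear(1024, 1024).to('cuda:0')
        self.part2 = nn.Linear(1024, 10).to('cuda:1')

    def forward(self, x):
        # Move the activations to the GPU that holds the next part
        x = self.part1(x.to('cuda:0'))
        return self.part2(x.to('cuda:1'))

model = TwoGPUModel()
output = model(torch.randn(32, 1024))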
FAQ🔗
Can you install a newer PyTorch version?🔗
We're working with the EasyBuild community in preparing new PyTorch versions. You can send a support question in case you want to be kept up to date when we add a new version.
An alternative is to use the containers
at /apps/containers/PyTorch
. We happily build later versions of these on
request.
We typically recommend NGC containers over the plain PyTorch ones. For the meaning of the names see What does NGC mean in the containers?.
We do not recommend installing your own version unless necessary. If you install your own version it is up to you to make sure that the installation has the capabilities that you need:
- Is the software built for the CUDA compute capabilities corresponding to available GPU types?
- If you're making use of CPU computations, is the software built for AVX512?
- If you're doing multi-GPU jobs, is the software built for GPUDirect and InfiniBand?
- ...
What does NGC mean in the containers?🔗
Containers with NGC in the name are from Nvidia's NGC
catalog.
Other containers at /apps/containers/PyTorch/
are official PyTorch containers
from dockerhub. The provided containers are not as patched and verified as the provided modules, but should work well for most cases. They even support communication over InfiniBand when using NCCL for multi-node communication.