HuggingFace Datasets
HuggingFace provides a Python package as well as a repository for many machine learning datasets. However, there are some common issues with using it on Alvis.
- By default, the processed data is stored in your home directory, which can fill it up.
Recommended use
- Point HF_HOME to your project storage so that you do not fill up your home directory.
- The first time you call load_dataset and the dataset is downloaded, do this on alvis2, the dedicated data transfer node.
- If you use the centrally provided, already downloaded datasets: use the patched HF-Datasets from the module system and set the environment variable HF_USE_SOFTFILELOCK=true.
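The recommendations above can be sketched in Python as follows. This is a minimal illustration, not an official recipe: the helper name `configure_hf_cache`, the `hf_home` subdirectory, and the project path in the usage comment are hypothetical placeholders. It assumes the environment variables are set before `datasets` is imported, since the library typically reads its cache location at import time.

```python
import os
from pathlib import Path


def configure_hf_cache(project_dir, use_soft_filelock=False):
    """Point HF_HOME at a directory inside project storage.

    Call this before importing `datasets`, so the library picks up
    the new cache location instead of the home directory.
    """
    # "hf_home" is a hypothetical subdirectory name chosen for this sketch.
    hf_home = Path(project_dir) / "hf_home"
    hf_home.mkdir(parents=True, exist_ok=True)
    os.environ["HF_HOME"] = str(hf_home)

    # Only relevant together with the patched HF-Datasets module
    # when reading the centrally provided datasets.
    if use_soft_filelock:
        os.environ["HF_USE_SOFTFILELOCK"] = "true"

    return hf_home


# Usage (placeholders, adapt to your own project and dataset):
# configure_hf_cache("/path/to/your/project/storage")
# from datasets import load_dataset
# dataset = load_dataset("name-of-the-dataset")  # first download: run on alvis2
```

Setting the variables in your shell or job script before starting Python should have the same effect and avoids depending on import order.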
For a practical example of these steps in a particularly tricky case, see our documentation on the ImageNet dataset.