HuggingFace Datasets
HuggingFace provides a Python package as well as a repository for many machine learning datasets. However, there are some common issues with using it on Alvis.
- By default, the processed data is stored in your home directory (under ~/.cache/huggingface), which can quickly fill it up.
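The default cache location can be overridden through the HF_HOME environment variable. The following is a minimal sketch, not a definitive setup: the project path is a hypothetical placeholder, and it assumes the variable is set before any HuggingFace package is imported, since those packages read it when they are first loaded.

```python
import os

# Hypothetical placeholder: replace with the path to your own project storage.
project_dir = "/path/to/your/project/storage"

# Redirect the HuggingFace cache away from the home directory.
# The "huggingface" subdirectory name is an arbitrary choice for illustration.
os.environ["HF_HOME"] = f"{project_dir}/huggingface"

# Import HuggingFace packages only after HF_HOME is set, for example:
# from datasets import load_dataset
print(os.environ["HF_HOME"])
```

Setting the variable in the shell or job script before starting Python has the same effect and avoids depending on import order.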
Recommended use
- Point HF_HOME to your project storage so that you do not fill up your home directory.
- The first time you call load_dataset and the dataset is downloaded, do this on alvis2, the dedicated data transfer node.
- To use the centrally provided datasets, call load_dataset with the absolute path to the downloaded snapshots. For example, you can load the downloaded imagenet dataset (if you have joined the group) with