HuggingFace Datasets

HuggingFace provides a Python package as well as a repository for many machine learning datasets. However, there are some common issues with using it on Alvis.

  • By default the home directory is used to store the processed data.
  1. Set the HF_HOME environment variable to a directory in your project storage so that the cache does not fill up your home directory.
  2. The first time you call load_dataset for a dataset, which is when it gets downloaded, run it on alvis2, the dedicated data transfer node.
  3. To use the centrally provided datasets, call load_dataset with the absolute path to the downloaded snapshot. For example, you can load the downloaded ImageNet dataset (if you have joined the group) with
    datasets.load_dataset("/mimer/NOBACKUP/Datasets/ImageNet/hf-cache/imagenet-1k/default/1.0.0/09dbb3153f1ac686bac1f40d24f307c383b383bc171f2df5d9e91c1ad57455b9/")
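The recommendations above can be sketched in Python as follows. This is a minimal sketch, not an official Alvis recipe: the helper names configure_hf_home and load_central_dataset and the hf-home subdirectory name are hypothetical, and the project directory is whatever storage path your project has been allocated. It assumes that HF_HOME should be set before the datasets package is imported, since the package typically reads its cache location at import time.

```python
import os
from pathlib import Path


def configure_hf_home(project_dir):
    """Point HF_HOME at a directory inside project storage.

    `project_dir` is an assumption: pass the path of your own project
    storage. The "hf-home" subdirectory name is purely illustrative.
    """
    hf_home = Path(project_dir) / "hf-home"
    hf_home.mkdir(parents=True, exist_ok=True)
    # Set the variable for this process and any child processes it starts.
    os.environ["HF_HOME"] = str(hf_home)
    return hf_home


def load_central_dataset(snapshot_path):
    """Load a dataset from an absolute path to a downloaded snapshot."""
    # Imported lazily so that HF_HOME is already set when datasets is loaded.
    import datasets

    return datasets.load_dataset(snapshot_path)
```

A typical session would call configure_hf_home with your project storage path first and then call load_central_dataset with the absolute snapshot path shown above. Setting HF_HOME in your shell or job script before starting Python achieves the same thing without any code.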