Bulk data transfer to/from Alvis

Larger transfers to and from Alvis should go through the alvis2 login node, which is the dedicated data transfer node on Alvis.

ADDS - the Alvis Data Downloader Service

The ADDS system is a service offered on Alvis that lets you submit background tasks which download datasets consisting of a large number of individual files over HTTP/HTTPS to local storage.

If the data is stored on a different kind of storage resource, such as Azure, S3, or another cluster reachable over SSH/SFTP, a tool like rclone is more suitable.

Data transfer jobs run in the background on a storage login node. The user-facing interface is the command-line tool addsctl.

Datasets

A dataset consists of a set of URLs to individual data files which are downloaded into a directory on local storage. The tools on offer accept text files with one URL per line and prepare tasks for the downloader backend to work through.
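As a minimal sketch of preparing such a file, the Python snippet below writes a collection of URLs to a text file with one URL per line. The function name `write_url_list` is hypothetical and not part of ADDS; it only illustrates the file format described above.

```python
from pathlib import Path


def write_url_list(urls, path):
    """Write URLs to `path`, one per line, skipping blank entries.

    Returns the number of URLs written.
    """
    # Strip surrounding whitespace so each line holds exactly one URL.
    cleaned = [u.strip() for u in urls if u.strip()]
    Path(path).write_text("".join(u + "\n" for u in cleaned), encoding="utf-8")
    return len(cleaned)
```

Calling, for example, `write_url_list(my_urls, "urls.txt")` produces a plain text file in the one-URL-per-line layout that the tools expect.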

The `addsctl` tool

For command line access on the I/O login node alvis2 there is an addsctl tool that can report the status of pending and completed tasks, as well as convert file listings for a dataset into a download request. The tool has several subcommands.

`addsctl request DATASET BASEDIR URLFILE`

Schedule a download of a dataset into the given directory on cluster storage. All files listed in the URL list file provided on the command line are downloaded.
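If requests are submitted from a script rather than typed by hand, the invocation could be wrapped roughly as follows. This is only a sketch: the helper names `build_request_command` and `submit_request` are hypothetical, the argument order simply mirrors the `DATASET BASEDIR URLFILE` synopsis above, and it assumes that addsctl reports failure through a non-zero exit status.

```python
import subprocess


def build_request_command(dataset, basedir, urlfile):
    """Return the argv list for `addsctl request DATASET BASEDIR URLFILE`."""
    return ["addsctl", "request", str(dataset), str(basedir), str(urlfile)]


def submit_request(dataset, basedir, urlfile):
    """Run the request; intended to be called on the alvis2 login node."""
    # Assumption: a failed request makes addsctl exit with a non-zero status,
    # which check=True turns into a CalledProcessError.
    subprocess.run(build_request_command(dataset, basedir, urlfile), check=True)
```

Passing the arguments as a list avoids shell quoting problems if the dataset name or paths contain spaces.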

The URL list file should contain one plaintext URL per line. Note that all files are collected in a single directory, so if multiple links share the same filename, the resulting files will collide.
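Such collisions can be caught before a request is submitted. The sketch below groups the URLs by filename and reports every filename used by more than one distinct URL. It assumes that the stored filename is the last path component of the URL, which is not confirmed here and may differ from how ADDS actually derives names; the function name `find_filename_collisions` is hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def find_filename_collisions(urls):
    """Return {filename: [urls, ...]} for filenames shared by several distinct URLs."""
    by_name = defaultdict(list)
    # dict.fromkeys drops exact duplicate URLs while keeping their order.
    for url in dict.fromkeys(u.strip() for u in urls if u.strip()):
        # Assumption: the filename is the last component of the URL path
        # (query strings and fragments are ignored).
        name = urlsplit(url).path.rsplit("/", 1)[-1]
        by_name[name].append(url)
    return {name: group for name, group in by_name.items() if len(group) > 1}
```

An empty result means no two distinct URLs in the list map to the same filename under that assumption.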

Links are deduplicated and fed to the download backend. Progress information can be queried with the status subcommand or by looking in the book-keeping directories outlined below. Files are stored directly in the dataset directory without any nested directories.

`addsctl status [DATASET]`

Reports approximate status for the download of a specific dataset or for all known datasets.

State directory structure for advanced users

The downloader state is kept in a hidden directory .alvis-data-downloader under the user home directory.

Request information is stored in JSON files that move across the directories as downloading progresses and can be directly inspected if the frontend tools are not sufficient.
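For direct inspection, a short script can walk the state directory and load whatever JSON files it finds. The sketch below makes no assumption about the subdirectory names or the JSON schema, neither of which is specified here; it only reports which subdirectory each file currently sits in. The function name `list_requests` is hypothetical.

```python
import json
from pathlib import Path


def list_requests(state_dir=None):
    """Yield (subdirectory, file name, parsed JSON) for each request file found."""
    if state_dir is None:
        state_dir = Path.home() / ".alvis-data-downloader"
    state_dir = Path(state_dir)
    if not state_dir.is_dir():
        return
    for path in sorted(state_dir.rglob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            # Files move between directories as downloads progress, so a file
            # may disappear or be unreadable at the moment we look at it.
            continue
        yield str(path.parent.relative_to(state_dir)), path.name, data
```

Iterating over `list_requests()` and printing the first two fields gives a quick view of where each request currently is; the parsed contents can then be examined as needed.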