Storage resources at C3SE

The storage hierarchy

Storage is available for different kinds of usage and with different levels of availability.

Looking at it from the bottom up, we have:

  • The node local disk:
    • available only to the running job
    • automatically purged after the job finishes
    • accessible to the job using $TMPDIR (the environment variable $TMPDIR is automatically set to contain the correct directory path)
    • the available size is different for different clusters (and can even differ between node types in a cluster)
  • Cluster(wide) storage:
    • available from all machines in a specific cluster
    • not available at any C3SE cluster today (see Centre storage below)
  • Centre(wide) storage:
    • available from all resources at the centre
    • your cluster home directory is located here
  • Nationally accessible storage:
    • requires a separate storage allocation through SNIC
    • file based (as opposed to block based for those above; cf. an FTP server)
    • available through using dedicated tools

C3SE Centre storage

The centre storage is available on the resources at the centre. It contains two parts, one with backup available at $SNIC_BACKUP and one without backup available at $SNIC_NOBACKUP.

Users' home directories (placed in $SNIC_BACKUP) are:

/c3se/users/<username>/Hebbe
/c3se/users/<username>/Glenn

on Hebbe and Glenn respectively.

For example, if your UID/CID is ada:

$SNIC_BACKUP   is /c3se/users/ada
$SNIC_NOBACKUP is /c3se/NOBACKUP/users/ada
$HOME          is /c3se/users/ada/Glenn on Glenn
$HOME          is /c3se/users/ada/Hebbe on Hebbe

Quota

There is a limit to how much disk space and how many files (number of inodes) each user can have on the home directory file system. There are two important concepts regarding the quota limits:

  • "quota" (also called "soft limit"). Indicates the amount of resources (disk space or number of files) a user is allowed to allocate for an unrestricted time period. It is possible to temporarily exceed the soft limit (4 weeks grace time).
  • "limit" (also called "hard limit"). Indicates a limit to the amount of resources (disk space or number of files) that a user can allocate. It is not possible to go past the hard limit!

Quotas are enforced by suspending jobs on both Hebbe and Glenn until your usage has gone below the limits again.

The default quotas are:

Storage location    Soft space quota    Hard space quota    Soft files quota    Hard files quota
$SNIC_BACKUP        25 GiB              50 GiB              60 000              120 000
$SNIC_NOBACKUP      200 GiB             1024 GiB            500 000             2 500 000

Note! Your home directories on Glenn and Hebbe are on the same file system and accessible from both systems. They also share the same quota, so please move or access files directly instead of copying!

How to check your usage and quota limits

You can check your quota limits by issuing the command C3SE_quota.

"Can I exceed my quota? What will happen?"

You are permitted to exceed the soft limits for a given "grace period" (4 weeks), after which the soft limit starts to act as a hard limit. If you exceed the grace period or hit the hard limits, you will not be allowed to submit any new jobs, your queued jobs are put on hold, and your running jobs are suspended. You have to remove files until your current usage goes below the corresponding limit to be able to get work done again.

The system updates the usage counts every 3 hours.

"How much grace time do I have left?"

If you have exceeded your quota for either disk space or inodes, C3SE_quota will show how much time you have left of your grace period.

Copying files into and out of the system

Use tools that can communicate using the SSH/SFTP/SCP protocols to transfer files, for example FileZilla, WinSCP and rsync (or scp/sftp directly); see the example below.

If you're on the Chalmers network, i.e. have a 129.16.* IP address, you can connect directly to the login node and copy your files.

If you're not on the Chalmers network you must either:

  • connect using the Chalmers VPN
  • copy files through the public SSH machines, e.g. remote11.chalmers.se
  • initiate the transfers from within the C3SE clusters.
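
As a sketch, assuming your username is ada, that <login-node> stands for the login node address of the cluster you use, and that input.dat and results/ are just placeholder names, transfers initiated from your own machine could look like:

# Push a single input file to $SNIC_NOBACKUP on the cluster
scp input.dat ada@<login-node>:/c3se/NOBACKUP/users/ada/
# Pull back a result directory; rsync only transfers files that have changed
rsync -av ada@<login-node>:/c3se/NOBACKUP/users/ada/results/ ./results/

rsync is convenient for repeated transfers of large directories, since unchanged files are skipped.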

Removing files

If you have exceeded your quota, you can find out in which directory the large number of files is located using

lfs find <directory> | wc -l

and find out where the disk space is spent using the du command, e.g.:

du --max-depth=1 -m <directory>

If you have accidentally created a huge number of files, you can most efficiently remove very large file trees using, e.g.:

lfs find <directory> -t f --print0 | xargs -n 10 -P 10 -0 rm

If you are unsure, please contact the support.

Using node local disk ($TMPDIR)

It is crucial that you use the node local disk for jobs that perform a lot of intensive file I/O. The globally accessible file system is a shared resource with limited capacity and performance.

It is also crucial that you retrieve and save any important data that was produced and saved to the node local file system. The node local file systems are always wiped clean immediately after your job has ended!

To use $TMPDIR, copy the files there, change to the directory, run your simulation, and copy the results back:

#!/bin/bash
# ... various SLURM commands
cp file1 file2 $TMPDIR
cd $TMPDIR
... run your code ...
cp results $SLURM_SUBMIT_DIR

Be certain that you retrieve and save any important data that was produced and saved on the node local file system.

The typical size of $TMPDIR is 1600GB on Hebbe and 400GB on Glenn. When running on a shared node, you will be allocated space on $TMPDIR proportional to the number of cores you have on the node.

Note! By default each node has a private $TMPDIR, i.e. $TMPDIR has the same path on every node but points to a different storage area. You have to make sure to distribute files to, and collect files from, all nodes if you use more than one node! Also see below for a shared, parallel $TMPDIR.
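
For example, to convince yourself that each node has its own private area, you could list $TMPDIR on every node of a multi-node job (a small sketch using standard srun options, placed inside the job script):

# Run one ls per allocated node; each node reports its own local disk contents
srun --ntasks-per-node=1 ls -l $TMPDIR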

Distributing files to multiple $TMPDIR's

Note: Using ptmpdir is usually a much simpler option, see below!

The job script only executes on the main node (the first node) in your job; therefore the job script must

  1. distribute input files to all other nodes in the job,
  2. collect output files from all other nodes, and
  3. copy the results back to the centre storage.

To distribute files to the node local disks, use the command pdcp. When invoked from within a job script, pdcp automatically resolves which nodes are involved. For example,

pdcp file1 file2 $TMPDIR

copies file1 and file2 from the current directory to the different $TMPDIR on all nodes in the current job.

Collecting the data back from multiple nodes depends on the software used.

cp $TMPDIR/output_file.data $SLURM_SUBMIT_DIR

copies output_file.data from the head node only, whereas

rpdcp $TMPDIR/output.data $SLURM_SUBMIT_DIR

copies the file $TMPDIR/output.data from every compute node in the job and places the copies in $SLURM_SUBMIT_DIR, e.g. as output.data.hebbe04-2, output.data.hebbe07-1, etc.

Both the pdcp and rpdcp commands take the flag -r for recursively copying file hierarchies.
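
For example, to recursively copy a whole input directory (the name input_dir is just a placeholder) to the node local disks of all nodes in the job:

pdcp -r input_dir $TMPDIR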

A shared, parallel $TMPDIR

The nodes' local disks can be set up to form a shared, parallel area when running a job on more than one node. This will give you:

  • a common namespace (i.e. all the nodes in your job can see the same files)
  • a larger total area aggregating all nodes' $TMPDIR
  • faster file I/O

To invoke a shared $TMPDIR, simply add the flag --gres=ptmpdir:1 to your job script.

#!/bin/bash
# ... various SLURM commands
#SBATCH --gres=ptmpdir:1

Your $TMPDIR will now use all the nodes' local disks in parallel. Copying files works as if it were one large drive. It is recommended to always use this option if you use $TMPDIR for multi-node jobs!
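
A minimal multi-node job script sketch using the shared $TMPDIR could then look as follows (the SBATCH resource flags, input.dat and results.dat are placeholders to adapt to your own job). Since the area is shared, a single cp is enough to make the input visible on all nodes, and no pdcp/rpdcp is needed:

#!/bin/bash
#SBATCH -N 2
#SBATCH --gres=ptmpdir:1
# ... various other SLURM commands

# One copy is enough: the shared $TMPDIR is visible from all nodes
cp input.dat $TMPDIR
cd $TMPDIR
... run your code ...
# Copy the results back before the job ends and the local disks are wiped
cp results.dat $SLURM_SUBMIT_DIR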

Saving files periodically

With a little bit of shell scripting, it is possible to periodically save files from $TMPDIR to the centre storage. Please do this in moderation so that you don't put excessive load on the shared file system (if you are unsure, ask support@c3se.chalmers.se for advice).

A hypothetical example that creates a few backup files once every second hour could look like:

#!/bin/bash
# ... various SBATCH flags
while sleep 2h; do
    # This will be executed once every second hour
    rsync -r $TMPDIR/output_data/ $SLURM_SUBMIT_DIR/
done &     # The &-sign after the done-keyword places the while-loop in a sub-shell in the background
LOOPPID=$! # Save the PID of the subshell running the loop
... calculate stuff and retrieve data in a normal fashion ...
# All calculations are done, let's clean up and kill the background loop
kill $LOOPPID

This example creates a background loop that runs on the head compute node (the compute node in your allocation that runs the batch script).

File sharing with groups and other users

You can share files with other users by manipulating the group ownership and associated permissions of directories or files.

Every computational project has its own group, named "c3-project-name", e.g.:

[emilia@hebbe ~]$ groups
emilia c3-gaussian c3-snic2017-1-10

Here emilia is a member of project SNIC2017-1-10.

If she wants to share files (read only), she could do:

[emilia@hebbe ~]$ chgrp -R c3-snic2017-1-10 $SNIC_NOBACKUP/shared_data
[emilia@hebbe ~]$ chmod -R g+rx $SNIC_NOBACKUP/shared_data
[emilia@hebbe ~]$ chmod o+x $SNIC_NOBACKUP/

The first two lines change the group and the group permissions recursively (applied to all files under shared_data). The last line gives the execute permission required to access directories under $SNIC_NOBACKUP.

Remember! If you give out write permissions in a sub-directory, all files created in there, also by other users, will still count towards your quota.

Access Control Lists (ACLs)

To give more fine-grained control of file sharing, you can use ACLs. This allows you to give out different read, write, and execute permissions to individual users or groups.

If emilia wants to have a shared file storage with robert, and give out read rights to sara, she could do:

[emilia@hebbe ~]$ lfs lsetfacl -R -m user:robert:rwx,user:sara:rx  $SNIC_NOBACKUP/shared_data
[emilia@hebbe ~]$ chmod o+x $SNIC_NOBACKUP/

and she can check the current rights using the corresponding get-command:

[emilia@hebbe ~]$ lfs lgetfacl $SNIC_NOBACKUP/shared_data
# file: /c3se/NOBACKUP/users/emilia/shared_data
# owner: emilia
# group: emilia
user::rwx
user:robert:rwx
user:sara:rx
group::rwx
mask::rwx
other::r-x
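
ACL entries can also be granted to a whole group rather than to individual users. For example, to give read access to everyone in the project group c3-snic2017-1-10 (a sketch following the same pattern as above):

[emilia@hebbe ~]$ lfs lsetfacl -R -m group:c3-snic2017-1-10:rx $SNIC_NOBACKUP/shared_data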

You can find many examples using setfacl and getfacl online, e.g. https://linux.die.net/man/1/setfacl, but please use lfs lsetfacl and lfs lgetfacl on the shared network file system.

If you need a shared area to read/write files, then applying for a storage project is also an option.