Python

We install many versions of Python. The standard Python module also includes many common libraries, such as matplotlib and ipython. NumPy and SciPy are provided as a separate module; see SciPy-bundle.

The software we build for the cluster is optimized for its hardware. Pre-compiled versions are often built only generically, which sacrifices a lot of performance. Use the provided modules whenever possible.

Virtual environments

The virtualenv command is included in the Python modules. First load your preferred version of Python (and everything else you need, e.g. SciPy-bundle) from the module system. Then create a new virtual environment; this is only done once, e.g.

flat_modules
module load SciPy-bundle/2021.05-foss-2021a matplotlib/3.4.2-foss-2021a h5py/3.2.1-foss-2021a
virtualenv --system-site-packages my_python

To use this environment, we must load the same modules and activate it (every time you log in):

flat_modules
module load SciPy-bundle/2021.05-foss-2021a matplotlib/3.4.2-foss-2021a h5py/3.2.1-foss-2021a
source my_python/bin/activate

After that, we can install additional Python packages into the environment:

pip install --no-cache-dir --no-build-isolation some_module

The --no-cache-dir option is required to prevent pip from reusing earlier installations made by the same user in a different environment. The --no-build-isolation option makes sure that pip uses the modules loaded from the module system when building any Cython libraries.

For more information see 1
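A quick way to confirm that the environment is really active is to ask the interpreter itself. The following is a minimal sketch and not part of the cluster setup; the helper name `in_virtualenv` is made up for illustration. It relies on `sys.prefix` differing from `sys.base_prefix` inside a virtual environment (older virtualenv releases set `sys.real_prefix` instead).

```python
import sys


def in_virtualenv():
    """Return True if this interpreter runs inside a virtual environment."""
    # Older virtualenv releases (before version 20) set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and current virtualenv make sys.prefix differ from sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

If this prints False after you have sourced my_python/bin/activate, the python on your $PATH is probably not the one from the environment.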

Accessing virtual environments in Jupyter Notebook

If you use a virtual environment together with Jupyter Notebook, the kernel used by Jupyter might not pick up the correct site-packages. To resolve this, register the environment as a kernel after completing the steps above:

flat_modules
module load SciPy-bundle/2021.05-foss-2021a matplotlib/3.4.2-foss-2021a h5py/3.2.1-foss-2021a
source my_python/bin/activate
python -m ipykernel install --user --name=my_python --display-name="My Python"

When running your notebooks, you should then be able to select this as your kernel under Kernel > Change kernel > My Python.
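If the kernel is listed but still picks up the wrong packages, it can help to check which interpreter the kernel spec points to. This is a minimal sketch, assuming the --user install placed the spec in Jupyter's default per-user location on Linux, ~/.local/share/jupyter/kernels/<name>/kernel.json; the helper name `kernel_interpreter` is hypothetical.

```python
import json
from pathlib import Path


def kernel_interpreter(name, kernels_dir=None):
    """Return the interpreter path recorded in a kernel spec, or None if absent."""
    if kernels_dir is None:
        # Assumed default location for --user kernel specs on Linux.
        kernels_dir = Path.home() / ".local" / "share" / "jupyter" / "kernels"
    spec_file = Path(kernels_dir) / name / "kernel.json"
    if not spec_file.is_file():
        return None
    with spec_file.open() as handle:
        spec = json.load(handle)
    # The first entry of "argv" is the Python executable the kernel starts.
    argv = spec.get("argv") or [None]
    return argv[0]


if __name__ == "__main__":
    print(kernel_interpreter("my_python"))
```

For a kernel registered from the activated environment, the printed path would be expected to lie inside my_python/bin/.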

Installing local Python modules

We recommend the virtualenv method above. Alternatively, you can install additional Python libraries yourself with the pip command. You must first choose an existing Python module and load it, e.g:

flat_modules
module load SciPy-bundle/2021.05-foss-2021a matplotlib/3.4.2-foss-2021a h5py/3.2.1-foss-2021a

It is a good idea to use as many Python packages from the module tree as possible, since

  1. You will have to install fewer additional packages.
  2. They are sometimes much faster than the generic packages you get from pip install (NumPy can be up to 10x faster).
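Before installing anything extra, you can check whether a package is already available from the loaded modules and where it would be imported from. The sketch below uses the standard library's `importlib.util.find_spec`; the helper name `module_origin` and the list of package names are only examples.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    for name in ("numpy", "scipy", "h5py"):
        print(name, "->", module_origin(name))
```

A path inside the module tree means the optimized build is in use; a path under your home directory or under your own prefix means a locally installed copy takes precedence.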

If you need an additional package, you can then install it via

pip install some_library --prefix <your_python_path>

which will skip all the dependencies you already satisfy via the loaded modules.

You will need to add the site-packages path to $PYTHONPATH. The pythonX.Y directory in the path must match the Python version of the module you loaded, e.g:

export PYTHONPATH=$PYTHONPATH:<your_python_path>/lib/python3.7/site-packages/

Similarly, you may have to add

export PATH=$PATH:<your_python_path>/bin

to your $PATH and, for some libraries, also extend $LD_LIBRARY_PATH to include the directory where their shared libraries end up.
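Because the site-packages directory name depends on the Python version, it is easy to export a path for the wrong version. The following sketch builds the path from the running interpreter, assuming the lib/pythonX.Y/site-packages layout shown above; the helper name `site_packages_for` and the placeholder prefix are hypothetical.

```python
import sys
from pathlib import Path


def site_packages_for(prefix, version_info=sys.version_info):
    """Return <prefix>/lib/pythonX.Y/site-packages for the given Python version."""
    major, minor = version_info[0], version_info[1]
    return Path(prefix) / "lib" / f"python{major}.{minor}" / "site-packages"


if __name__ == "__main__":
    # Placeholder prefix; substitute your own <your_python_path>.
    print(site_packages_for("/path/to/your_python_path"))
```

Run it with the Python from the loaded module and use the printed path in the export line.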

When you are done, you can put something like this in your job script:

flat_modules
module load SciPy-bundle/2021.05-foss-2021a matplotlib/3.4.2-foss-2021a h5py/3.2.1-foss-2021a
export PYTHONPATH=$PYTHONPATH:<your_python_path>/lib/python3.7/site-packages/

python ./my_script.py

Conda environments

NOTE: The best way to create conda environments is inside a Singularity container. There are example recipes at

/apps/containers/Conda/conda-example.recipe
/apps/containers/Conda/conda-example2.recipe
/apps/containers/Conda/conda-example3.recipe

which, once you have built them, can be used as e.g.

singularity exec /apps/Singularity/conda-example.sif python my_script.py

This drastically reduces the installation size and the number of files (down to 1), which is much better for the centre storage. Many conda packages aren't built to be compatible with CentOS 6 (or even 7). In these cases, you must use Singularity or build all of them manually.

To add additional software to existing containers you can use overlays. Please see our Singularity page for more details.

If you really can't use containers

If you really can't create a container, you can create a conda environment directly. These environments become large, so don't create them in your home directory. For example:

module load Anaconda3
cd <somewhere_with_a_lot_of_space>
conda create -p my_conda python=3

Activate the environment when you wish to use it:

source activate <somewhere_with_a_lot_of_space>/my_conda

You might want to create an alias for this, or put it in your bashrc, as you will need to do it every time you run the program.

Install what you need. You can also add other channels, e.g:

conda config --add channels acellera
conda install htmd

conda caches the files it downloads in ~/.conda/pkgs/. These extra copies count against your disk quota, and you can clear them out with the command:

conda clean --all
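To see how much space the cache occupies before and after cleaning, you can sum the file sizes below the cache directory. This is an illustrative sketch; the helper name `directory_size` is made up, and it simply adds up file sizes, so hard-linked files are counted once per link.

```python
import os
from pathlib import Path


def directory_size(path):
    """Return the total size in bytes of all regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so linked files are not counted at their target's size.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


if __name__ == "__main__":
    cache = Path.home() / ".conda" / "pkgs"
    print(f"{cache}: {directory_size(cache) / 1e9:.2f} GB")
```

A directory that does not exist is reported as 0 bytes.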

Matplotlib

Please note that when running Matplotlib in batch jobs, you might want to call

matplotlib.use('Agg')

after importing matplotlib (and before importing matplotlib.pyplot). This selects the non-interactive Agg backend, so that Matplotlib does not try to use an X Windows backend, which fails if you didn't log in with X forwarding and never works in batch jobs.
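In a script that should run both interactively and in batch jobs, the backend choice can be made conditional. The sketch below is one possible approach, not a site recommendation: it treats a missing DISPLAY environment variable as a sign that no X display is available, which is an assumption, and the names `needs_agg` and `save_example_plot` are hypothetical.

```python
import os


def needs_agg(environ=None):
    """Return True when no X display is advertised, as in a batch job."""
    if environ is None:
        environ = os.environ
    return not environ.get("DISPLAY")


def save_example_plot(path):
    """Draw a small line plot and write it to path without opening a window."""
    import matplotlib

    if needs_agg():
        # Select the non-interactive backend before pyplot is imported.
        matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot([0, 1, 2, 3], [0, 1, 4, 9])
    ax.set_xlabel("x")
    ax.set_ylabel("x squared")
    fig.savefig(path)
    plt.close(fig)


# Example call (requires Matplotlib to be available):
# save_example_plot("example.png")
```

Writing figures with savefig instead of calling show keeps the script usable in job scripts, where no window can be opened.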