We install many versions of Python. The normal Python module also includes many common libraries, such as matplotlib and ipython. NumPy and SciPy are provided as a separate module; see SciPy bundle.
The software we build for the cluster is optimized for the hardware. Pre-compiled versions are often built only generically, which can cost a lot of performance. Use the provided modules whenever possible.
The virtualenv command is included in the Python modules.
Load your favourite version of Python (and everything else you need, e.g.
SciPy-bundle) from the module system first.
The first time, we create a new virtual environment (only done once), e.g.
module load SciPy-bundle/2021.05-foss-2021a matplotlib/3.4.2-foss-2021a h5py/3.2.1-foss-2021a
virtualenv --system-site-packages my_python
To use this environment, we must load the modules and activate it (every time you log in):
module load SciPy-bundle/2021.05-foss-2021a matplotlib/3.4.2-foss-2021a h5py/3.2.1-foss-2021a
source my_python/bin/activate
We can then install Python modules locally:
pip install --no-cache-dir --no-build-isolation some_module
The --no-cache-dir option is required to prevent pip from reusing earlier
installations made by the same user in a different environment. The
--no-build-isolation option makes sure that pip uses the modules loaded from the
module system when building any Cython libraries.
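The create-once, activate-every-time pattern described above can be wrapped in a small shell function. This is only a sketch: the function name `use_my_python` and the `MY_ENV` variable are made up for illustration, and it assumes a bash session where `module` and `virtualenv` are available.

```shell
# Hypothetical location of the virtual environment; adjust as needed.
MY_ENV="${MY_ENV:-$HOME/my_python}"

use_my_python() {
    # Load the same modules every time, before touching the environment.
    module load SciPy-bundle/2021.05-foss-2021a \
                matplotlib/3.4.2-foss-2021a \
                h5py/3.2.1-foss-2021a || return 1

    # Create the environment only if it does not exist yet (done once).
    if [ ! -f "$MY_ENV/bin/activate" ]; then
        virtualenv --system-site-packages "$MY_ENV" || return 1
    fi

    # Activate the environment in the current shell.
    . "$MY_ENV/bin/activate"
}
```

After sourcing this definition (for instance from your .bashrc), running `use_my_python` once per login should leave you in the activated environment.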
For more information see 1
Accessing virtual environments in Jupyter Notebook
If you use virtual environments together with Jupyter Notebook, the kernel used by Jupyter might not pick up the correct site-packages. To resolve this, do the following after completing the steps above:
module load SciPy-bundle/2021.05-foss-2021a matplotlib/3.4.2-foss-2021a h5py/3.2.1-foss-2021a
source my_python/bin/activate
python -m ipykernel install --user --name=my_python --display-name="My Python"
When running your notebooks, you should then be able to select "My Python" as your kernel.
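If the kernel does not show up, it can help to check that the kernel spec was actually written. The sketch below assumes the default per-user Jupyter data directory on Linux (`~/.local/share/jupyter`); that location is an assumption, since the `JUPYTER_DATA_DIR` or `XDG_DATA_HOME` variables can move it.

```shell
# Assumed default location of a kernel installed with --user --name=my_python.
kernel_dir="$HOME/.local/share/jupyter/kernels/my_python"

if [ -f "$kernel_dir/kernel.json" ]; then
    # The "argv" entry should point at the python inside my_python/bin.
    cat "$kernel_dir/kernel.json"
else
    echo "No kernel spec found in $kernel_dir"
fi

# If the jupyter command is available, list every registered kernel.
if command -v jupyter >/dev/null 2>&1; then
    jupyter kernelspec list || true
fi
```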
Installing local Python modules
We recommend the virtualenv method above.
With the pip command, you can also install additional Python libraries yourself.
You must first choose an existing Python module and load it, e.g.:
module load SciPy-bundle/2021.05-foss-2021a matplotlib/3.4.2-foss-2021a h5py/3.2.1-foss-2021a
It is a good idea to use as many Python packages from the module tree as possible, since
- You will have to install fewer additional packages.
- They are sometimes much faster than the generic packages you get from
pip install (NumPy can be up to 10x faster).
If you need an additional package, you can then install it via
pip install some_library --prefix <your_python_path>
which will skip all the dependencies you already satisfy via the loaded modules.
You will need to add the site-packages path to $PYTHONPATH.
Similarly, you may have to extend $PATH and, for some libraries, also extend
$LD_LIBRARY_PATH to where various shared libraries may end up.
When you are done, you can put something like this in your job script (adjust python3.7 to the Python version of the loaded modules):
module load SciPy-bundle/2021.05-foss-2021a matplotlib/3.4.2-foss-2021a h5py/3.2.1-foss-2021a
export PYTHONPATH=$PYTHONPATH:<your_python_path>/lib/python3.7/site-packages/
python ./my_script.py
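Because the site-packages directory name depends on the Python version, one option is to ask the loaded interpreter for its version instead of hard-coding it. The following is a minimal sketch: `MY_PREFIX` is a hypothetical stand-in for the directory you passed to `--prefix`, and the `lib/pythonX.Y/site-packages` layout is assumed to be the standard one on Linux.

```shell
# Hypothetical install prefix, i.e. the value given to pip's --prefix option.
MY_PREFIX="${MY_PREFIX:-$HOME/my_python_prefix}"

# Ask the currently loaded interpreter for its major.minor version.
py_ver="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"

# Assumed standard layout below the prefix.
site_dir="$MY_PREFIX/lib/python$py_ver/site-packages"

# Append to the existing values; ${VAR:+...} avoids a stray leading colon
# when PYTHONPATH was previously unset.
export PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}$site_dir"
export PATH="$PATH:$MY_PREFIX/bin"

echo "PYTHONPATH=$PYTHONPATH"
```

Run this after the `module load` line so that `python3` refers to the interpreter from the module system.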
NOTE The best way to create conda environments is with Singularity. There are example recipes at
/apps/containers/Conda/conda-example.recipe
/apps/containers/Conda/conda-example2.recipe
/apps/containers/Conda/conda-example3.recipe
which after you've built them can be used as e.g.
singularity exec /apps/Singularity/conda-example.sif python my_script.py
This drastically reduces the installation size and the number of files (down to 1), which is much better for the centre storage. Many conda packages aren't built to be compatible with CentOS 6 (or even 7). In these cases, you must use Singularity or build all of them manually.
To add additional software to existing containers you can use overlays. Please see our Singularity page for more details.
If you really can't use containers
If you really can't create a container, you can create conda environments instead. These become large, so don't create them in your home directory. For example:
module load Anaconda3
cd <somewhere_with_a_lot_of_space>
conda create -p my_conda python=3
Activate the environment when you wish to use it:
source activate <somewhere_with_a_lot_of_space>/my_conda
(You might want to create an alias, or do this in your .bashrc, as you will need to do this every time you run the program.)
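A shell function in your .bashrc is one way to do this. The sketch below is only illustrative: the function name `use_my_conda` and the `MY_CONDA_ENV` variable are hypothetical, and the path must be replaced with the directory that actually holds your environment.

```shell
# Hypothetical path; replace with the directory that holds your environment.
MY_CONDA_ENV="${MY_CONDA_ENV:-/path/with/a/lot/of/space/my_conda}"

use_my_conda() {
    # Make the conda tools available first.
    module load Anaconda3 || return 1

    # Same activation command as shown above, run in the current shell.
    source activate "$MY_CONDA_ENV"
}
```

With this in place, typing `use_my_conda` in a new session loads the module and activates the environment in one step.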
Install what you need. You can also add other channels, e.g.:
conda config --add channels acellera
conda install htmd
conda caches package files in
~/.conda/pkgs/, and these are generally not needed once installation has finished.
These extra copies fill up your disk quota, and you can clear them out with the command:
conda clean --all
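Before cleaning, it can be useful to see how much space the cache takes and what would be removed. A small sketch, assuming the default cache location mentioned above and that the `conda` command is on your path (the checks simply skip anything that is missing):

```shell
# Default per-user package cache, as mentioned above.
pkgs_dir="$HOME/.conda/pkgs"

# Report the total size of the cache, if it exists.
if [ -d "$pkgs_dir" ]; then
    du -sh "$pkgs_dir"
else
    echo "No conda package cache at $pkgs_dir"
fi

# Show what a full clean would remove without deleting anything.
if command -v conda >/dev/null 2>&1; then
    conda clean --all --dry-run || true
fi
```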
Please note that when running Matplotlib, you might want to select a non-interactive backend right after importing it, for example with matplotlib.use('Agg'),
to avoid Matplotlib trying to use the X Windows backend, which will fail if you didn't log in with X forwarding (and X forwarding won't work in batch jobs).
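As an alternative to changing the script itself, Matplotlib also reads the `MPLBACKEND` environment variable, so the backend can be chosen in the job script. This is a different technique from the in-script call, shown here only as a sketch; `Agg` is Matplotlib's non-interactive raster backend.

```shell
# Select the non-interactive Agg backend for every Python process
# started from this shell, so no X display is required.
export MPLBACKEND=Agg

# The script can then be run as usual, for example:
#   python ./my_script.py
echo "Matplotlib backend set to $MPLBACKEND"
```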