Cuda and PyTorch

This topic should help with resolving issues when working with Cuda and PyTorch.

Selecting a Cuda Version

Multiple Cuda versions are installed on the cluster. Use the module command to list them and to activate one:

module avail
module add cuda/12.1

If you want your selected Cuda version to be the default for future login sessions, run:

module save default

Installing PyTorch

The installation command for PyTorch (after creating and activating a virtual environment) is:

pip install --no-cache-dir torch torchvision torchaudio --index-url

The last part of the URL is the desired Cuda version, written as the plain concatenation of the major and minor version numbers (for example cu121 for Cuda 12.1). It must match the version that you activate with module:

module add cuda/12.1
pip install --no-cache-dir torch torchvision torchaudio --index-url
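The suffix can be derived from the module version mechanically. The following is a minimal sketch; the function name cuda_tag is hypothetical, and it only performs the string conversion (it does not check whether a matching PyTorch build exists):

```shell
# Hypothetical helper: turn a Cuda version ("12.1") or module name
# ("cuda/12.1") into the suffix used at the end of the index URL ("cu121").
cuda_tag() {
  local version="${1#cuda/}"      # drop an optional "cuda/" module prefix
  case "$version" in
    *.*) ;;                       # need at least MAJOR.MINOR
    *) echo "cuda_tag: expected MAJOR.MINOR, got '$1'" >&2; return 1 ;;
  esac
  local major="${version%%.*}"    # text before the first dot, e.g. "12"
  local rest="${version#*.}"      # text after the first dot, e.g. "1" or "8.0"
  local minor="${rest%%.*}"       # minor number only, e.g. "1" or "8"
  printf 'cu%s%s\n' "$major" "$minor"
}
```

For example, cuda_tag cuda/12.1 prints cu121, matching module add cuda/12.1, and cuda_tag 11.5 prints cu115.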

Ubuntu comes with a default Cuda installation (11.5 for Ubuntu 22.04), so if you do not select a particular Cuda version, use cu115 when installing PyTorch.

Adding Additional Packages

When you install packages that depend on torch and do not come with pre-compiled code, you need to activate the same Cuda version that you used when installing torch. Otherwise you may get errors like the following:

pip install --no-cache-dir spatial-correlation-sampler
The detected CUDA version (11.5) mismatches the version that was used to compile
PyTorch (12.1). Please make sure to use the same CUDA versions.
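The two versions can be compared before starting such a build. This is a minimal sketch, assuming that nvcc from the activated Cuda module is on PATH and that torch is installed in the active virtual environment; the helper names are hypothetical. torch.version.cuda holds the Cuda version PyTorch was built with (None for CPU-only builds, which this check treats as a mismatch):

```shell
# Print "MAJOR.MINOR" from `nvcc --version` output given on stdin,
# e.g. "Cuda compilation tools, release 12.1, V12.1.105" -> "12.1".
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: return 0 if the toolkit version ($1) equals the
# Cuda version PyTorch was built with ($2); otherwise warn and return 1.
cuda_versions_match() {
  if [ "$1" = "$2" ]; then
    return 0
  fi
  echo "Cuda mismatch: toolkit $1, PyTorch built with $2" >&2
  return 1
}

# Usage (commented out so the sketch runs without nvcc or torch):
# toolkit=$(nvcc --version | nvcc_release)
# torch_cuda=$(python -c 'import torch; print(torch.version.cuda)')
# cuda_versions_match "$toolkit" "$torch_cuda" && pip install --no-cache-dir spatial-correlation-sampler
```

The check requires an exact MAJOR.MINOR match, following the advice above to use the same Cuda version for torch and for the packages built against it.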

Running Jobs

The module command is not available inside running jobs unless you add the following lines to your batch file or script right after the #SBATCH lines (note the leading dot, which sources the file):

. /etc/profile.d/
module add cuda/12.1
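A job script can stop early if the module command is still missing, instead of failing later in the job. A minimal sketch; require_module_command is a hypothetical helper, and the profile script to source is the one given above:

```shell
# Hypothetical helper for a batch script: succeed only if "module" is defined
# (it is typically a shell function set up by sourcing the profile script).
require_module_command() {
  if command -v module >/dev/null 2>&1; then
    return 0
  fi
  echo "error: 'module' is not defined; source the profile script first" >&2
  return 1
}

# Usage inside the job script, after sourcing the profile script:
# require_module_command || exit 1
# module add cuda/12.1
```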


For environments prepared by TAs for the cluster's JupyterHub, the right version of Cuda is already active.

If you provide your own custom environment, you can specify the modules to load at server startup.

© 2024 Eidgenössische Technische Hochschule Zürich