Troubleshooting

This topic lists common problems and how to fix them.

I cannot log in

Access to the cluster is enabled only for students participating in a course that uses the cluster, and for all TAs of such a course. Access is revoked on the last Monday of the semester holidays following the course.

Access to student-jupyter.inf.ethz.ch is furthermore restricted to students participating in a course that runs Jupyter notebooks.

My job is not starting

Run squeue to check the queue. If the right-most column lists a node name (studgpu-node??), that node is being powered up and you will need to wait up to five minutes.

Instead of the node name you may also get a status code:

(QOSMaxGRESPerUser)
You have requested too much RAM, too many GPUs or too many CPU cores.
(Resources) or (QOSGrpGRESMinutes)
You have not properly set the number of GPUs, the course name or the runtime. See here for how to properly start a job. Try running the examples; they should always work.
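
As a quick lookup for the status codes above, the following sketch maps the value from the right-most squeue column to the advice given on this page. The function name explain_reason is made up for this example and is not a cluster tool; the squeue call is skipped on machines where squeue is not installed.

```shell
# Hypothetical helper: translate the right-most squeue column into the
# advice from this page. Only the codes listed above are covered.
explain_reason() {
  case "$1" in
    *QOSMaxGRESPerUser*)
      echo "Requested too much RAM, too many GPUs or too many CPU cores." ;;
    *QOSGrpGRESMinutes*|*Resources*)
      echo "Check the number of GPUs, the course name and the runtime." ;;
    *studgpu-node*)
      echo "Node is powering up; wait up to five minutes." ;;
    *)
      echo "No advice on this page for: $1" ;;
  esac
}

# Show only your own jobs; the last column holds the node name or the code.
if command -v squeue >/dev/null 2>&1; then
  squeue -u "$(id -un)"
fi
```

For example, explain_reason "(Resources)" prints the hint about GPUs, course name and runtime. Any other code that squeue reports falls through to the last branch.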

My job got canceled

This only affects courses that have long-running jobs. These jobs get canceled when the cluster is full and users of other courses start more short jobs or Jupyter notebooks. Interrupted jobs are restarted automatically once the load on the cluster drops.

I am out of GPU time for a course

Each user only gets a fixed amount of GPU time to finish a course. If that is not enough, please contact a TA to find a solution.

My home directory is full

You have 10 GB of space, which has to suffice for all your courses.

One place where space is usually wasted is the pip cache. Run these two commands to get its size and purge the cache:

python3 -m pip cache info
python3 -m pip cache purge
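
If the home directory is still full after purging the pip cache, listing the largest top-level entries can show where the rest of the space went. The sketch below uses only the standard du, sort and tail tools; the function name top_usage is made up for this example.

```shell
# Hypothetical helper: list the ten largest entries directly under a
# directory, sorted by size in ascending order. The last line is the
# total for the directory itself.
top_usage() {
  dir="${1:-$HOME}"
  # -a also lists plain files, -d 1 stays at the top level (this includes
  # hidden directories such as .cache), -h prints human-readable sizes
  du -a -h -d 1 "$dir" 2>/dev/null | sort -h | tail -n 10
}

# Inspect the home directory (falls back to the current directory).
top_usage "${HOME:-.}"
```

Pass a different directory as the first argument to look one level deeper, for example top_usage "$HOME/.cache".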

Some software that I need is not installed

Write to support@inf.ethz.ch and let us know what you are missing and why you need it. We'll have a look at your request and install the necessary packages on all login nodes and GPU nodes if it is not too complicated.


Page URL: https://www.isg.inf.ethz.ch/bin/view/Main/HelpClusterComputingStudentClusterTroubleshooting
2023-11-28
© 2023 Eidgenössische Technische Hochschule Zürich