I am trying to get this code to work so that I can use it to train various models on two GPUs:

from dask_cuda import LocalCUDACluster
from dask.distributed import Client

if __name__ == "__main__":
    with LocalCUDACluster(n_workers=2) as cluster:
        with Client(cluster) as client:
            ...

The error starts with: distributed.worker - ERROR - Worker plugin CPUAffinity-bd2c98dd-df91-4dc5-8dd9-cca207e4c3fc failed to setup

I am on a server provided by my university, and the code keeps failing with an error on the CPU affinity setting. If I set only one worker, it works normally. It also works with LocalCluster, but then both workers end up on the same GPU.
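To help narrow this down, here is a minimal diagnostic sketch that uses only the Python standard library. It rests on an assumption that the error message above does not confirm: that the server limits which CPU cores my session may use, and that the affinity plugin tries to pin a worker to cores outside that set. The helper names `allowed_cpus` and `can_pin` are hypothetical and are not part of dask or dask_cuda.

```python
import os


def allowed_cpus():
    """Return the sorted CPU ids this process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: reflects restrictions such as taskset or cpusets
        return sorted(os.sched_getaffinity(0))
    # Fallback for platforms without sched_getaffinity
    return list(range(os.cpu_count() or 1))


def can_pin(requested, allowed=None):
    """Return True if every requested CPU id is in the allowed set."""
    allowed = set(allowed_cpus() if allowed is None else allowed)
    return set(requested) <= allowed


if __name__ == "__main__":
    cpus = allowed_cpus()
    print(f"{len(cpus)} CPUs allowed for this process: {cpus}")
    print("total CPUs reported by the OS:", os.cpu_count())
```

If the allowed list is much shorter than the total CPU count, that would be consistent with the assumption above, though it would not prove that this is what makes the plugin fail.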

talonmies
asked Dec 12, 2024 at 16:50
  • How do you access the server, and how is it configured? What kind of GPUs? Commented Dec 17, 2024 at 9:22
  • I access the server over ssh. It runs Ubuntu 20.04 and has two NVIDIA V100 GPUs (CUDA version 12.3). I have interactive access, so I don't need to use Slurm. Commented Dec 18, 2024 at 8:58
