I am trying to get this code working so I can use it to train various models on two GPUs:
```python
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

if __name__ == "__main__":
    with LocalCUDACluster(n_workers=2) as cluster:
        with Client(cluster) as client:
            ...
```
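For context on GPU placement: as I understand dask-cuda's documented behavior (an assumption worth checking against your installed version), `LocalCUDACluster(n_workers=2)` starts each worker with its own `CUDA_VISIBLE_DEVICES` value, rotated so that the worker's assigned GPU is listed first and therefore becomes its default device. The minimal sketch below reproduces only that rotation idea; `rotated_visible_devices` is a hypothetical helper, not part of the dask-cuda API.

```python
from typing import List


def rotated_visible_devices(worker_index: int, devices: List[int]) -> str:
    """Hypothetical helper: CUDA_VISIBLE_DEVICES value with this worker's GPU first."""
    i = worker_index % len(devices)
    rotated = devices[i:] + devices[:i]
    return ",".join(str(d) for d in rotated)


if __name__ == "__main__":
    gpus = [0, 1]  # assumption: two GPUs with indices 0 and 1
    for worker in range(len(gpus)):
        value = rotated_visible_devices(worker, gpus)
        # CUDA libraries treat the first visible device as the default device
        print(f"worker {worker}: CUDA_VISIBLE_DEVICES={value}, default GPU {value.split(',')[0]}")
```

A plain `distributed.LocalCluster` does not set this variable per worker, so every worker inherits the same device list and defaults to the first GPU, which would explain two workers landing on one GPU.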
The error starts with:

```
- distributed.worker - ERROR - Worker plugin CPUAffinity-bd2c98dd-df91-4dc5-8dd9-cca207e4c3fc failed to setup
```
I am working on a server provided by my university, and it keeps giving me an error about the CPU affinity setting. With only one worker it runs normally. It also runs with LocalCluster, but then both workers end up on the same GPU.
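One plausible cause, offered as a hypothesis rather than a diagnosis: the CPUAffinity plugin appears to pin each worker to the CPUs that NVML reports as close to its GPU, using `os.sched_setaffinity`. On Linux, `sched_setaffinity` fails with `EINVAL` when none of the requested CPUs are ones the process is permitted to use, which can happen on shared servers that restrict users to a subset of cores. If only the second GPU's nearby CPUs fall outside that allowed set, one worker would start fine and two would not. The sketch below checks this; `cpus_from_bitmask` and `usable_cpus` are hypothetical helpers, and the NVML part assumes the `pynvml` module (e.g. from the `nvidia-ml-py` package) is installed.

```python
import math
import os
from typing import Iterable, List, Set


def cpus_from_bitmask(words: Iterable[int], bits_per_word: int = 64) -> List[int]:
    """Expand a CPU bitmask split into machine words (bit d of the mask = CPU d)."""
    cpus = []
    for word_index, word in enumerate(words):
        word = int(word)  # NVML returns ctypes c_ulong values; normalize to int
        for bit in range(bits_per_word):
            if word & (1 << bit):
                cpus.append(word_index * bits_per_word + bit)
    return cpus


def usable_cpus(gpu_cpus: Iterable[int], allowed: Set[int]) -> List[int]:
    """CPUs near the GPU that this process is also permitted to run on."""
    return sorted(set(gpu_cpus) & set(allowed))


def main() -> None:
    if not hasattr(os, "sched_getaffinity"):  # Linux-only API
        print("os.sched_getaffinity is not available on this platform")
        return
    allowed = os.sched_getaffinity(0)
    n_cpus = os.cpu_count() or 1
    print(f"this process may run on {len(allowed)} of {n_cpus} CPUs")

    try:
        import pynvml  # assumption: installed, e.g. via the nvidia-ml-py package
    except ImportError:
        print("pynvml not installed; cannot query per-GPU CPU affinity")
        return
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError as err:
        print(f"NVML unavailable: {err}")
        return
    try:
        n_words = math.ceil(n_cpus / 64)  # number of 64-bit words in the mask
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            mask = pynvml.nvmlDeviceGetCpuAffinity(handle, n_words)
            near = cpus_from_bitmask(mask)
            ok = usable_cpus(near, allowed)
            # An empty intersection would make os.sched_setaffinity raise EINVAL
            status = "OK" if ok else "NO USABLE CPUs (affinity would fail)"
            print(f"GPU {i}: {len(near)} nearby CPUs, {len(ok)} usable -> {status}")
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    main()
```

If one GPU reports no usable CPUs, that would be consistent with the plugin failing only when a second worker is started; if every GPU has usable CPUs, the cause lies elsewhere and the full traceback would be needed.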
- How do you access the server, and how is it configured? What kind of GPUs? – Guillaume EB, Dec 17, 2024 at 9:22
- I access the server over SSH. It runs Ubuntu 20.04 with two NVIDIA V100 GPUs (CUDA version 12.3). I have interactive access, so I don't need to use Slurm. – Danilo Caputo, Dec 18, 2024 at 8:58