Description
I have been struggling for a while to divide specific layers or tensors between two GPUs and the CPU. Is there a way to use Override Tensors to assign tensors to two different GPUs? I tried moving layers 30-39 to CUDA0 and 40-49 to CUDA1 like this:
\.(3[0-9])\.*=CUDA0,\.(4[0-9])\.*=CUDA1
At first it looks like it should work:
Handling Override Tensors for backends: CUDA0 CUDA1 CPU
Override Tensor: \.(3[0-9])\.* to CUDA0
Override Tensor: \.(4[0-9])\.* to CUDA1
But in the end Koboldcpp only uses the last override command, i.e. 40-49 to CUDA1 and ignores the first one.
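To rule out the patterns themselves, here is a small sanity check. It is only a sketch and not KoboldCpp's code: it assumes tensor names follow the GGUF `blk.<layer>.<name>` convention, that each pattern is applied as an unanchored regex search, and that the first matching pattern wins. The helpers `parse_overrides` and `assign` are hypothetical names made up for this check.

```python
import re

def parse_overrides(spec):
    """Split 'pattern=BACKEND,pattern=BACKEND' into (compiled regex, backend) pairs."""
    pairs = []
    for entry in spec.split(","):
        # Split on the last '=' so the pattern itself may contain '='.
        pattern, backend = entry.rsplit("=", 1)
        pairs.append((re.compile(pattern), backend))
    return pairs

def assign(tensor_name, overrides, default="default"):
    """Return the backend of the first pattern found anywhere in the name (assumed semantics)."""
    for regex, backend in overrides:
        if regex.search(tensor_name):
            return backend
    return default

# The exact override string from this report.
SPEC = r"\.(3[0-9])\.*=CUDA0,\.(4[0-9])\.*=CUDA1"
overrides = parse_overrides(SPEC)

# Hypothetical tensor names on the layer boundaries of interest.
for layer in (29, 30, 39, 40, 49, 50):
    name = f"blk.{layer}.ffn_up.weight"
    print(name, "->", assign(name, overrides))
```

Under those assumptions the two patterns select disjoint layer ranges (30-39 and 40-49), so the first rule being dropped does not look like a regex overlap. As a side note, the trailing `\.*` matches zero or more literal dots, so each pattern behaves the same as `\.3[0-9]` or `\.4[0-9]` on its own.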
I also tried the opposite: setting GPU Layers to 99 and overriding specific layers to CPU. In that case Koboldcpp ignores the Tensor Split settings and only uses one GPU.
Being able to control individual tensors is especially important when using MoE models. Setting MoE CPU Layers works fine with one GPU, but the Tensor Split settings are again ignored.
This is on Win10, i5-13600KF, RTX 4090 & RTX 3080 Ti, Kobold version 1.99.4.