Ascend NPUs seem to be a great alternative (to a Mac Studio or an EPYC build) for running quantized R1.
For example, the Atlas 300I Duo offers 140 TFLOPS of FP16 compute, 408 GB/s of memory bandwidth, and 96 GB of VRAM.
Two of these cards in a PC could run the quantized 671B R1 reasonably well, I would say (see the rough sizing sketch at the end of this post).
However, as shown in https://github.com/ggerganov/llama.cpp/blob/master/docs/backend/CANN.md, the DeepSeek architecture is not supported yet, and low-bit quantization does not appear to be validated yet.
@hipudding Do you have plans to port low-bit quantized R1 to Ascend cards via the CANN backend?
That seems like a pretty valid use case to me...
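To put rough numbers on "reasonably well", here is a back-of-envelope sizing sketch, not a benchmark. It uses only the card specs quoted above plus the publicly reported figure that R1 activates roughly 37B of its 671B parameters per token, and it ignores KV cache, activations, and interconnect overhead, so the outputs are ceilings rather than predictions.

```cpp
// Back-of-envelope sizing sketch, not a benchmark. Card specs are the ones
// quoted above (96 GB VRAM, 408 GB/s per card); the ~37B active parameters
// per token for the 671B MoE is a publicly reported figure, treated here as
// approximate. KV cache, activations, and interconnect overhead are ignored.
#include <cstdio>

int main() {
    const double total_params  = 671e9;     // total weights in R1
    const double active_params = 37e9;      // weights touched per decoded token (MoE)
    const double vram_bytes    = 2 * 96e9;  // two Atlas 300I Duo cards
    const double bw_bytes_s    = 408e9;     // per-card bandwidth; with layer-split,
                                            // only one card streams weights at a time

    const double bits_options[] = {8.0, 4.5, 2.0};   // ~Q8_0, ~Q4_K, ~2-bit quants
    for (double bits : bits_options) {
        const double model_bytes = total_params  * bits / 8.0;
        const double token_bytes = active_params * bits / 8.0;
        std::printf("%.1f bpw: model ~%.0f GB (fits in 192 GB: %s), "
                    "decode ceiling ~%.0f tok/s (bandwidth-bound)\n",
                    bits, model_bytes / 1e9,
                    model_bytes <= vram_bytes ? "yes" : "no",
                    bw_bytes_s / token_bytes);
    }
    return 0;
}
```

Under these assumptions, fitting the whole model in 2 x 96 GB needs roughly 2-bit average quantization (Q4-class quants would need CPU offload), and single-stream decode is bounded by the 408 GB/s per-card bandwidth.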
-
@hipudding is there something we can help you with to make it happen?
-
Thank you for your interest in Ascend.
Currently, the best-supported format for running llama.cpp on Ascend is FP16, with partial support for Q8_0 and Q4_0 on certain devices.
However, based on actual testing, the execution efficiency of the quantized operators is not very high; in some cases it is even lower than FP16. In addition, hardware support for 4-bit or lower-bit quantization is not yet available on all devices.
If you want to enable quantized formats, I believe q8 (q8_0, q8_1, q8_k_m) and q4 (q4_0, q4_1, q4_k_m) are feasible. It would only require implementing the quantized versions of GGML_OP_GET_ROWS, GGML_OP_MUL_MAT, and GGML_OP_MUL_MAT_ID (see the sketch below for what such a kernel has to handle).
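For anyone curious what "implementing the quantized versions" of those ops involves, here is a minimal, self-contained sketch of Q4_0 dequantization. The block layout mirrors ggml's block_q4_0 (one FP16 scale plus 32 packed 4-bit weights), but the names (block_q4_0_sketch, get_row_q4_0) are purely illustrative; this is not the actual llama.cpp or CANN backend code. A first-cut port of GGML_OP_GET_ROWS or GGML_OP_MUL_MAT could dequantize rows like this into an FP32 buffer and reuse the existing FP16/FP32 paths, leaving native quantized kernels as a later optimization.

```cpp
// Illustrative sketch: mirrors ggml's Q4_0 block layout (32 weights per block,
// one FP16 scale + 16 bytes of packed 4-bit values). Names are hypothetical.
#include <cstdint>
#include <cstring>

constexpr int QK4_0 = 32;                  // weights per Q4_0 block

struct block_q4_0_sketch {
    uint16_t d;                            // scale, stored as IEEE fp16
    uint8_t  qs[QK4_0 / 2];                // 32 x 4-bit values, two per byte
};

// Minimal IEEE fp16 -> fp32 conversion (zero, subnormal, normal, inf/NaN).
float fp16_to_fp32(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000) << 16;
    uint32_t exp  = (h >> 10) & 0x1F;
    uint32_t mant = h & 0x3FF;
    uint32_t bits;
    if (exp == 0) {
        if (mant == 0) {
            bits = sign;                                   // +/- zero
        } else {                                           // subnormal: renormalize
            int e = -1;
            do { mant <<= 1; e++; } while (!(mant & 0x400));
            bits = sign | ((uint32_t)(127 - 15 - e) << 23) | ((mant & 0x3FF) << 13);
        }
    } else if (exp == 0x1F) {
        bits = sign | 0x7F800000u | (mant << 13);          // inf / NaN
    } else {
        bits = sign | ((exp + 112) << 23) | (mant << 13);  // re-bias exponent 15 -> 127
    }
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

// Dequantize one block: weight j is the low nibble of qs[j], weight j+16 the
// high nibble, each offset by -8 and scaled by d (the same scheme ggml uses).
void dequantize_block_q4_0(const block_q4_0_sketch * b, float * out) {
    const float d = fp16_to_fp32(b->d);
    for (int j = 0; j < QK4_0 / 2; ++j) {
        out[j]             = float((b->qs[j] & 0x0F) - 8) * d;
        out[j + QK4_0 / 2] = float((b->qs[j] >>   4) - 8) * d;
    }
}

// What a naive GGML_OP_GET_ROWS fallback would do for a Q4_0 weight matrix:
// dequantize the requested row block by block into an FP32 buffer that the
// existing FP16/FP32 operators can then consume.
void get_row_q4_0(const block_q4_0_sketch * weights, int64_t n_cols,
                  int64_t row, float * out) {
    const int64_t blocks_per_row = n_cols / QK4_0;   // assumes n_cols % 32 == 0
    const block_q4_0_sketch * src = weights + row * blocks_per_row;
    for (int64_t i = 0; i < blocks_per_row; ++i) {
        dequantize_block_q4_0(&src[i], out + i * QK4_0);
    }
}
```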
We are also looking forward to seeing excellent inference performance on quantized models with Ascend in the future.