
[CANN] Any possibility for the Unsloth dynamic quant of R1 to work on Ascend cards? #11698

Dango233 started this conversation in Ideas
Ascend NPUs seem to be a great alternative (to a Mac Studio or EPYC) for running quantized R1.
For example, the Atlas 300I Duo offers 140 TFLOPS of FP16 compute, 408 GB/s memory bandwidth, and 96 GB of VRAM.
Two of these cards in a PC could, I would say, run the quantized 671B R1 relatively well.
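As a rough back-of-envelope check (my own numbers; assuming R1 activates roughly 37B of its 671B parameters per token and the Unsloth dynamic quants weigh in at very roughly 131 to 212 GB depending on the variant):

$$2 \times 96\ \text{GB} = 192\ \text{GB of VRAM} \;>\; \sim 131\ \text{to}\ 183\ \text{GB of weights} + \text{KV cache}$$

$$\text{decode ceiling} \approx \frac{408\ \text{GB/s}}{37\,\text{B active params} \times 0.25\ \text{bytes (2-bit)}} \approx 44\ \text{tokens/s}$$

So the smaller dynamic quants should fit, and decode would be memory-bandwidth-bound; real throughput would land well below that ceiling.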

However, as shown in https://github.com/ggerganov/llama.cpp/blob/master/docs/backend/CANN.md, there is no DeepSeek architecture support yet, and low-bit quantization does not appear to be validated yet.

@hipudding Do you have plans to port low-bit quantized R1 to Ascend cards via the gguf CANN backend?

That seems like a pretty valid use case to me...


Replies: 2 comments


@hipudding is there something we can help you with to make it happen?


Thank you for your interest in Ascend.
Currently, the best-supported format for running llama.cpp on Ascend is FP16, with partial support for Q8_0 and Q4_0 on certain devices.
However, based on actual testing, the execution efficiency of the quantized operators is not very high; in some cases it is even lower than FP16. In addition, hardware support for 4-bit or lower-bit quantization is not yet available on all devices.

If you want to enable quantized formats, I believe Q8 (Q8_0, Q8_1, Q8_K) and Q4 (Q4_0, Q4_1, Q4_K) are feasible. It would only require implementing quantized versions of GGML_OP_GET_ROWS, GGML_OP_MUL_MAT, and GGML_OP_MUL_MAT_ID.
We are also looking forward to seeing excellent inference performance on quantized models with Ascend in the future.
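
For reference, here is a minimal CPU-side sketch (not CANN kernel code) of what a quantized GGML_OP_GET_ROWS boils down to for Q8_0, assuming ggml's Q8_0 block layout of one FP16 scale plus 32 int8 quants per block. The actual port would express the same gather-and-dequantize pattern as a CANN kernel, and GGML_OP_MUL_MAT / GGML_OP_MUL_MAT_ID additionally need either a true quantized matmul or an on-the-fly dequantization to FP16 in front of the existing matmul path.

```c
// CPU reference sketch of quantized GGML_OP_GET_ROWS for Q8_0:
// gather the requested rows from a quantized tensor and dequantize them to FP32.
#include "ggml.h"   // ggml_fp16_t, ggml_fp16_to_fp32
#include <stdint.h>

#define QK8_0 32

// Mirrors ggml's Q8_0 block layout (see ggml-common.h): FP16 scale + 32 int8 quants.
typedef struct {
    ggml_fp16_t d;
    int8_t      qs[QK8_0];
} block_q8_0;

// Dequantize one Q8_0 row of ne0 elements (ne0 must be a multiple of QK8_0).
static void dequantize_row_q8_0_ref(const block_q8_0 * x, float * y, int64_t ne0) {
    const int64_t nb = ne0 / QK8_0;
    for (int64_t i = 0; i < nb; ++i) {
        const float d = ggml_fp16_to_fp32(x[i].d);
        for (int j = 0; j < QK8_0; ++j) {
            y[i*QK8_0 + j] = d * x[i].qs[j];
        }
    }
}

// GET_ROWS: dst[r, :] = dequant(src0[ids[r], :]) for each requested row id.
static void get_rows_q8_0_ref(const block_q8_0 * src0, const int32_t * ids,
                              int64_t n_ids, int64_t ne0, float * dst) {
    const int64_t blocks_per_row = ne0 / QK8_0;
    for (int64_t r = 0; r < n_ids; ++r) {
        dequantize_row_q8_0_ref(src0 + ids[r]*blocks_per_row, dst + r*ne0, ne0);
    }
}
```

The same per-block dequantize pattern generalizes to Q4_0, Q4_1, and Q8_1; for the matmul ops, one pragmatic first step would be dequantizing weight tiles to FP16 and reusing the existing FP16 matmul path, trading extra bandwidth for simplicity.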
