"Based on our evaluation, the acceptance rate of the second token prediction ranges between 85% and 90% across various generation topics, demonstrating consistent reliability. This high acceptance rate enables DeepSeek-V3 to achieve a significantly improved decoding speed, delivering 1.8 times TPS (Tokens Per Second)."
(From the DeepSeek-V3 technical report)
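As a sanity check on that claim: if each decoding step always keeps the main-head token and the MTP draft (second) token is accepted with probability p, the expected tokens per step is 1 + p, which is roughly the TPS speedup ceiling. A toy simulation (0.87 is an assumed midpoint of the reported 85–90% range, not a number from the report):

```python
import random

def simulate_mtp_speedup(acceptance_rate: float, steps: int = 100_000, seed: int = 0) -> float:
    """Average tokens emitted per decoding step with one MTP draft token."""
    rng = random.Random(seed)
    tokens = 0
    for _ in range(steps):
        tokens += 1  # token from the main head is always kept
        if rng.random() < acceptance_rate:
            tokens += 1  # draft token verified and accepted
    return tokens / steps

# Close to 1 + 0.87 = 1.87, consistent with the reported ~1.8x TPS.
print(round(simulate_mtp_speedup(0.87), 2))
```

So an 85–90% acceptance rate lines up almost exactly with the quoted 1.8× figure.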
It may also be worth looking at the DeepSeek-VL2 models, which share the same vocabulary as DeepSeek-V3, for use as draft models. This one, perhaps? Then it could be offloaded to the GPU.
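For reference, llama.cpp's speculative-decoding example already supports a separate draft model with its own GPU offload setting, which is what this suggestion amounts to. A sketch with hypothetical filenames (flag names follow the speculative example, `-md` for the draft model and `-ngld` for its GPU layers, but may differ across versions; check `--help`):

```shell
# Large main model stays on CPU, small same-vocabulary draft model fully on GPU.
./llama-speculative \
  -m deepseek-v3-main.gguf \
  -md deepseek-vl2-small.gguf \
  -ngl 0 \
  -ngld 99 \
  -p "Write a haiku about llamas."
```

This is ordinary two-model speculative decoding, not MTP; MTP would use the main model's own extra prediction head as the draft instead.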
It is strange that llama.cpp does not support MTP yet! In the future, most models will likely come equipped with MTP.