"Based on our evaluation, the acceptance rate of the second token prediction ranges between 85% and 90% across various generation topics, demonstrating consistent reliability. This high acceptance rate enables DeepSeek-V3 to achieve a significantly improved decoding speed, delivering 1.8 times TPS (Tokens Per Second)."
(From the DeepSeek-V3 technical report)
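As a sanity check on that claim: if each decoding step always keeps the main-head token and the MTP draft (second) token is accepted with probability p, the expected tokens per step is 1 + p, which is roughly the TPS speedup ceiling. A toy simulation (0.87 is an assumed midpoint of the reported 85–90% range, not a number from the report):

```python
import random

def simulate_mtp_speedup(acceptance_rate: float, steps: int = 100_000, seed: int = 0) -> float:
    """Average tokens emitted per decoding step with one MTP draft token."""
    rng = random.Random(seed)
    tokens = 0
    for _ in range(steps):
        tokens += 1  # token from the main head is always kept
        if rng.random() < acceptance_rate:
            tokens += 1  # draft token verified and accepted
    return tokens / steps

# Close to 1 + 0.87 = 1.87, consistent with the reported ~1.8x TPS.
print(round(simulate_mtp_speedup(0.87), 2))
```

So an 85–90% acceptance rate lines up almost exactly with the quoted 1.8× figure.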
It may also be worth looking at the DeepSeek-VL2 models, which share the same vocabulary as DeepSeek-V3, for use as draft models. This one, perhaps? Then it could be offloaded to the GPU.
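For reference, llama.cpp's speculative-decoding example already supports a separate draft model with its own GPU offload setting, which is what this suggestion amounts to. A sketch with hypothetical filenames (flag names follow the speculative example, `-md` for the draft model and `-ngld` for its GPU layers, but may differ across versions; check `--help`):

```shell
# Large main model stays on CPU, small same-vocabulary draft model fully on GPU.
./llama-speculative \
  -m deepseek-v3-main.gguf \
  -md deepseek-vl2-small.gguf \
  -ngl 0 \
  -ngld 99 \
  -p "Write a haiku about llamas."
```

This is ordinary two-model speculative decoding, not MTP; MTP would use the main model's own extra prediction head as the draft instead.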
It is strange that llama.cpp does not support MTP yet! In the future, most models will likely come equipped with MTP.