[1/N] add fp8 fp32 scale support for custom RL model #368

Open

yiakwy-xpu-ml-framework-team wants to merge 2 commits into antirez:main from yiakwy-xpu-ml-framework-team:add_fp8_fp32_scale_support


yiakwy-xpu-ml-framework-team commented Jun 9, 2026 (edited)

Background

We added an FP8 RL+SFT version of DeepSeek V4 with week-0 support, and it surpassed the DeepSeek V4 baseline in all major dimensions in our internal evaluation.

Hence we want to add 2-bit support for DeepSeek V4 with our Expert Pruning technology:
[Screenshot, 2026-06-09 15:40:50]

Note that on H100/H800 we usually don't use E8M0 for the scale, since it introduces runtime overhead. An FP32 scale works best.
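To make the two scale formats concrete, here is a minimal Python sketch of how each one is read. The E8M0 layout (8 exponent bits, bias 127, `0xFF` reserved for NaN) follows the OCP Microscaling definition. The function names are hypothetical and are not taken from this PR or from ds4. The sketch only shows that an E8M0 scale needs an exponent-to-value expansion while an FP32 scale is used as stored; it does not measure the runtime overhead mentioned above.

```python
import struct


def e8m0_to_float(byte: int) -> float:
    """Decode an E8M0 scale: 8 exponent bits, no sign, no mantissa, bias 127."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("an E8M0 scale must fit in one byte")
    if byte == 0xFF:
        return float("nan")  # 0xFF is the reserved NaN encoding
    return 2.0 ** (byte - 127)  # always an exact power of two


def fp32_scale_from_bytes(raw: bytes) -> float:
    """Read a little-endian FP32 scale: a plain reinterpretation of 4 bytes."""
    return struct.unpack("<f", raw)[0]


if __name__ == "__main__":
    print(e8m0_to_float(127))                               # exponent 0
    print(fp32_scale_from_bytes(struct.pack("<f", 0.75)))   # arbitrary real scale
```

An E8M0 scale can only represent powers of two, whereas an FP32 scale can hold any single-precision value, at the cost of four bytes per scale instead of one.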

yiakwy-xpu-ml-framework-team added a commit: add fp8 fp32 scale support for custom RL model (4decca9)

yiakwy-xpu-ml-framework-team (Author) commented Jun 9, 2026

@antirez could you have a look at it?

antirez (Owner) commented Jun 9, 2026

Hi, the PR itself has a few quality issues, but above all it is not clear why it would be useful for the project as a whole, given that we convert from the DS4 Hugging Face formats.

yiakwy-xpu-ml-framework-team (Author) commented Jun 10, 2026 (edited)

[Screenshot, 2026-06-10 09:14:48]

Quantization is successful.

@antirez Thank you for the quick response. Let me explain:

- Our SFT/RL model of DeepSeek V4 has an embedding layer of type bf16 or int32, while the DeepSeek model has an embedding of type int64.
- Since we run on the Hopper platform, our expert weights are stored as E4M3 FP8 and the weight scales as FP32 for best performance (this can be verified in SGLang):

  [Image: sglang fp8 serving]
  Custom DSV4 SGLang FP8 serving on the Hopper platform with identity injection, private/public knowledge injection, and an enhanced security shield module
- The Hugging Face model is not an SGLang-compatible version, while ours is; and Hugging Face does not cover converting SFT/RL models from BF16 to FP8 variants.
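As an illustration of the storage scheme described above (E4M3 FP8 weights with an FP32 scale), the following Python sketch decodes E4M3 bytes and multiplies them by a scale. It assumes the common "FN" variant of E4M3 (bias 7, no infinities, a single NaN mantissa pattern, maximum finite value 448). How scales are grouped (per tensor, per block, and so on) is not stated in this thread, so the single `scale` argument is an assumption, and the function names are hypothetical rather than taken from the PR.

```python
def e4m3_to_float(byte: int) -> float:
    """Decode one FP8 E4M3 value (FN variant: bias 7, no infinities)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0x0F   # 4 exponent bits
    man = byte & 0x07          # 3 mantissa bits
    if exp == 0x0F and man == 0x07:
        return float("nan")    # the only NaN pattern in this variant
    if exp == 0:
        # Subnormal: no implicit leading 1, fixed exponent of -6.
        return sign * (man / 8.0) * 2.0 ** -6
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)


def dequantize(codes: bytes, scale: float) -> list[float]:
    """Dequantize FP8 codes that share one FP32 scale (assumed grouping)."""
    return [e4m3_to_float(b) * scale for b in codes]


if __name__ == "__main__":
    # 0x38 encodes 1.0, 0x40 encodes 2.0, 0xB8 encodes -1.0
    print(dequantize(bytes([0x38, 0x40, 0xB8]), 0.5))
```

A converter that targets a lower-bit format would typically run a step like this first to recover approximate real-valued weights before requantizing, though the thread does not describe the PR's actual code path.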

The model is tuned specifically to handle Cantonese, Mandarin Chinese, and English efficiently.

This was referenced Jun 10, 2026

[2/N] add cuda imatrix support for custom RL model #377

Open

Distributed CUDA worker host-registers the whole GGUF, not just its layer slice → Q4 OOM on 128GB DGX Spark #293

Open

yiakwy-xpu-ml-framework-team added a commit: our sft/rl model does not contain fp8 scale for this weight (fe08b54)

yiakwy-xpu-ml-framework-team (Author) commented Jun 11, 2026 (edited)

Hi @antirez, I can confirm that the modification generates a correct model checkpoint for ds4. I hope you can take a look.

GB10 (2-bit dsv4-sft-rl, 15 tokens/sec):

[Screenshot, 2026-06-12 17:54:44]

Serving with raw model (no system prompt):

[Image: 2135b13f7dc854cf793c7d7e849dc316]

yiakwy-xpu-ml-framework-team mentioned this pull request Jun 12, 2026

[3/N] add prefetch support for CUDA backend : running ds4 for any GPU with cache (2.75 x faster!) #402

Open
