[Klaud Cold] MI325X MiniMax-M3 EAGLE3 nightly image and FP8 KV cache #1838

Open: cquil11 wants to merge 5 commits into main from codex/minimaxm3-mi325x-mtp-fp8-kvcache

cquil11 (Collaborator) commented Jun 18, 2026, edited by cursor Bot

What changed

  • pin minimaxm3-fp8-mi325x-vllm-mtp to vllm/vllm-openai-rocm:nightly-b53b1c7ffe7aebdafd0876350f30e51d1226c92a
  • launch the MI325X MiniMax-M3 EAGLE3 server with --kv-cache-dtype fp8
  • keep the existing idempotent SupportsEagle3 compatibility guard
  • exclude chi-mi325x-pod2-120, which lacks the populated model cache required by the launcher
  • append an MI325X MTP-only performance changelog entry

Why

Use the updated ROCm vLLM build and reduce KV-cache memory use for the MI325X MiniMax-M3 EAGLE3 sweep. The node exclusion prevents Pyxis mount failures on the incomplete node.
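To make the memory saving concrete, here is a rough sketch of per-token KV-cache size for standard (GQA-style) attention, where the leading factor 2 covers K and V. The layer count, KV-head count, head dimension and 100 GiB budget are hypothetical placeholders, not MiniMax-M3's actual configuration, and the sketch ignores any per-block scale metadata. Moving from BF16 (2 bytes per element) to FP8 (1 byte) halves the bytes per token.

```sh
# Rough per-token KV-cache size for standard (GQA-style) attention.
# All model dimensions below are HYPOTHETICAL placeholders, not MiniMax-M3's
# real configuration; per-block scale metadata is ignored.

# kv_bytes_per_token LAYERS KV_HEADS HEAD_DIM BYTES_PER_ELEM
# The leading factor 2 accounts for storing both K and V.
kv_bytes_per_token() {
  echo $(( 2 * $1 * $2 * $3 * $4 ))
}

# tokens_that_fit BUDGET_BYTES BYTES_PER_TOKEN (integer division)
tokens_that_fit() {
  echo $(( $1 / $2 ))
}

LAYERS=60 KV_HEADS=8 HEAD_DIM=128          # hypothetical dimensions
BUDGET=$(( 100 * 1024 * 1024 * 1024 ))     # hypothetical 100 GiB KV budget

bf16=$(kv_bytes_per_token "$LAYERS" "$KV_HEADS" "$HEAD_DIM" 2)  # BF16: 2 bytes
fp8=$(kv_bytes_per_token "$LAYERS" "$KV_HEADS" "$HEAD_DIM" 1)   # FP8: 1 byte
echo "BF16: $bf16 B/token, $(tokens_that_fit "$BUDGET" "$bf16") tokens fit"
echo "FP8:  $fp8 B/token, $(tokens_that_fit "$BUDGET" "$fp8") tokens fit"
```

With these placeholder numbers, BF16 needs 245760 bytes per token and FP8 needs 122880, so the FP8 cache holds about twice as many tokens (873813 vs 436906 in the 100 GiB budget).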

Validation

  • Bash syntax validation
  • MTP chat-template flag verified
  • targeted MI325X MTP full-sweep config generation
  • matrix logic tests: 156 passed

Note

Medium Risk
Benchmark numbers will shift with the new image and FP8 KV; correctness depends on the nightly shipping upstream EAGLE3 support that the removed patch provided.

Overview
Updates the MI325X MiniMax-M3 EAGLE3 (MTP) benchmark path to a pinned ROCm vLLM nightly and simplifies how the server is started.

minimaxm3-fp8-mi325x-vllm-mtp now uses vllm/vllm-openai-rocm:nightly-b53b1c7ffe7aebdafd0876350f30e51d1226c92a instead of vllm/vllm-openai-rocm:minimax-m3. Config comments no longer describe a runtime SupportsEagle3 patch or BF16-only KV on gfx942.
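The key difference between the two image references is that the new one names a specific nightly commit. Below is a minimal sketch of a check for that pattern, assuming nightly tags take the form nightly-<40-character commit SHA> as in the tag above; is_commit_pinned is a hypothetical helper, not something in the repo.

```sh
# HYPOTHETICAL helper: succeed only if the image tag names a nightly commit,
# assuming the pattern nightly-<40 lowercase hex chars>.
is_commit_pinned() {
  printf '%s\n' "$1" | grep -Eq ':nightly-[0-9a-f]{40}$'
}

for image in \
  vllm/vllm-openai-rocm:nightly-b53b1c7ffe7aebdafd0876350f30e51d1226c92a \
  vllm/vllm-openai-rocm:minimax-m3; do
  if is_commit_pinned "$image"; then
    echo "commit-pinned:     $image"
  else
    echo "not commit-pinned: $image"
  fi
done
```

A registry tag can still be re-pushed; referencing an image digest (@sha256:...) is the stricter option if exact reproducibility matters.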

In minimaxm3_fp8_mi325x_mtp.sh, the large idempotent Python in-place patch of vllm/models/minimax_m3/amd/model.py is removed (EAGLE3 support is expected in the new image). vllm serve gains --kv-cache-dtype fp8, and header comments are updated to match (FP8 KV for headroom vs the old BF16-on-gfx942 accuracy caveat).
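Because correctness now depends on the nightly image shipping EAGLE3 support, a cheap preflight check could fail fast when the model file does not mention SupportsEagle3. This is a hypothetical sketch: has_eagle3_support is not part of the repo, a textual grep is only a heuristic, and the file location in the commented example is the one quoted in the summary above, resolved relative to the installed vllm package.

```sh
# HYPOTHETICAL preflight: does the model file mention SupportsEagle3?
# Returns 0 if found, 1 if not found, 2 if the file is missing.
# A textual match is a heuristic, not proof the class implements EAGLE3.
has_eagle3_support() {
  model_file=$1
  if [ ! -f "$model_file" ]; then
    echo "missing: $model_file" >&2
    return 2
  fi
  grep -q 'SupportsEagle3' "$model_file"
}

# Example inside the container (assumes vllm is importable):
#   VLLM_DIR=$(python3 -c 'import os, vllm; print(os.path.dirname(vllm.__file__))')
#   has_eagle3_support "$VLLM_DIR/models/minimax_m3/amd/model.py" || exit 1
```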

launch_mi325x-amds.sh adds --exclude=chi-mi325x-pod2-120... on salloc so jobs skip a node without a populated /raid/hf-hub-cache.
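A sketch of the node-exclusion idea, assuming a Slurm launcher: salloc's --exclude option takes a node list, and a separate check can confirm that a cache directory is non-empty before a job is placed. exclude_flag, cache_populated and EXCLUDE_NODES are hypothetical names; the actual launch_mi325x-amds.sh may do this differently.

```sh
# HYPOTHETICAL helpers for a Slurm-based launcher.

# Print an salloc --exclude flag for a comma-separated node list, or nothing.
exclude_flag() {
  if [ -n "$1" ]; then
    printf '%s\n' "--exclude=$1"
  fi
}

# Succeed only if DIR exists and contains at least one entry
# (the condition the excluded node reportedly fails for /raid/hf-hub-cache).
cache_populated() {
  [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

EXCLUDE_NODES=chi-mi325x-pod2-120   # node named in the change list
echo "salloc flag: $(exclude_flag "$EXCLUDE_NODES")"
# Usage sketch (not executed here):
#   salloc $(exclude_flag "$EXCLUDE_NODES") <other launcher flags>
```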

perf-changelog.yaml documents the image bump, FP8 KV, patch removal, and node exclusion for this config key.

Reviewed by Cursor Bugbot for commit 1770e9c.

Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR there first, before we can merge your single-node PR into the master branch. Let's make sure the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that, after merging, all GitHub Actions jobs fully pass. Failures are often just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

2 similar comments

cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit ea746c7.

vllm serve "$MODEL" --port "$PORT" \
"${PARALLEL_ARGS[@]}" \
--block-size 128 \
--kv-cache-dtype fp8 \

cursor Bot, Jun 18, 2026

FP8 KV on gfx942

Medium Severity

This change adds --kv-cache-dtype fp8 on MI325X (gfx942), but elsewhere the repo still treats this stack as lacking calibrated ROCm FP8 attention scales (falling back to a scale of 1.0) on the same architecture, in minimaxm3_fp8_mi300x_mtp.sh and minimaxm3_fp8_mi300x.sh. The non-MTP MI325X script also still omits FP8 KV. The removed comment documented that BF16 was kept deliberately to avoid corrupted accuracy.
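To illustrate why a fallback scale of 1.0 matters, here is a sketch of per-tensor FP8 scaling that counts how many values would clip. It assumes gfx942 stores FP8 as E4M3FNUZ, whose largest finite value is 240, and it ignores mantissa rounding and underflow, so it only shows the saturation side of the accuracy concern. The input values are made up, and this is not how vLLM's ROCm kernels are implemented.

```sh
# Per-tensor FP8 scaling sketch. Assumptions: E4M3FNUZ max finite value 240;
# quantized value = x / scale, clamped to [-240, 240]; rounding ignored.
FP8_MAX=240

# calibrated_scale AMAX: scale that maps the observed |x| max onto FP8_MAX
calibrated_scale() {
  awk -v a="$1" -v m="$FP8_MAX" 'BEGIN { print a / m }'
}

# count_saturated SCALE V1 V2 ...: how many values exceed FP8_MAX after scaling
count_saturated() {
  scale=$1; shift
  printf '%s\n' "$@" | awk -v s="$scale" -v m="$FP8_MAX" '
    { x = $1 / s; if (x < 0) x = -x; if (x > m) n++ }
    END { print n + 0 }'
}

# Made-up K/V activations with a couple of large-magnitude outliers.
echo "fallback scale 1.0, saturated: $(count_saturated 1.0 0.5 -3 12 250 -600)"
s=$(calibrated_scale 600)
echo "calibrated scale $s, saturated: $(count_saturated "$s" 0.5 -3 12 250 -600)"
```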

Reviewed by Cursor Bugbot for commit ea746c7.

cquil11 force-pushed the codex/minimaxm3-mi325x-mtp-fp8-kvcache branch from ea746c7 to 12dbb7a on June 18, 2026 22:47

functionstackx (Collaborator) left a comment

@cquil11 codex agent, plz remove the EAGLE patch now that u upgraded to the nightly image; it is now included in vllm-project/vllm#45546

cquil11 dismissed functionstackx’s stale review on June 19, 2026 02:33

Addressed in 1770e9c: removed the legacy in-place EAGLE3 patch and now rely on upstream vLLM #45546.

cquil11 requested a review from a team on June 19, 2026 02:34

cquil11 (Collaborator, Author) commented Jun 19, 2026

@functionstackx Removed the legacy in-place EAGLE3 patch in 1770e9c and now rely on upstream vLLM #45546. The rerun is green and MI325X MTP averages +3.1% versus the latest official baseline. Could you re-review/approve?
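One way to compute an average like the +3.1% above is the arithmetic mean of per-configuration relative changes against the baseline. That aggregation is an assumption (the repo may use a geometric mean or weighting), mean_pct_delta is a hypothetical helper, and the input numbers in the example are made up, not results from this PR.

```sh
# HYPOTHETICAL helper: mean of per-point relative deltas, in percent.
# ASSUMPTION: arithmetic mean; the real comparison may aggregate differently.
# stdin: one "baseline candidate" pair per line.
mean_pct_delta() {
  awk '{ sum += ($2 - $1) / $1 * 100; n++ }
       END { if (n) printf "%+.1f%%\n", sum / n }'
}

# Made-up numbers for illustration only:
printf '%s\n' "100 104" "200 204" | mean_pct_delta   # prints +3.0%
```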

cquil11 enabled auto-merge (squash) on June 19, 2026 02:34
cquil11 disabled auto-merge on June 19, 2026 14:39

Reviewers

cursor[bot] left review comments
functionstackx left review comments
billishyahao: awaiting requested review (code owner)
chunfangamd: awaiting requested review (code owner)
seungrokj: awaiting requested review (code owner)
yctseng0211: awaiting requested review (code owner)
1am9trash: awaiting requested review (code owner)

At least 1 approving review is required to merge this pull request.

Assignees

No one assigned

Projects

Status: No status

Milestone

No milestone
