@cquil11 cquil11 commented Jun 18, 2026 •

edited by cursor Bot

Collaborator

What changed

pin minimaxm3-fp8-mi325x-vllm-mtp to vllm/vllm-openai-rocm:nightly-b53b1c7ffe7aebdafd0876350f30e51d1226c92a
launch the MI325X MiniMax-M3 EAGLE3 server with --kv-cache-dtype fp8
remove the legacy idempotent SupportsEagle3 in-place patch and rely on upstream vLLM support (vllm-project/vllm#45546)
exclude chi-mi325x-pod2-120, which lacks the populated model cache required by the launcher
append an MI325X MTP-only performance changelog entry

Why

Use the updated ROCm vLLM build and reduce KV-cache memory use for the MI325X MiniMax-M3 EAGLE3 sweep. The node exclusion prevents Pyxis mount failures on the incomplete node.

Validation

Bash syntax validation
MTP chat-template flag verified
targeted MI325X MTP full-sweep config generation
matrix logic tests: 156 passed

Note

Medium Risk
Benchmark numbers will shift with the new image and FP8 KV; correctness depends on the nightly shipping upstream EAGLE3 support that the removed patch provided.

Overview
Updates the MI325X MiniMax-M3 EAGLE3 (MTP) benchmark path to a pinned ROCm vLLM nightly and simplifies how the server is started.

minimaxm3-fp8-mi325x-vllm-mtp now uses vllm/vllm-openai-rocm:nightly-b53b1c7ffe7aebdafd0876350f30e51d1226c92a instead of vllm/vllm-openai-rocm:minimax-m3. Config comments no longer describe a runtime SupportsEagle3 patch or BF16-only KV on gfx942.

In minimaxm3_fp8_mi325x_mtp.sh, the large idempotent Python in-place patch of vllm/models/minimax_m3/amd/model.py is removed (EAGLE3 support is expected in the new image). vllm serve gains --kv-cache-dtype fp8, and header comments are updated to match (FP8 KV for headroom vs the old BF16-on-gfx942 accuracy caveat).
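Since correctness now depends on the image shipping EAGLE3 support, a launcher could check for it before starting the server. The following is only a minimal sketch under stated assumptions: `has_eagle3_marker` is a hypothetical helper, and treating the presence of the string `SupportsEagle3` in a model file as evidence of support is an assumption, not something this PR does. The demo runs against a throwaway file with placeholder content rather than a real vLLM install.

```shell
#!/usr/bin/env bash
# Hypothetical check; the function name and the idea of grepping for the
# interface name are assumptions, not part of minimaxm3_fp8_mi325x_mtp.sh.
# Returns 0 when the file exists and mentions SupportsEagle3, 1 otherwise.
has_eagle3_marker() {
    local file="$1"
    [ -f "$file" ] && grep -q 'SupportsEagle3' "$file"
}

# Demonstrate on a temporary file; the class line below is placeholder
# content, not real vLLM source.
demo="$(mktemp)"
echo 'class PlaceholderModel(SupportsEagle3):' > "$demo"
if has_eagle3_marker "$demo"; then
    echo "marker found"
else
    echo "marker missing" >&2
fi
rm -f "$demo"
```

In a real launcher the argument would be the installed model file, and a missing marker would most likely abort the run instead of only printing a message.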

launch_mi325x-amds.sh adds --exclude=chi-mi325x-pod2-120... on salloc so jobs skip a node without a populated /raid/hf-hub-cache.
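As a rough illustration of the node exclusion, the sketch below builds a Slurm `--exclude` argument from a list of node names. `EXCLUDED_NODES` and `build_exclude_arg` are hypothetical names chosen for this example and do not come from launch_mi325x-amds.sh; only the node name and the `--exclude` flag come from the PR description.

```shell
#!/usr/bin/env bash
# Illustrative only: EXCLUDED_NODES and build_exclude_arg are hypothetical
# names, not taken from launch_mi325x-amds.sh.
EXCLUDED_NODES=("chi-mi325x-pod2-120")

# Join node names with commas into a single Slurm --exclude argument.
# Prints nothing when no names are given, so the flag can be omitted.
build_exclude_arg() {
    [ "$#" -gt 0 ] || return 0
    local IFS=,
    printf -- '--exclude=%s\n' "$*"
}

build_exclude_arg "${EXCLUDED_NODES[@]}"
```

Keeping the excluded nodes in one array makes it easy to drop the entry again once the node's /raid/hf-hub-cache has been populated.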

perf-changelog.yaml documents the image bump, FP8 KV, patch removal, and node exclusion for this config key.

Reviewed by Cursor Bugbot for commit 1770e9c. Bugbot is set up for automated code reviews on this repo.

@cquil11 cquil11 requested a review from a team

June 18, 2026 22:43

@cquil11 cquil11 requested review from 1am9trash, billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners

June 18, 2026 22:43

@github-project-automation github-project-automation Bot added this to InferenceMAX Board

github-actions Bot commented Jun 18, 2026

Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from their respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@cquil11 cquil11 force-pushed the codex/minimaxm3-mi325x-mtp-fp8-kvcache branch from 3b2ffd4 to ea746c7 Compare

June 18, 2026 22:43

@cquil11 cquil11 added the full-sweep-fail-fast label

— with ChatGPT Codex Connector

cursor[bot]

cursor Bot reviewed

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit ea746c7.

benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_mi325x_mtp.sh

vllm serve "$MODEL" --port "$PORT" \
    "${PARALLEL_ARGS[@]}" \
    --block-size 128 \
    --kv-cache-dtype fp8 \

@cursor cursor Bot Jun 18, 2026

FP8 KV on gfx942

Medium Severity

This change adds --kv-cache-dtype fp8 on MI325X (gfx942). However, minimaxm3_fp8_mi300x_mtp.sh and minimaxm3_fp8_mi300x.sh, which target the same architecture, still treat that stack as lacking calibrated ROCm FP8 attention scales (fallback scale 1.0), and the non-MTP MI325X script still omits FP8 KV. The removed comment documented that BF16 was kept deliberately to avoid corrupted accuracy.
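One way to keep both options open while the accuracy question is settled is to make the KV-cache dtype a launcher knob. This is a hedged sketch, not the PR's implementation: `KV_CACHE_DTYPE`, `KV_ARGS`, and `set_kv_cache_args` are hypothetical names, and the accepted values listed are ones I believe `vllm serve --kv-cache-dtype` takes (`auto` keeps vLLM's default KV dtype).

```shell
#!/usr/bin/env bash
# Hypothetical knob: KV_CACHE_DTYPE is not defined in the benchmark scripts.
# "fp8" matches this PR; "auto" omits the flag and keeps vLLM's default.
KV_CACHE_DTYPE="${KV_CACHE_DTYPE:-fp8}"
KV_ARGS=()

# Fill KV_ARGS with the extra vllm serve arguments for the chosen dtype.
# Returns 1 (leaving KV_ARGS empty) for values this sketch does not know.
set_kv_cache_args() {
    KV_ARGS=()
    case "$1" in
        auto) ;;  # no flag: fall back to vLLM's default KV-cache dtype
        fp8|fp8_e4m3|fp8_e5m2) KV_ARGS=(--kv-cache-dtype "$1") ;;
        *) echo "unsupported KV cache dtype: $1" >&2; return 1 ;;
    esac
}

set_kv_cache_args "$KV_CACHE_DTYPE"
echo "extra serve args: ${KV_ARGS[*]}"
```

The array could then be spliced into the launch line as `"${KV_ARGS[@]}"` next to `"${PARALLEL_ARGS[@]}"`, so an accuracy comparison run only needs `KV_CACHE_DTYPE=auto` in the environment.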


 perf: update MI325X MiniMax-M3 MTP image and FP8 KV cache

12dbb7a

@cquil11 cquil11 force-pushed the codex/minimaxm3-mi325x-mtp-fp8-kvcache branch from ea746c7 to 12dbb7a Compare

June 18, 2026 22:47

github-actions Bot commented Jun 18, 2026

Contributor

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27793866629
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27793866629

functionstackx

functionstackx previously requested changes

@functionstackx functionstackx left a comment

Collaborator

@cquil11 codex agent, please remove the EAGLE patch now that you have upgraded to the nightly image; it is now included upstream (vllm-project/vllm#45546)

@cquil11 cquil11 removed the full-sweep-fail-fast label

@cquil11 cquil11 added the full-sweep-fail-fast label

— with ChatGPT Codex Connector

github-actions Bot commented Jun 19, 2026

Contributor

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27794191643
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27794191643

cquil11 added 3 commits

June 18, 2026 21:28


 Merge branch 'main' into codex/minimaxm3-mi325x-mtp-fp8-kvcache

f587dd6


 fix: preserve perf changelog history

540e3c1


 fix: use upstream MI325X EAGLE support

1770e9c

@cquil11 cquil11 dismissed functionstackx’s stale review

June 19, 2026 02:33

Addressed in 1770e9c: removed the legacy in-place EAGLE3 patch and now rely on upstream vLLM #45546.

@cquil11 cquil11 requested a review from a team

June 19, 2026 02:34

cquil11 commented Jun 19, 2026

Collaborator Author

@functionstackx Removed the legacy in-place EAGLE3 patch in 1770e9c and now rely on upstream vLLM #45546. The rerun is green and MI325X MTP averages +3.1% versus the latest official baseline. Could you re-review/approve?

@cquil11 cquil11 enabled auto-merge (squash)

June 19, 2026 02:34

@cquil11 cquil11 disabled auto-merge

June 19, 2026 14:39


 Merge branch 'main' into codex/minimaxm3-mi325x-mtp-fp8-kvcache

02a53ac

@cquil11 cquil11 added full-sweep-fail-fast and removed full-sweep-fail-fast labels

Jun 19, 2026

github-actions Bot commented Jun 19, 2026

Contributor

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27836183445
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27836183445
