Releases: modelscope/mcore-bridge
v1.4.3
New Features
- Added model_type support for gemma4_unified; added multimodal support for kimi_k25.
- Added the language_model_only parameter. When enabled, only the language model component is created, and only language model weights are loaded and saved.
- Fixed several bugs.
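As a rough illustration of what restricting loading and saving to language model weights can look like, here is a minimal sketch that filters a checkpoint dictionary by key prefix. The `language_model.` prefix, the function name, and the toy keys are assumptions made for this example; they are not taken from the mcore-bridge code.

```python
from typing import Any, Dict

# Hypothetical prefix: the real parameter names in mcore-bridge may differ.
LANGUAGE_MODEL_PREFIX = "language_model."


def filter_language_model_weights(
    state_dict: Dict[str, Any],
    prefix: str = LANGUAGE_MODEL_PREFIX,
) -> Dict[str, Any]:
    """Keep only the entries whose key starts with the language-model prefix."""
    return {key: value for key, value in state_dict.items() if key.startswith(prefix)}


# Toy checkpoint: plain lists stand in for tensors.
checkpoint = {
    "language_model.embedding.weight": [0.1, 0.2],
    "language_model.decoder.layers.0.mlp.weight": [0.3],
    "visual.patch_embed.weight": [0.4],
}

lm_only = filter_language_model_weights(checkpoint)
print(sorted(lm_only))
```

In this sketch the vision tower entry is dropped, so a save or load routine that iterates over `lm_only` never touches non-language weights.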
What's Changed
- [bugfix] fix: clamp num_tokens=0 in MTP loss & add normalized scale for MTP per token loss by @YaoweiFan in #104
- [bugfix] fix tie_word_embeddings by @Jintao-Huang in #105
- [bugfix] fix deepseek-v4 dev branch by @Jintao-Huang in #107
- [model] support gemma4_unified by @Jintao-Huang in #108
- update batch_p2p_comm by @Jintao-Huang in #111
- support language_model_only by @Jintao-Huang in #112
- support kimi_k25 mm by @Jintao-Huang in #113
- update mla rope mcore>=0.18 (0.15-0.18 compat) by @Jintao-Huang in #114
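The title of #104 mentions clamping `num_tokens=0` in the MTP loss. The general idea behind such a clamp can be sketched as follows: when every token in a batch is masked out, dividing the summed loss by the token count would be 0/0, so the count is clamped to at least 1. This is an illustrative sketch in plain Python, not the actual patch; the function name and the mask convention (1 = token counts toward the loss) are assumptions.

```python
from typing import Sequence


def masked_mean_loss(per_token_loss: Sequence[float], loss_mask: Sequence[int]) -> float:
    """Average the loss over unmasked tokens, clamping the token count to at least 1."""
    if len(per_token_loss) != len(loss_mask):
        raise ValueError("per_token_loss and loss_mask must have the same length")
    total = sum(loss * mask for loss, mask in zip(per_token_loss, loss_mask))
    # Clamp: avoids a division by zero when every token is masked out.
    num_tokens = max(sum(loss_mask), 1)
    return total / num_tokens


print(masked_mean_loss([2.0, 4.0, 6.0], [1, 1, 0]))
print(masked_mean_loss([2.0, 4.0], [0, 0]))
```

With the clamp, a fully masked batch contributes a loss of zero instead of producing NaN.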
New Contributors
- @YaoweiFan made their first contribution in #104
Full Changelog: v1.4.2...v1.4.3
v1.4.2
New Features
- Added model_type support for bailing_hybrid.
- Fixed abnormal loss for olmoe/bailing_moe when TP > 1.
What's Changed
- [bugfix] fix bug by @Jintao-Huang in #99
- [bugfix] fix qwen3_next norm sp by @Jintao-Huang in #100
- [model] Support bailing_hybrid by @Jintao-Huang in #85
- refactor olmoe by @Jintao-Huang in #101
- [bugfix] fix npu GDN by @Jintao-Huang in #103
Full Changelog: v1.4.1...v1.4.2
v1.4.1
New Features
- Added model_type support for: gemma4, deepseek_v4.
- Added a minimal example to the README showing how to create a model with Mcore-Bridge, run a forward pass, and compute the loss.
- Compatible with both megatron-core main and dev branches.
What's Changed
- [model] Support gemma4 by @Jintao-Huang in #56
- [docs] update readme by @Jintao-Huang in #84
- compat megatron dev branch by @Jintao-Huang in #87
- [model] support gemma4 padding_free by @Jintao-Huang in #88
- [docs] update docs by @Jintao-Huang in #89
- update gemma4 rope by @Jintao-Huang in #90
- refactor MLA by @Jintao-Huang in #91
- compat mtp megatron_core main branch by @Jintao-Huang in #92
- [model] Support deepseek-v4 by @Jintao-Huang in #86
- [bugfix] fix bugs by @Jintao-Huang in #95
- [model] support deepseek v4 mtp by @Jintao-Huang in #93
- Support fp4 blockwise load by @Jintao-Huang in #96
- [bugfix] fix gdn conv1d by @Jintao-Huang in #97
- update lora add by @Jintao-Huang in #98
Full Changelog: v1.4.0...v1.4.1
v1.4.0
New Features
- Added model_type support for bailing_moe and qwen3_asr.
- Support running Qwen3-Next with Mcore-GDN (default), enabling sequence packing, FP8, and CP.
- Refactored transformer_block/transformer_layer with an inheritable design to simplify the integration of new models.
- Added compatibility with Python 3.13.
- Support LoRA weight saving and loading for MoE models whose experts are organized in grouped mode in transformers. (Note: these LoRA weights cannot be loaded directly via transformers, but can be loaded via Megatron for continued training.)
- Added padding_mask support, fixing an issue where moe_aux_loss incorrectly computed routing loss on padding tokens when padding_free=False.
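To make the padding_mask item concrete, the sketch below computes a Switch-Transformer-style load-balancing loss over non-padding tokens only. It is a plain-Python illustration of the idea, not the mcore-bridge or Megatron implementation; the function name, the top-1 routing, and the mask convention (True = padding) are assumptions made for this example.

```python
from typing import Sequence


def load_balancing_loss(
    router_probs: Sequence[Sequence[float]],
    padding_mask: Sequence[bool],
) -> float:
    """Switch-style auxiliary loss computed over non-padding tokens only.

    router_probs[t][e] is the routing probability of token t for expert e.
    padding_mask[t] is True when token t is padding (convention assumed here).
    """
    valid = [probs for probs, is_pad in zip(router_probs, padding_mask) if not is_pad]
    if not valid:
        return 0.0
    num_tokens = len(valid)
    num_experts = len(valid[0])

    # f[e]: fraction of valid tokens whose top-1 expert is e.
    counts = [0] * num_experts
    for probs in valid:
        top1 = max(range(num_experts), key=lambda e: probs[e])
        counts[top1] += 1
    f = [count / num_tokens for count in counts]

    # p[e]: mean routing probability assigned to expert e over valid tokens.
    p = [sum(probs[e] for probs in valid) / num_tokens for e in range(num_experts)]

    return num_experts * sum(fe * pe for fe, pe in zip(f, p))


probs = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.0]]  # the last token is padding
masked = load_balancing_loss(probs, [False, False, True])
unmasked = load_balancing_loss(probs, [False, False, False])
print(masked, unmasked)
```

In this toy input the two real tokens are perfectly balanced across the two experts, yet counting the padding token makes the routing look skewed and raises the loss. That is the kind of distortion a padding mask avoids.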
What's Changed
- [bugfix] fix MTP & mcore 0.15 (NPU) by @Jintao-Huang in #67
- compat python 3.13 by @Jintao-Huang in #68
- compat lint py313 by @Jintao-Huang in #69
- compat lint py3.13 by @Jintao-Huang in #70
- [model] support bailing by @Jintao-Huang in #55
- update gpt_model by @Jintao-Huang in #71
- refactor transformer_block by @Jintao-Huang in #72
- [bugfix] fix tie_word_embeddings by @Jintao-Huang in #74
- [bugfix] fix qwen3_vl by @Jintao-Huang in #73
- remove hf_grouped lora error by @Jintao-Huang in #75
- [model] support qwen3_next gdn by @Jintao-Huang in #76
- compat megatron.core 0.18 by @Jintao-Huang in #77
- [model] support qwen3_asr by @Jintao-Huang in #78
- Support padding mask by @Jintao-Huang in #79
- compat peft 0.19 by @Jintao-Huang in #80
- [readme] Update readme by @Jintao-Huang in #81
- [docs] update readme by @Jintao-Huang in #82
- [bugfix] fix minimax qk_norm sp by @Jintao-Huang in #83
Full Changelog: v1.3.0...v1.4.0
Patch release v1.3.2
Full Changelog: v1.3.1...v1.3.2
Patch release v1.3.1
Full Changelog: v1.3.0...v1.3.1
v1.3.0
New Features
- Added model_type support for kimi_k25, hy_v3, and llava_onevision.
- mlp_padding_free is now compatible with Sequence Parallelism.
- Dropped support for megatron-core versions 0.12 through 0.14.
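For anyone pinning dependencies, a small check like the one below can flag a megatron-core version that falls in the range this release no longer supports. It is a sketch under stated assumptions: the helper name is hypothetical, only the major and minor components are compared, and nothing is implied about versions outside the 0.12 to 0.14 range named above.

```python
import re
from typing import Tuple

# Inclusive (major, minor) range named in the release note.
DROPPED_RANGE: Tuple[Tuple[int, int], Tuple[int, int]] = ((0, 12), (0, 14))


def is_dropped_megatron_core(version: str) -> bool:
    """Return True if the version's major.minor falls in the dropped range."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    low, high = DROPPED_RANGE
    return low <= major_minor <= high


for candidate in ("0.13.1", "0.14.0rc1", "0.15.0"):
    print(candidate, is_dropped_megatron_core(candidate))
```

In practice the version string could come from `importlib.metadata.version("megatron-core")`, assuming the package is installed under that distribution name.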
What's Changed
- [docs] update readme by @Jintao-Huang in #49
- update requirements by @Jintao-Huang in #51
- npu qwen3.5 megatron padding_free fix by @addsubmuldiv in #50
- [model] support kimi_k25 by @Jintao-Huang in #52
- [model] support hy_v3 by @Jintao-Huang in #53
- Add support for LLaVA-OneVision-1.5 model by @randydl in #54
- [bugfix] fix torch_dtype by @Jintao-Huang in #57
- fix qwen3_next by @Jintao-Huang in #58
- remove mcore0.12-mcore0.14 by @Jintao-Huang in #59
- fix kwargs by @Jintao-Huang in #61
- [megatron] support mlp_padding_free & sp; refactor TransformerLayer by @Jintao-Huang in #62
- [bugfix] fix gather_from_sp by @Jintao-Huang in #63
- update transformers by @Jintao-Huang in #65
- update requirements by @Jintao-Huang in #66
New Contributors
Full Changelog: v1.2.0...v1.3.0
Patch release v1.2.3
Full Changelog: v1.2.2...v1.2.3
Patch release v1.2.2
Full Changelog: v1.2.1...v1.2.2
Patch release v1.2.1
Full Changelog: v1.2.0...v1.2.1