Pull requests: InternLM/lmdeploy
[Add] add Qwen3-8B accuracy evaluation in llm_compressor.md
#4319, opened Feb 3, 2026 by 43758726
Negative KV sequence length error in Attention op
#4316, opened Feb 2, 2026 by jinminxi104
Compatible with transformers 5.0 at TurboMind side [improvement]
#4304, opened Jan 28, 2026 by lvhan028
fix rotary embedding for transformers v5 [improvement]
#4303, opened Jan 28, 2026 by grimoire
change ascend paged attention from BSH format to TND format for better performance
#4295, opened Jan 27, 2026 by jinminxi104 (Draft)
Support ignore layers in quant config for qwen3 models [improvement]
#4293, opened Jan 26, 2026 by RunningLeon
feat: implement online bf16-to-fp8 conversion and inference in TurboMind [improvement]
#4237, opened Dec 25, 2025 by 43758726
Support fp32 head for qwen and internlm models [improvement]
#4160, opened Nov 27, 2025 by RunningLeon
Add step_map to track token decoding order in DLLM
#4057, opened Oct 21, 2025 by Auraithm (4 tasks done)
quant blocked fp8 [enhancement: New feature or request]
#4018, opened Sep 29, 2025 by CUHKSZzxy (4 of 5 tasks)
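For readers unfamiliar with the technique named in #4018, blocked (per-block) fp8 quantization assigns each tile of a weight matrix its own scale so every tile fits the narrow fp8 dynamic range. The sketch below is illustrative only and is not taken from the PR: the 128x128 block size is an assumption, and the final rounding to fp8 e4m3 bits is elided (values are kept in float to show just the scaling scheme).

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value in the fp8 e4m3 format


def quantize_blocked_fp8(w: np.ndarray, block: int = 128):
    """Scale each (block x block) tile of `w` into the fp8 e4m3 range.

    Returns the scaled tensor (same shape as `w`) and one scale per tile.
    Block size 128 is an illustrative assumption, not the PR's choice.
    """
    rows, cols = w.shape
    assert rows % block == 0 and cols % block == 0
    # View the matrix as a grid of tiles: (n_row_tiles, block, n_col_tiles, block)
    tiles = w.reshape(rows // block, block, cols // block, block)
    # Per-tile absolute maximum determines the per-tile scale.
    amax = np.abs(tiles).max(axis=(1, 3), keepdims=True)
    scale = amax / FP8_E4M3_MAX
    scale = np.where(scale == 0, 1.0, scale)  # avoid dividing by zero for all-zero tiles
    q = np.clip(tiles / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    # A real kernel would now round `q` to fp8 e4m3 bit patterns; omitted here.
    return q.reshape(rows, cols), scale.squeeze(axis=(1, 3))
```

Because the rounding step is omitted, multiplying each tile of the output back by its scale recovers the input exactly, which makes the scaling scheme easy to verify in isolation.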
add ppu quick start doc [documentation: Improvements or additions to documentation]
#3841, opened Aug 14, 2025 by guozixu2001