Pull requests: sgl-project/sglang
- #11701: [Test] support llm-compressor: w8a8_fp8_block, wNa16 (opened Oct 16, 2025 by Wangzheee)
- #11494: [quantization] AWQ Marlin doesn't work when dtype is bfloat16 (opened Oct 12, 2025 by kevin85421) — labels: express-lane (a PR may be merged without a full CI check), run-ci
- #10750: [6/n] decouple quantization implementation from vLLM dependency (opened Sep 22, 2025 by Hongbosherlock) — labels: ready-to-merge (the PR is ready to merge after the CI is green), run-ci
- #10153: integrate autoRound quantization algorithm (opened Sep 8, 2025 by WeiweiZhang1) — labels: high priority, intel, run-ci
- #10078: feat: Add FP4 (E2M1) KV Cache Support with Quantization Utilities for MLA (opened Sep 5, 2025 by JackChuang) — labels: high priority, quant (LLM Quantization), run-ci
- #9918: [2/N][Bug] Fix w4afp8 MoE NaN issue (python) (opened Sep 2, 2025 by yuhyao) — label: high priority
- #9248: Support w4a16 quantization for CompressedTensorsMoEMethod (opened Aug 16, 2025 by RaymondWang0)
- #8014: Fix W8A8Int8Config init error (opened Jul 14, 2025 by lambert0312) — label: high priority
- #6867: [Kernel] feat: add flexprefill for long-context (opened Jun 4, 2025 by artetaout) — label: high priority
- #5872: Fix channel-wise INT8 moe config tuning error (opened Apr 29, 2025 by lambert0312) — label: high priority