Qwen2.5-VL 3B calibration using AWQ takes too long #433

Open
@kritohyh

Description

With a calibration set of roughly 1k samples and batch size 16, AWQ calibration takes more than 13 hours, whereas llm-compressor's AWQ calibration finishes in under 1 hour. What explains this difference?
One possible reason is that the Qwen2.5-VL attention backend runs with SDPA here, which is slower. Are there other factors?

Hardware platform: 1 × H20
Input length per sample (text + image tokens): ≈ 200
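
If SDPA is the suspected bottleneck, one quick check is to reload the model with FlashAttention-2 and compare the per-batch forward time against the SDPA run. A minimal sketch, assuming a recent transformers release with Qwen2.5-VL support and flash-attn installed; the model id below is an illustrative placeholder, not the path from the config:

```python
# Sketch: load Qwen2.5-VL with FlashAttention-2 instead of the default SDPA backend.
# Assumes transformers >= 4.49 (Qwen2.5-VL support) and flash-attn installed.
import torch
from transformers import Qwen2_5_VLForConditionalGeneration

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct",             # placeholder model id
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",   # swap for "sdpa" to compare
    device_map="cuda",
)
print(model.config._attn_implementation)       # confirm which backend is active
```

Timing one calibration batch under each backend should show whether attention alone accounts for the gap.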

AWQ YAML config:

base:
    seed: &seed 42
model:
    type: Qwen2_5VL
    path: xxx
    tokenizer_mode: slow
    torch_dtype: auto
calib:
    name: custom_mm
    download: False
    path: xxx
    apply_chat_template: True
    n_samples: 960
    bs: 16
    seq_len: 512
    padding: True
    seed: *seed
quant:
    method: Awq
    weight:
        bit: 4
        symmetric: False
        granularity: per_group
        group_size: 64
        # Available options: ['gemm_pack']
        pack_version: gemm_pack
    special:
        trans: True
        trans_version: v2
        weight_clip: True
        do_gqa_trans: True
    quant_out: False
save:
    save_mlcllm: True
    save_fake: True
    save_path: xxx

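For scale, here is a rough count of the block-level forward passes this config implies (960 samples at batch size 16 gives 60 calibration batches per block). The block count and grid size below are assumptions for illustration (AutoAWQ's default search uses about 20 scale candidates per block; llmc's search may differ), not values taken from the issue:

```python
# Back-of-envelope: how many block-level forwards the AWQ scale search implies.
# Assumed values (not from the issue): 36 decoder blocks for the 3B language
# model, ~20 scale-ratio candidates per block as in AutoAWQ's default grid.
n_samples, bs = 960, 16
batches = n_samples // bs                 # 60 calibration batches
blocks = 36                               # assumed decoder block count
grid = 20                                 # assumed scale candidates per block
forwards = blocks * batches * (1 + grid)  # one caching pass + grid search per block
print(batches, forwards)                  # 60, 45360
```

With that many block forwards, a slower attention backend (or repeated vision-tower passes) compounds quickly, which may explain the gap versus llm-compressor.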