Qwen2.5-VL 3B calibration using AWQ takes too long #433

Open
@kritohyh

Description

With a calibration set of roughly 1k samples and batch size 16, AWQ calibration takes more than 13 hours, whereas llm-compressor's AWQ calibration finishes in under 1 hour. What explains this difference?
One possible reason is that the Qwen2.5-VL attention backend runs with SDPA here, which is slower. Are there other factors?

Hardware platform: 1 × H20
Input length per sample (text + image tokens): ≈ 200
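
If SDPA is the suspected bottleneck, one quick check is to reload the model with FlashAttention-2 and compare the per-batch forward time against the SDPA run. A minimal sketch, assuming a recent transformers release with Qwen2.5-VL support and flash-attn installed; the model id below is an illustrative placeholder, not the path from the config:

```python
# Sketch: load Qwen2.5-VL with FlashAttention-2 instead of the default SDPA backend.
# Assumes transformers >= 4.49 (Qwen2.5-VL support) and flash-attn installed.
import torch
from transformers import Qwen2_5_VLForConditionalGeneration

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct",             # placeholder model id
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",   # swap for "sdpa" to compare
    device_map="cuda",
)
print(model.config._attn_implementation)       # confirm which backend is active
```

Timing one calibration batch under each backend should show whether attention alone accounts for the gap.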

AWQ YAML config:

base:
    seed: &seed 42
model:
    type: Qwen2_5VL
    path: xxx
    tokenizer_mode: slow
    torch_dtype: auto
calib:
    name: custom_mm
    download: False
    path: xxx
    apply_chat_template: True
    n_samples: 960
    bs: 16
    seq_len: 512
    padding: True
    seed: *seed
quant:
    method: Awq
    weight:
        bit: 4
        symmetric: False
        granularity: per_group
        group_size: 64
        # Available options: ['gemm_pack']
        pack_version: gemm_pack
    special:
        trans: True
        trans_version: v2
        weight_clip: True
        do_gqa_trans: True
    quant_out: False
save:
    save_mlcllm: True
    save_fake: True
    save_path: xxx

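For scale, here is a rough count of the block-level forward passes this config implies (960 samples at batch size 16 gives 60 calibration batches per block). The block count and grid size below are assumptions for illustration (AutoAWQ's default search uses about 20 scale candidates per block; llmc's search may differ), not values taken from the issue:

```python
# Back-of-envelope: how many block-level forwards the AWQ scale search implies.
# Assumed values (not from the issue): 36 decoder blocks for the 3B language
# model, ~20 scale-ratio candidates per block as in AutoAWQ's default grid.
n_samples, bs = 960, 16
batches = n_samples // bs                 # 60 calibration batches
blocks = 36                               # assumed decoder block count
grid = 20                                 # assumed scale candidates per block
forwards = blocks * batches * (1 + grid)  # one caching pass + grid search per block
print(batches, forwards)                  # 60, 45360
```

With that many block forwards, a slower attention backend (or repeated vision-tower passes) compounds quickly, which may explain the gap versus llm-compressor.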