I've run over 20 tests and GPTQv2 is always worse than BitsAndBytes 4bit; even the 8bit GPTQv2 is worse. · ModelCloud/GPTQModel · Discussion #1608

TByte007
May 9, 2025

I am comparing GPTQv2 against BnB (BitsAndBytes) using vLLM. For BnB I just use the on-the-fly quantization. I tried all kinds of option combinations for GPTQv2, and it always comes out worse in accuracy. Even the 8bit GPTQv2 is worse than the 4bit BnB.
I tested it using "anli" and "hellaswag" on the "Nitral-AI/Violet_Magcap-12B" model.
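For anyone who wants to repeat a run, here is a minimal sketch of how the evaluation call can be assembled. It is reconstructed from the `local-completions (...)` line in the logs, not copied from my shell history, so treat the helper name `lm_eval_command` and the exact flag layout as assumptions. It only builds the command string and does not launch anything.

```python
import shlex


def lm_eval_command(task, base_url, tokenizer, num_concurrent=40, timeout=7200):
    """Build an lm_eval command line for a vLLM server exposing /v1/completions.

    Hypothetical helper: the model_args keys mirror the ones shown in the logs.
    """
    model_args = ",".join([
        f"base_url={base_url}",
        f"tokenizer={tokenizer}",
        f"num_concurrent={num_concurrent}",
        f"timeout={timeout}",
    ])
    argv = [
        "lm_eval",
        "--model", "local-completions",
        "--model_args", model_args,
        "--tasks", task,
        "--batch_size", "1",
    ]
    # shlex.join quotes any argument that would otherwise be split by the shell
    return shlex.join(argv)


cmd = lm_eval_command(
    task="anli",
    base_url="http://10.0.0.6:8000/v1/completions",
    tokenizer="Nitral-AI/Violet_Magcap-12B",
)
print(cmd)
```

Swapping `task="anli"` for `task="hellaswag"` gives the second benchmark. The server side (GPTQ checkpoint or BnB on-the-fly quantization) is whatever vLLM instance is listening at `base_url`.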
This is part of the results:
ANLI:

GPTQv2 4bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: 128, damp_percent: 0.1, mse: false
9600/9600 [07:25<00:00, 21.55it/s]
2025-05-09:04:47:54
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=7200), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|-------|------:|------|-----:|------|---|-----:|---|-----:|
|anli_r1| 1|none | 0|acc |↑ |0.4830|± |0.0158|
|anli_r2| 1|none | 0|acc |↑ |0.4710|± |0.0158|
|anli_r3| 1|none | 0|acc |↑ |0.4583|± |0.0144|
GPTQv2 4bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: 64, damp_percent: 0.1, mse: true
9600/9600 [08:39<00:00, 18.47it/s]
2025-05-09:05:10:15
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=7200), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|-------|------:|------|-----:|------|---|-----:|---|-----:|
|anli_r1| 1|none | 0|acc |↑ |0.5160|± |0.0158|
|anli_r2| 1|none | 0|acc |↑ |0.4800|± |0.0158|
|anli_r3| 1|none | 0|acc |↑ |0.4858|± |0.0144|
Bitsandbytes 4bit
9600/9600 [07:27<00:00, 21.45it/s]
2025-05-09:04:58:01
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=7200), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|-------|------:|------|-----:|------|---|-----:|---|-----:|
|anli_r1| 1|none | 0|acc |↑ |0.5220|± |0.0158|
|anli_r2| 1|none | 0|acc |↑ |0.5050|± |0.0158|
|anli_r3| 1|none | 0|acc |↑ |0.5017|± |0.0144|

HellaSwag:

BitsAndBytes 4bit
40168/40168 [31:39<00:00, 21.15it/s]
2025-05-02:05:24:22
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=20,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag| 1|none | 0|acc |↑ |0.6262|± |0.0048|
| | |none | 0|acc_norm|↑ |0.8091|± |0.0039|
GPTQv2 8bit from gptq-quant-ChatGPT-4.1.py WikiText
40168/40168 [27:57<00:00, 23.94it/s]
2025-05-08:14:41:55
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag| 1|none | 0|acc |↑ |0.6061|± |0.0049|
| | |none | 0|acc_norm|↑ |0.7971|± |0.0040|
GPTQv2 8bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: 64
40168/40168 [28:17<00:00, 23.66it/s]
2025-05-08:19:10:18
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag| 1|none | 0|acc |↑ |0.6067|± |0.0049|
| | |none | 0|acc_norm|↑ |0.7977|± |0.0040|
GPTQv2 8bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: -1
40168/40168 [21:51<00:00, 30.64it/s]
2025-05-08:20:30:24
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag| 1|none | 0|acc |↑ |0.6072|± |0.0049|
| | |none | 0|acc_norm|↑ |0.7973|± |0.0040|
GPTQv2 8bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: 64, damp_percent: 0.1
40168/40168 [28:09<00:00, 23.78it/s]
2025-05-08:22:01:39
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag| 1|none | 0|acc |↑ |0.6061|± |0.0049|
| | |none | 0|acc_norm|↑ |0.7976|± |0.0040|
GPTQv2 8bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: 64, damp_percent: 0.1, mse: true
40168/40168 [28:15<00:00, 23.69it/s]
2025-05-09:02:34:33
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag| 1|none | 0|acc |↑ |0.6062|± |0.0049|
| | |none | 0|acc_norm|↑ |0.7976|± |0.0040|
GPTQv2 4bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: 128, damp_percent: 0.1, mse: false
40168/40168 [24:10<00:00, 27.69it/s]
2025-05-09:04:27:41
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag| 1|none | 0|acc |↑ |0.5946|± |0.0049|
| | |none | 0|acc_norm|↑ |0.7872|± |0.0041|

Those are just some of the tests. I tested basically all the combinations. Non-symmetric quantization was even worse. I know that I'm probably doing something very wrong, but I have no idea what.
Any ideas would be appreciated! (I'm using GPTQModel 3.1.0.dev0 from GitHub, from a few days ago.)
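To judge how much of each gap is noise, here is a rough sketch that turns the reported accuracies and stderr values into a z statistic for the difference between two runs. The numbers are copied from the tables above; the helper name `z_score` is made up for illustration. It treats the two runs as independent samples, which is an assumption: both runs score the same test items, so a paired test on per-item results would be the more precise tool.

```python
import math


def z_score(acc_a, se_a, acc_b, se_b):
    """Z statistic for acc_a - acc_b, treating both runs as independent."""
    return (acc_a - acc_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# (metric, BnB 4bit acc, BnB stderr, GPTQv2 acc, GPTQv2 stderr)
# ANLI rows use the GPTQv2 4bit run with gs: 64 and mse: true;
# HellaSwag rows use the first GPTQv2 8bit run.
rows = [
    ("anli_r1 acc", 0.5220, 0.0158, 0.5160, 0.0158),
    ("anli_r2 acc", 0.5050, 0.0158, 0.4800, 0.0158),
    ("anli_r3 acc", 0.5017, 0.0144, 0.4858, 0.0144),
    ("hellaswag acc", 0.6262, 0.0048, 0.6061, 0.0049),
    ("hellaswag acc_norm", 0.8091, 0.0039, 0.7971, 0.0040),
]

results = {}
for name, bnb_acc, bnb_se, gptq_acc, gptq_se in rows:
    z = z_score(bnb_acc, bnb_se, gptq_acc, gptq_se)
    results[name] = z
    # |z| > 1.96 corresponds to roughly the 95% level for a two-sided test
    verdict = "beyond noise" if abs(z) > 1.96 else "within noise"
    print(f"{name:20s} diff={bnb_acc - gptq_acc:+.4f}  z={z:+.2f}  {verdict}")
```

Under this approximation the three ANLI gaps stay below the 1.96 threshold, while both HellaSwag gaps exceed it, so the HellaSwag result is the one that most needs an explanation.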
