I've run over 20 tests and GPTQv2 is always worse than bitsandbytes 4-bit; even 8-bit GPTQv2 is worse. #1608

Unanswered
TByte007 asked this question in Q&A

I'm comparing GPTQv2 against BnB (bitsandbytes) using vLLM. For BnB I just use the on-the-fly quantization. I tried all kinds of option combinations for GPTQv2 and it always comes out worse in accuracy. Even 8-bit GPTQv2 is worse than 4-bit BnB.
I tested it using "anli" and "hellaswag" on the "Nitral-AI/Violet_Magcap-12B" model.
This is part of the results:
ANLI:

GPTQv2 4bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: 128, damp_percent: 0.1, mse: false
9600/9600 [07:25<00:00, 21.55it/s]
2025-05-09:04:47:54
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=7200), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|-------|------:|------|-----:|------|---|-----:|---|-----:|
|anli_r1| 1|none | 0|acc |↑ |0.4830|± |0.0158|
|anli_r2| 1|none | 0|acc |↑ |0.4710|± |0.0158|
|anli_r3| 1|none | 0|acc |↑ |0.4583|± |0.0144|
GPTQv2 4bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: 64, damp_percent: 0.1, mse: true
9600/9600 [08:39<00:00, 18.47it/s]
2025-05-09:05:10:15
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=7200), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|-------|------:|------|-----:|------|---|-----:|---|-----:|
|anli_r1| 1|none | 0|acc |↑ |0.5160|± |0.0158|
|anli_r2| 1|none | 0|acc |↑ |0.4800|± |0.0158|
|anli_r3| 1|none | 0|acc |↑ |0.4858|± |0.0144|
Bitsandbytes 4bit
9600/9600 [07:27<00:00, 21.45it/s]
2025-05-09:04:58:01
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=7200), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|-------|------:|------|-----:|------|---|-----:|---|-----:|
|anli_r1| 1|none | 0|acc |↑ |0.5220|± |0.0158|
|anli_r2| 1|none | 0|acc |↑ |0.5050|± |0.0158|
|anli_r3| 1|none | 0|acc |↑ |0.5017|± |0.0144|

HellaSwag:

BitsAndBytes 4bit
40168/40168 [31:39<00:00, 21.15it/s]
2025-05-02:05:24:22
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=20,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag| 1|none | 0|acc |↑ |0.6262|± |0.0048|
| | |none | 0|acc_norm|↑ |0.8091|± |0.0039|
GPTQv2 8bit from gptq-quant-ChatGPT-4.1.py WikiText
40168/40168 [27:57<00:00, 23.94it/s]
2025-05-08:14:41:55
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag| 1|none | 0|acc |↑ |0.6061|± |0.0049|
| | |none | 0|acc_norm|↑ |0.7971|± |0.0040|
GPTQv2 8bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: 64
40168/40168 [28:17<00:00, 23.66it/s]
2025-05-08:19:10:18
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag| 1|none | 0|acc |↑ |0.6067|± |0.0049|
| | |none | 0|acc_norm|↑ |0.7977|± |0.0040|
GPTQv2 8bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: -1
40168/40168 [21:51<00:00, 30.64it/s]
2025-05-08:20:30:24
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag| 1|none | 0|acc |↑ |0.6072|± |0.0049|
| | |none | 0|acc_norm|↑ |0.7973|± |0.0040|
GPTQv2 8bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: 64, damp_percent: 0.1
40168/40168 [28:09<00:00, 23.78it/s]
2025-05-08:22:01:39
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag| 1|none | 0|acc |↑ |0.6061|± |0.0049|
| | |none | 0|acc_norm|↑ |0.7976|± |0.0040|
GPTQv2 8bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: 64, damp_percent: 0.1, mse: true
40168/40168 [28:15<00:00, 23.69it/s]
2025-05-09:02:34:33
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag| 1|none | 0|acc |↑ |0.6062|± |0.0049|
| | |none | 0|acc_norm|↑ |0.7976|± |0.0040|
GPTQv2 4bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: 128, damp_percent: 0.1, mse: false
40168/40168 [24:10<00:00, 27.69it/s]
2025-05-09:04:27:41
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag| 1|none | 0|acc |↑ |0.5946|± |0.0049|
| | |none | 0|acc_norm|↑ |0.7872|± |0.0041|

Those are just some of the tests. I basically tested all the combinations. Non-symmetric quantization was even worse. I know that I'm probably doing something very wrong, but I have no idea what.
Any ideas would be appreciated! (I'm using GPTQModel 3.1.0.dev0 from GitHub, pulled a few days ago.)
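
As a rough check on whether these gaps are larger than evaluation noise, the reported accuracies and standard errors can be turned into a z-score for the difference. This is only a minimal sketch: it assumes the two runs are independent samples, which is not strictly true because both runs score the same benchmark items, and the helper name `z_score` is just illustrative. The numbers are copied from the tables above.

```python
import math


def z_score(acc_a, se_a, acc_b, se_b):
    """z-score of (acc_a - acc_b), treating the two runs as independent (an assumption)."""
    return (acc_a - acc_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# (BnB value, BnB stderr, GPTQv2 value, GPTQv2 stderr), copied from the results above
comparisons = {
    "anli_r1: BnB 4bit vs GPTQv2 4bit (gs 64, mse true)": (0.5220, 0.0158, 0.5160, 0.0158),
    "anli_r1: BnB 4bit vs GPTQv2 4bit (gs 128, mse false)": (0.5220, 0.0158, 0.4830, 0.0158),
    "hellaswag acc_norm: BnB 4bit vs GPTQv2 8bit (defaults)": (0.8091, 0.0039, 0.7971, 0.0040),
    "hellaswag acc_norm: BnB 4bit vs GPTQv2 4bit (gs 128)": (0.8091, 0.0039, 0.7872, 0.0041),
}

for name, values in comparisons.items():
    print(f"{name}: z = {z_score(*values):.2f}")
```

A |z| above roughly 2 is the usual threshold for a difference that is unlikely to be noise. Under this approximation, the anli_r1 gap to the gs: 64, mse: true run is well under one standard error of the difference, while the hellaswag acc_norm gaps are above two.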

Replies: 0 comments