I am comparing GPTQv2 against BnB using vLLM. For BnB I just use on-the-fly quantization. I tried many combinations of options for GPTQv2, and its accuracy always comes out worse. Even 8-bit GPTQv2 is worse than 4-bit BnB.
I tested with the "anli" and "hellaswag" tasks on the "Nitral-AI/Violet_Magcap-12B" model.
Here is part of the results:
ANLI:
GPTQv2 4bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: 128, damp_percent: 0.1, mse: false
9600/9600 [07:25<00:00, 21.55it/s]
2025-05-09:04:47:54
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=7200), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|-------|------:|------|-----:|------|---|-----:|---|-----:|
|anli_r1| 1|none | 0|acc |↑ |0.4830|± |0.0158|
|anli_r2| 1|none | 0|acc |↑ |0.4710|± |0.0158|
|anli_r3| 1|none | 0|acc |↑ |0.4583|± |0.0144|
GPTQv2 4bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: 64, damp_percent: 0.1, mse: true
9600/9600 [08:39<00:00, 18.47it/s]
2025-05-09:05:10:15
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=7200), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|-------|------:|------|-----:|------|---|-----:|---|-----:|
|anli_r1| 1|none | 0|acc |↑ |0.5160|± |0.0158|
|anli_r2| 1|none | 0|acc |↑ |0.4800|± |0.0158|
|anli_r3| 1|none | 0|acc |↑ |0.4858|± |0.0144|
BitsAndBytes 4bit
9600/9600 [07:27<00:00, 21.45it/s]
2025-05-09:04:58:01
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=7200), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot|Metric| |Value | |Stderr|
|-------|------:|------|-----:|------|---|-----:|---|-----:|
|anli_r1| 1|none | 0|acc |↑ |0.5220|± |0.0158|
|anli_r2| 1|none | 0|acc |↑ |0.5050|± |0.0158|
|anli_r3| 1|none | 0|acc |↑ |0.5017|± |0.0144|
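To line the runs up side by side, the result rows can be parsed into records. This is only a minimal sketch, assuming the pipe-delimited row layout used in the result tables in this post; `parse_row` is a hypothetical helper and not part of lm-eval.

```python
def parse_row(line, last_task=""):
    """Parse one pipe-delimited result row into a dict (hypothetical helper)."""
    # Drop the outer pipes, then split into the nine columns:
    # Tasks, Version, Filter, n-shot, Metric, direction, Value, "±", Stderr
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    # Continuation rows leave the task cell blank, so reuse the previous task.
    task = cells[0] or last_task
    return {
        "task": task,
        "metric": cells[4],
        "value": float(cells[6]),
        "stderr": float(cells[8]),
    }


# Rows copied from the BitsAndBytes 4bit results in this post.
rows = [
    "|anli_r1| 1|none | 0|acc |↑ |0.5220|± |0.0158|",
    "|hellaswag| 1|none | 0|acc |↑ |0.6262|± |0.0048|",
    "| | |none | 0|acc_norm|↑ |0.8091|± |0.0039|",
]

records = []
last = ""
for row in rows:
    rec = parse_row(row, last)
    last = rec["task"]
    records.append(rec)
```

Each record then carries the task, metric, value, and standard error, which makes it easier to tabulate one quantization setup against another.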
HellaSwag:
BitsAndBytes 4bit
40168/40168 [31:39<00:00, 21.15it/s]
2025-05-02:05:24:22
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=20,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag| 1|none | 0|acc |↑ |0.6262|± |0.0048|
| | |none | 0|acc_norm|↑ |0.8091|± |0.0039|
GPTQv2 8bit from gptq-quant-ChatGPT-4.1.py WikiText
40168/40168 [27:57<00:00, 23.94it/s]
2025-05-08:14:41:55
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag| 1|none | 0|acc |↑ |0.6061|± |0.0049|
| | |none | 0|acc_norm|↑ |0.7971|± |0.0040|
GPTQv2 8bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: 64
40168/40168 [28:17<00:00, 23.66it/s]
2025-05-08:19:10:18
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag| 1|none | 0|acc |↑ |0.6067|± |0.0049|
| | |none | 0|acc_norm|↑ |0.7977|± |0.0040|
GPTQv2 8bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: -1
40168/40168 [21:51<00:00, 30.64it/s]
2025-05-08:20:30:24
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag| 1|none | 0|acc |↑ |0.6072|± |0.0049|
| | |none | 0|acc_norm|↑ |0.7973|± |0.0040|
GPTQv2 8bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: 64, damp_percent: 0.1
40168/40168 [28:09<00:00, 23.78it/s]
2025-05-08:22:01:39
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag| 1|none | 0|acc |↑ |0.6061|± |0.0049|
| | |none | 0|acc_norm|↑ |0.7976|± |0.0040|
GPTQv2 8bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: 64, damp_percent: 0.1, mse: true
40168/40168 [28:15<00:00, 23.69it/s]
2025-05-09:02:34:33
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag| 1|none | 0|acc |↑ |0.6062|± |0.0049|
| | |none | 0|acc_norm|↑ |0.7976|± |0.0040|
GPTQv2 4bit from gptq-quant-ChatGPT-4.1.py WikiText, desc_act: true, gs: 128, damp_percent: 0.1, mse: false
40168/40168 [24:10<00:00, 27.69it/s]
2025-05-09:04:27:41
local-completions (base_url=http://10.0.0.6:8000/v1/completions,tokenizer=Nitral-AI/Violet_Magcap-12B,num_concurrent=40,timeout=3600), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 1
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag| 1|none | 0|acc |↑ |0.5946|± |0.0049|
| | |none | 0|acc_norm|↑ |0.7872|± |0.0041|
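As a rough sanity check on whether these gaps exceed the reported noise, the difference between two accuracies can be divided by the combined standard error. This is a sketch under a stated assumption: it treats the two runs as independent samples, which is only an approximation because both runs score the same test items. `z_score` is a hypothetical helper; the numbers are the BitsAndBytes 4bit and GPTQv2 4bit (gs: 128) values from the tables above.

```python
import math


def z_score(acc_a, se_a, acc_b, se_b):
    """Difference of two accuracies in units of their combined standard error.

    Assumes the two estimates are independent (an approximation here).
    """
    return (acc_a - acc_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# (BitsAndBytes 4bit, GPTQv2 4bit gs: 128) as (value, stderr) pairs.
pairs = {
    "anli_r1 acc": ((0.5220, 0.0158), (0.4830, 0.0158)),
    "hellaswag acc_norm": ((0.8091, 0.0039), (0.7872, 0.0041)),
}

z_values = {
    name: z_score(bnb[0], bnb[1], gptq[0], gptq[1])
    for name, (bnb, gptq) in pairs.items()
}

for name, z in z_values.items():
    # |z| above about 1.96 corresponds to the usual 5% two-sided threshold.
    print(f"{name}: z = {z:.2f}")
```

Under this approximation the ANLI r1 gap stays below the usual 1.96 threshold, while the HellaSwag acc_norm gap is clearly above it, so the HellaSwag difference is the harder one to attribute to noise.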
Those are just some of the tests; I tested essentially every combination. Non-symmetric quantization was even worse. I am probably doing something very wrong, but I have no idea what.
Any ideas would be appreciated! (I'm using GPTQModel 3.1.0.dev0, installed from GitHub a few days ago.)