RFC: Quantization Evaluation · pytorch/torchchat · Discussion #1490

This repository was archived by the owner on Sep 10, 2025. It is now read-only.

byjlw
Oct 24, 2024

🚀 The feature, motivation and pitch

With a single command, quantize the same model with every available quantization scheme and configuration, then output a table that compares the results. This will let users make better-informed decisions about which quantization scheme to use.

Extend the eval command to cover quantization, so that users can compare the performance and correctness of different quantization schemes.

The command would look something like this:

torchchat.py eval llama3.1 --quantize linear dynamic --device mps --outputFormat table

In this instance, the command would evaluate every available dtype and configuration for each scheme specified, and output a table that compares them all:

Model: Llama 3.1 Instruct 8B
Device: M1 Max 64GB MPS
ExecutionMode: Eager

scheme	weights	activations	weight group size	embeddings	embedding weights	embedding group size	model size	t/s	peak memory	perplexity
none	bf16	-	-	no	-	-	16.2GB	17 t/s	N/A	WikiText2: 7.1
linear	4bit	-	256	yes	4bit	256	4.3GB	32 t/s	N/A	WikiText2: 8.1
linear	8bit	-	256	yes	8bit	256	8.3GB	27 t/s	N/A	WikiText2: 7.3
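
As a rough illustration of the two output formats, the sketch below renders a list of result rows either as an aligned plain-text table or as JSON. The `render` function, the row dictionaries, and the reduced set of columns are assumptions made for this example and are not part of torchchat; the numbers are copied from the sample table above.

```python
import json

# Hypothetical result rows; values are copied from the sample table above,
# with only a subset of its columns to keep the example short.
RESULTS = [
    {"scheme": "none", "weights": "bf16", "group size": "-",
     "model size": "16.2GB", "t/s": 17, "perplexity": 7.1},
    {"scheme": "linear", "weights": "4bit", "group size": "256",
     "model size": "4.3GB", "t/s": 32, "perplexity": 8.1},
    {"scheme": "linear", "weights": "8bit", "group size": "256",
     "model size": "8.3GB", "t/s": 27, "perplexity": 7.3},
]


def render(results, output_format="table"):
    """Render eval results as a JSON string or as an aligned text table."""
    if output_format == "json":
        return json.dumps(results, indent=2)
    if output_format != "table":
        raise ValueError(f"unknown output format: {output_format}")
    if not results:
        return ""
    headers = list(results[0].keys())
    rows = [[str(r.get(h, "-")) for h in headers] for r in results]
    # Each column is as wide as its longest cell (header included).
    widths = [
        max(len(h), *(len(row[i]) for row in rows))
        for i, h in enumerate(headers)
    ]
    lines = ["  ".join(h.ljust(w) for h, w in zip(headers, widths))]
    for row in rows:
        lines.append("  ".join(c.ljust(w) for c, w in zip(row, widths)))
    return "\n".join(lines)


print(render(RESULTS, "table"))
```

Keeping the results as a list of plain dictionaries means the json format needs no extra conversion step, and the table is just a second view of the same data.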

New flags for eval
--quantize (optional, defaults to none)
Description: a set of quantization schemes to use. The command will try to run quantization with every combination of the dtypes available for those schemes. Single-scheme sample: {linear}. Multi-scheme sample: {linear, dynamic, embedding}. If embedding is specified, every quantization configuration is paired with every available embedding configuration (m*n runs).
All available options: linear, dynamic, embedding, embedding:wx

--outputFormat (optional, defaults to table)
Description: the format of the output, either table or json.
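
A minimal sketch of how the two proposed flags might be parsed with `argparse` is shown here. This is a stand-alone parser written for illustration, not torchchat's existing CLI code; the `build_eval_parser` name and the `cpu` default for `--device` are assumptions, while the flag names and allowed values are taken from the descriptions above.

```python
import argparse

# Scheme names as listed in the RFC; "embedding:wx" is kept verbatim.
QUANT_CHOICES = ["linear", "dynamic", "embedding", "embedding:wx"]


def build_eval_parser():
    """Build a parser covering only the model, device and the two new flags."""
    parser = argparse.ArgumentParser(prog="torchchat.py eval")
    parser.add_argument("model")
    parser.add_argument("--device", default="cpu")  # default is an assumption
    # Zero or more scheme names; an empty list means "no quantization sweep".
    parser.add_argument("--quantize", nargs="*", choices=QUANT_CHOICES, default=[])
    parser.add_argument("--outputFormat", choices=["table", "json"], default="table")
    return parser


# Parse the sample command from the pitch above.
args = build_eval_parser().parse_args(
    ["llama3.1", "--quantize", "linear", "dynamic",
     "--device", "mps", "--outputFormat", "table"]
)
print(args.quantize, args.device, args.outputFormat)
```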

Design

All available options for a particular scheme, model type, device and execution mode will be stored in a JSON object.
This will be the source of truth, and the command can iterate through
the dictionary to run eval on every possible permutation.
It will be stored in quant_config/quant.json.
The format of the object should be something like:

{
  "devices": [
    {
      "name": "cpu",
      "model_types": [
        {
          "type": "textOnly",
          "execution_modes": [
            {
              "mode": "eager",
              "quantization_options": {
                "quant_schemes": [
                  {
                    "scheme": "linear",
                    "weight_dtypes": [4, 8]
                  },
                  {
                    "scheme": "dynamic",
                    "weight_dtypes": [4, 8],
                    "activation_dtypes": [4, 8]
                  }
                ],
                "embedding_quant_schemes": [
                  {
                    "scheme": "linear",
                    "weight_dtypes": [4, 8]
                  }
                ],
                "weight_group_sizes": [256],
                "embedding_group_sizes": [256]
              }
            },
            {
              "mode": "compile",
              "quantization_options": {
                "quant_schemes": [
                  {
                    "scheme": "linear",
                    "weight_dtypes": [4, 8]
                  },
                  {
                    "scheme": "dynamic",
                    "weight_dtypes": [4, 8],
                    "activation_dtypes": [4, 8]
                  }
                ],
                "embedding_quant_schemes": [
                  {
                    "scheme": "linear",
                    "weight_dtypes": [4, 8]
                  }
                ],
                "weight_group_sizes": [256],
                "embedding_group_sizes": [256]
              }
            }
          ]
        }
      ]
    }
  ]
}
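
To make the lookup concrete, here is a small sketch that walks a config of the shape above and returns the `quantization_options` block for one device, model type and execution mode. The `find_quant_options` helper is hypothetical, and the embedded config is a trimmed copy of the proposed layout (cpu, textOnly, eager only) so that the example runs on its own.

```python
import json

# Trimmed copy of the proposed quant.json layout, inlined for the example.
CONFIG = json.loads("""
{
  "devices": [
    {
      "name": "cpu",
      "model_types": [
        {
          "type": "textOnly",
          "execution_modes": [
            {
              "mode": "eager",
              "quantization_options": {
                "quant_schemes": [
                  {"scheme": "linear", "weight_dtypes": [4, 8]}
                ],
                "weight_group_sizes": [256],
                "embedding_group_sizes": [256]
              }
            }
          ]
        }
      ]
    }
  ]
}
""")


def find_quant_options(config, device, model_type, mode):
    """Return the quantization_options dict for the given target, or None."""
    for dev in config.get("devices", []):
        if dev["name"] != device:
            continue
        for mt in dev.get("model_types", []):
            if mt["type"] != model_type:
                continue
            for em in mt.get("execution_modes", []):
                if em["mode"] == mode:
                    return em["quantization_options"]
    return None  # no entry for this device / model type / mode


options = find_quant_options(CONFIG, "cpu", "textOnly", "eager")
print(options["quant_schemes"])
```

Returning `None` for an unknown target lets the caller decide whether a missing entry is an error or simply means that nothing is supported there.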

model_type options: textOnly, llamaTextOnly, llamaVision, llava

The model definitions in model_config/models.json will be extended to include a new property, model_type.

eval.py needs to be extended so that it can generate the list of runs to make from the flag values passed in, and then run evaluation once per entry in that list.
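
One possible way to generate that list is sketched below, using `itertools.product` over the dtypes and group sizes of a single `quantization_options` block. The `expand_runs` function and the keys of the run dictionaries are hypothetical, and pairing every weight configuration with every embedding configuration is only one reading of the "m*n" wording in the flag description.

```python
from itertools import product

# One quantization_options block, in the shape proposed for quant.json.
OPTIONS = {
    "quant_schemes": [
        {"scheme": "linear", "weight_dtypes": [4, 8]},
        {"scheme": "dynamic", "weight_dtypes": [4, 8], "activation_dtypes": [4, 8]},
    ],
    "embedding_quant_schemes": [{"scheme": "linear", "weight_dtypes": [4, 8]}],
    "weight_group_sizes": [256],
    "embedding_group_sizes": [256],
}


def expand_runs(options, requested):
    """Return one dict per eval run for the requested scheme names."""
    weight_runs = []
    for entry in options["quant_schemes"]:
        if entry["scheme"] not in requested:
            continue
        # Schemes without activation dtypes get a single placeholder value.
        activations = entry.get("activation_dtypes", [None])
        for w, a, g in product(
            entry["weight_dtypes"], activations, options["weight_group_sizes"]
        ):
            weight_runs.append(
                {"scheme": entry["scheme"], "weight_bits": w,
                 "activation_bits": a, "group_size": g}
            )
    if "embedding" not in requested:
        return weight_runs
    embedding_runs = [
        {"embedding_scheme": e["scheme"], "embedding_bits": w,
         "embedding_group_size": g}
        for e in options["embedding_quant_schemes"]
        for w, g in product(e["weight_dtypes"], options["embedding_group_sizes"])
    ]
    if not weight_runs:
        return embedding_runs  # only embedding quantization was requested
    # Assumed "m*n" behaviour: pair every weight run with every embedding run.
    return [{**wr, **er} for wr, er in product(weight_runs, embedding_runs)]


print(len(expand_runs(OPTIONS, ["linear", "dynamic", "embedding"])))
```

With the sample options, linear contributes 2 runs and dynamic 4, so requesting both yields 6 runs, and adding embedding (2 configurations) yields 12.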

When the model comes in, we can look up its model type and the device, then run every config in the set that matches the schemes given in the --quantize flag.
If --quantize is not present, we do a single run using the specified params.
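
The selection logic described here could look roughly like the following. Everything in the sketch is an assumption made for illustration: `plan_runs`, the `MODELS` mapping (including the guess that llama3.1 maps to `llamaTextOnly`), and `SUPPORTED`, which is a flattened stand-in for the nested quant.json structure.

```python
# Hypothetical models.json entries after adding "model_type"; the value for
# llama3.1 is a guess made for illustration.
MODELS = {"llama3.1": {"model_type": "llamaTextOnly"}}

# Flattened stand-in for quant.json: schemes supported per (device, model_type).
SUPPORTED = {("cpu", "llamaTextOnly"): ["linear", "dynamic"]}


def plan_runs(model, device, requested_schemes, base_params):
    """Return the list of runs eval should perform for this invocation."""
    if not requested_schemes:
        # No --quantize flag: a single run with the params exactly as given.
        return [dict(base_params, model=model, device=device)]
    model_type = MODELS[model]["model_type"]
    supported = SUPPORTED.get((device, model_type), [])
    # Keep only the requested schemes that the config lists for this target.
    return [
        dict(base_params, model=model, device=device, scheme=scheme)
        for scheme in requested_schemes
        if scheme in supported
    ]


print(plan_runs("llama3.1", "cpu", ["linear", "embedding"], {"limit": 5}))
```

In this sketch a requested scheme that the config does not list for the target is silently skipped; a real implementation might prefer to warn or fail instead.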

Replies: 1 comment

byjlw
Oct 24, 2024
Author

Work has begun in this branch:
https://github.com/pytorch/torchchat/tree/quant_eval

0 replies
