Alternatives to BitsAndBytes for HF models #1337
Hello,
I'm loading and fine-tuning a model from HF to use it with Ollama afterwards, and so far I have relied on BitsAndBytes for quantization (resource limitations). However, it turns out that even with the following config, the safetensors end up exported as uint8 (U8) instead of float16 (F16),* which makes them impossible to use with Ollama, which only supports F16, BF16, and F32:
```python
bnb_config: BitsAndBytesConfig = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_storage="float16",
)
```
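For reference, here is a minimal sketch of how the exported dtypes can be inspected, assuming a single `model.safetensors` shard (adjust the filename for sharded checkpoints):

```python
from safetensors import safe_open

# Open the exported checkpoint lazily, without loading every tensor into memory.
with safe_open("model.safetensors", framework="pt") as f:
    for name in f.keys():
        tensor = f.get_tensor(name)
        # With the 4-bit config above, the weights show up as torch.uint8,
        # since the packed 4-bit values live in uint8 storage containers.
        print(name, tensor.dtype, tuple(tensor.shape))
```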
*My understanding is that bnb_4bit_quant_storage is supposed to be the dtype in which the weights are stored when saving the model; correct me if I'm wrong.
So, do you know of any other library/framework that can quantize a model while (or after) loading it from HF, works similarly to BitsAndBytes, but exports tensors in one of the valid dtypes above? I looked around on the web but couldn't find anything fitting my needs.
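The closest workaround I have found so far is to dequantize back to float16 before saving, so that the memory savings only apply during fine-tuning and the export itself is F16. A minimal sketch, assuming a recent transformers release where `PreTrainedModel.dequantize()` is available for bitsandbytes-quantized models (the model id and output directory below are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load in 4-bit to fit the model in limited memory during fine-tuning.
model = AutoModelForCausalLM.from_pretrained(
    "my-org/my-model",  # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)

# ... fine-tune here (e.g. QLoRA adapters, merged back before export) ...

# Dequantize back to full-precision weights, then cast to float16 so the
# exported safetensors are F16 (accepted by Ollama) instead of packed U8.
model = model.dequantize()
model = model.to(torch.float16)
model.save_pretrained("exported-f16")  # placeholder output directory
```

The resulting F16 export could then also be converted to GGUF with llama.cpp's convert_hf_to_gguf.py and re-quantized there (e.g. to Q4_K_M), if a smaller file is needed on the Ollama side.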
Thank you very much.
Replies: 1 comment
I'd like to ask if you found what you were looking for, as I am in a similar situation.