Alternatives to BitsAndBytes for HF models #1337
Hello,
I'm loading and fine-tuning a model from HF to use it with Ollama afterwards, and so far I have relied on BitsAndBytes for quantization (resource limitations). However, it turns out that even with the following config, the safetensors end up exported as uint8 (U8) instead of float16 (F16),* which makes them impossible to use with Ollama, which only supports F16, BF16, and F32:
```python
bnb_config: BitsAndBytesConfig = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_storage="float16",
)
```
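For reference, here is a minimal sketch of how the exported dtypes can be inspected, assuming a single `model.safetensors` shard (adjust the filename for sharded checkpoints):

```python
from safetensors import safe_open

# Open the exported checkpoint lazily, without loading every tensor into memory.
with safe_open("model.safetensors", framework="pt") as f:
    for name in f.keys():
        tensor = f.get_tensor(name)
        # With the 4-bit config above, the weights show up as torch.uint8,
        # since the packed 4-bit values live in uint8 storage containers.
        print(name, tensor.dtype, tuple(tensor.shape))
```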
*My understanding is that bnb_4bit_quant_storage is supposed to be the dtype in which the weights are stored when saving the model; correct me if I'm wrong.
So, do you know of any other library/framework that can quantize a model while (or after) loading it from HF, works similarly to BitsAndBytes, but exports tensors in one of the valid dtypes above? I looked around on the web but couldn't find anything fitting my needs.
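The closest workaround I have found so far is to dequantize back to float16 before saving, so that the memory savings only apply during fine-tuning and the export itself is F16. A minimal sketch, assuming a recent transformers release where `PreTrainedModel.dequantize()` is available for bitsandbytes-quantized models (the model id and output directory below are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load in 4-bit to fit the model in limited memory during fine-tuning.
model = AutoModelForCausalLM.from_pretrained(
    "my-org/my-model",  # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)

# ... fine-tune here (e.g. QLoRA adapters, merged back before export) ...

# Dequantize back to full-precision weights, then cast to float16 so the
# exported safetensors are F16 (accepted by Ollama) instead of packed U8.
model = model.dequantize()
model = model.to(torch.float16)
model.save_pretrained("exported-f16")  # placeholder output directory
```

The resulting F16 export could then also be converted to GGUF with llama.cpp's convert_hf_to_gguf.py and re-quantized there (e.g. to Q4_K_M), if a smaller file is needed on the Ollama side.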
Thank you very much.
Replies: 1 comment
I'd like to ask if you found what you were looking for, as I am in a similar situation.