Alternatives to BitsAndBytes for HF models #1337

Unanswered
Timelessprod asked this question in Q&A

Hello,

I'm loading and fine-tuning a model from HF to use with Ollama afterwards, and so far I've relied on BitsAndBytes for quantization (resource limitations). However, it turns out that even with the following config, the safetensors were ultimately exported as uint8 (U8) instead of float16 (F16)*, making them impossible to use with Ollama, which only supports F16, BF16 and F32:

from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                  # quantize weights to 4-bit on load
    bnb_4bit_quant_type="nf4",          # NormalFloat4 quantization
    bnb_4bit_compute_dtype="float16",   # dtype used for forward/backward computation
    bnb_4bit_use_double_quant=True,     # also quantize the quantization constants
    bnb_4bit_quant_storage="float16",   # dtype used to store the packed weights
)

*My understanding is that bnb_4bit_quant_storage is the dtype in which the weights are stored when saving the model; correct me if I'm wrong.
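
For anyone reproducing this, here is a minimal sketch of how the exported dtypes can be checked with the safetensors library (the file name is just a placeholder):

from safetensors import safe_open

# Print the dtype of every tensor in the exported checkpoint.
with safe_open("model.safetensors", framework="pt", device="cpu") as f:
    for name in f.keys():
        print(name, f.get_tensor(name).dtype)  # torch.uint8 here is what breaks Ollama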

Thus, do you know of any other library/framework to quantize a model while (or after) loading it from HF that works similarly to BitsAndBytes but exports tensors in one of the valid dtypes above?

I looked around on the web but couldn't find anything fitting my needs.
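
For context, one workaround I'm considering (untested; it assumes a QLoRA-style setup with a PEFT adapter, and the model IDs/paths below are placeholders) is to reload the base model in float16 just for the export step, merge the adapter into it, and save:

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Reload the base model without quantization, in float16, for export only.
base = AutoModelForCausalLM.from_pretrained(
    "base-model-id",  # placeholder
    torch_dtype=torch.float16,
)

# Merge the fine-tuned LoRA adapter into the fp16 weights, then save.
model = PeftModel.from_pretrained(base, "path/to/adapter")  # placeholder
model = model.merge_and_unload()
model.save_pretrained("exported-fp16", safe_serialization=True)

This trades extra memory at export time for a checkpoint stored as F16 rather than packed uint8, which is the format Ollama can ingest.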

Thank you very much.


Replies: 1 comment


I'd like to ask if you found what you were looking for, as I am in a similar situation.
