This repository was archived by the owner on Sep 10, 2025. It is now read-only.

Enable torchao.experimental EmbeddingQuantization #1520

Status: Open
Assignee: @Jack-Khuu
Labels: Quantization (issues related to Quantization or torchao); triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Description

🚀 The feature, motivation and pitch

Quantization is a technique used to reduce the size, memory footprint, and latency of a model, and torchao is PyTorch's native quantization library for inference and training.
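To make the size/accuracy trade-off concrete, here is a minimal, self-contained sketch of affine int8 quantization: each fp32 weight (4 bytes) is stored as one byte plus a shared scale and zero point. This is a pure-Python illustration of the arithmetic, not torchao's API.

```python
# Affine int8 quantization: q = round(w / scale) + zero_point, clamped to [0, 255].
# Dequantization recovers w_hat = (q - zero_point) * scale, with per-element
# error bounded by scale / 2 (when no clipping occurs).

def quantize_int8(weights):
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # avoid divide-by-zero for constant tensors
    zero_point = round(-lo / scale)
    q = [max(0, min(255, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]

w = [0.1, -0.25, 0.3, 0.0]
q, s, z = quantize_int8(w)
w_hat = dequantize(q, s, z)
# Each reconstructed weight is within scale/2 of the original.
assert all(abs(a - b) <= s / 2 + 1e-9 for a, b in zip(w, w_hat))
```

Embedding tables are a natural target for this because they are large, memory-bound lookups; the quantizers below apply low-bit variants of this idea to embedding weights.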

There are new experimental quantizations in torchao that we would like to enable in torchchat. Specifically this task is for enabling EmbeddingQuantizer and SharedEmbeddingQuantizer.

Entrypoint: the `quantize_model()` function in torchchat.

Task: Using ExecuTorch as a reference (pytorch/executorch#9548) add support for EmbeddingQuantizer and SharedEmbeddingQuantizer.
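As a rough sketch of what the task could look like, the snippet below mimics a registry-based dispatch inside `quantize_model()` that routes new config keys to the two quantizers. The class names `EmbeddingQuantizer` and `SharedEmbeddingQuantizer` come from this issue, but the config keys, constructor parameters, and registration mechanism shown here are assumptions for illustration; the actual torchao.experimental API and torchchat wiring should be taken from the ExecuTorch reference PR.

```python
# Hypothetical dispatch sketch: stand-in classes, not torchao's real API.

class EmbeddingQuantizer:
    """Stand-in for torchao.experimental's EmbeddingQuantizer."""
    def __init__(self, bitwidth=4, group_size=32):
        self.bitwidth = bitwidth
        self.group_size = group_size

    def quantize(self, model):
        # The real quantizer would replace nn.Embedding weights with
        # packed low-bit tensors; here we just record the transformation.
        model.setdefault("applied", []).append(
            f"embedding:int{self.bitwidth}:g{self.group_size}"
        )
        return model

class SharedEmbeddingQuantizer(EmbeddingQuantizer):
    """Stand-in for the variant that shares quantized weights between
    the embedding table and the unembedding projection."""
    def quantize(self, model):
        model.setdefault("applied", []).append(
            f"shared_embedding:int{self.bitwidth}:g{self.group_size}"
        )
        return model

# Registry mapping (assumed) config keys to quantizer classes, mirroring
# how quantize_model() could dispatch on its quantization options.
QUANT_HANDLERS = {
    "experimental:embedding": EmbeddingQuantizer,
    "experimental:shared_embedding": SharedEmbeddingQuantizer,
}

def quantize_model(model, quantize_options):
    for key, kwargs in quantize_options.items():
        handler = QUANT_HANDLERS.get(key)
        if handler is None:
            raise ValueError(f"unknown quantization scheme: {key}")
        model = handler(**kwargs).quantize(model)
    return model

model = quantize_model({}, {"experimental:shared_embedding": {"bitwidth": 4}})
print(model["applied"])  # ['shared_embedding:int4:g32']
```

The registry pattern keeps the entrypoint unchanged when new experimental quantizers are added: enabling a scheme is one new entry plus its handler.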

cc: @metascroy, @manuelcandales

Alternatives

No response

Additional context

No response

RFC (Optional)

No response
