This repository was archived by the owner on Sep 10, 2025. It is now read-only.

Enable torchao.experimental EmbeddingQuantization #1520

Status: Open
Assignee: @Jack-Khuu
Labels: Quantization (issues related to Quantization or torchao); triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Description

🚀 The feature, motivation and pitch

Quantization is a technique used to reduce the size, memory footprint, and latency of a model, and torchao is PyTorch's native quantization library for inference and training.
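To make the size/accuracy trade-off concrete, here is a minimal, self-contained sketch of affine int8 quantization: each fp32 weight (4 bytes) is stored as one byte plus a shared scale and zero point. This is a pure-Python illustration of the arithmetic, not torchao's API.

```python
# Affine int8 quantization: q = round(w / scale) + zero_point, clamped to [0, 255].
# Dequantization recovers w_hat = (q - zero_point) * scale, with per-element
# error bounded by scale / 2 (when no clipping occurs).

def quantize_int8(weights):
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # avoid divide-by-zero for constant tensors
    zero_point = round(-lo / scale)
    q = [max(0, min(255, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]

w = [0.1, -0.25, 0.3, 0.0]
q, s, z = quantize_int8(w)
w_hat = dequantize(q, s, z)
# Each reconstructed weight is within scale/2 of the original.
assert all(abs(a - b) <= s / 2 + 1e-9 for a, b in zip(w, w_hat))
```

Embedding tables are a natural target for this because they are large, memory-bound lookups; the quantizers below apply low-bit variants of this idea to embedding weights.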

There are new experimental quantizations in torchao that we would like to enable in torchchat. Specifically this task is for enabling EmbeddingQuantizer and SharedEmbeddingQuantizer.

Entrypoint: the `quantize_model()` function in torchchat.

Task: Using ExecuTorch as a reference (pytorch/executorch#9548) add support for EmbeddingQuantizer and SharedEmbeddingQuantizer.
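As a rough sketch of what the task could look like, the snippet below mimics a registry-based dispatch inside `quantize_model()` that routes new config keys to the two quantizers. The class names `EmbeddingQuantizer` and `SharedEmbeddingQuantizer` come from this issue, but the config keys, constructor parameters, and registration mechanism shown here are assumptions for illustration; the actual torchao.experimental API and torchchat wiring should be taken from the ExecuTorch reference PR.

```python
# Hypothetical dispatch sketch: stand-in classes, not torchao's real API.

class EmbeddingQuantizer:
    """Stand-in for torchao.experimental's EmbeddingQuantizer."""
    def __init__(self, bitwidth=4, group_size=32):
        self.bitwidth = bitwidth
        self.group_size = group_size

    def quantize(self, model):
        # The real quantizer would replace nn.Embedding weights with
        # packed low-bit tensors; here we just record the transformation.
        model.setdefault("applied", []).append(
            f"embedding:int{self.bitwidth}:g{self.group_size}"
        )
        return model

class SharedEmbeddingQuantizer(EmbeddingQuantizer):
    """Stand-in for the variant that shares quantized weights between
    the embedding table and the unembedding projection."""
    def quantize(self, model):
        model.setdefault("applied", []).append(
            f"shared_embedding:int{self.bitwidth}:g{self.group_size}"
        )
        return model

# Registry mapping (assumed) config keys to quantizer classes, mirroring
# how quantize_model() could dispatch on its quantization options.
QUANT_HANDLERS = {
    "experimental:embedding": EmbeddingQuantizer,
    "experimental:shared_embedding": SharedEmbeddingQuantizer,
}

def quantize_model(model, quantize_options):
    for key, kwargs in quantize_options.items():
        handler = QUANT_HANDLERS.get(key)
        if handler is None:
            raise ValueError(f"unknown quantization scheme: {key}")
        model = handler(**kwargs).quantize(model)
    return model

model = quantize_model({}, {"experimental:shared_embedding": {"bitwidth": 4}})
print(model["applied"])  # ['shared_embedding:int4:g32']
```

The registry pattern keeps the entrypoint unchanged when new experimental quantizers are added: enabling a scheme is one new entry plus its handler.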

cc: @metascroy, @manuelcandales

Alternatives

No response

Additional context

No response

RFC (Optional)

No response
