Improve pos embed for Flux.1 inference on Ascend NPU #12534


Merged
yiyixuxu merged 1 commit into huggingface:main from gameofdimension:npu-pos-embed
Oct 28, 2025

gameofdimension (Contributor) commented Oct 23, 2025

What does this PR do?

Moving the pos_embed computation from the NPU to the CPU gives roughly a 1.4x speedup in Flux.1's end-to-end latency (31s to 21s for FLUX.1-dev, 64s to 46s for FLUX.1-Fill-dev).

| model | pos_embed device | e2e latency |
| --- | --- | --- |
| FLUX.1-dev | npu | 31s |
| FLUX.1-dev | cpu | 21s |
| FLUX.1-Fill-dev | npu | 64s |
| FLUX.1-Fill-dev | cpu | 46s |
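The per-model speedups behind the "1.4x" figure can be checked with a few lines of arithmetic. This is only a worked calculation on the latencies reported in the table above; the dictionary layout and the `speedup` helper are illustrative names, not part of the PR.

```python
# End-to-end latencies in seconds, copied from the table above.
latency_s = {
    "FLUX.1-dev": {"npu": 31, "cpu": 21},
    "FLUX.1-Fill-dev": {"npu": 64, "cpu": 46},
}


def speedup(before_s, after_s):
    """Ratio of the old latency to the new one (>1 means faster)."""
    return before_s / after_s


# "npu" is the baseline (pos_embed on NPU), "cpu" is the changed path.
speedups = {
    model: speedup(t["npu"], t["cpu"]) for model, t in latency_s.items()
}

for model, ratio in speedups.items():
    print(f"{model}: {ratio:.2f}x")
# FLUX.1-dev: 1.48x
# FLUX.1-Fill-dev: 1.39x
```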

Tested hardware:

Ascend 910B2C
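For readers who want to see the general pattern, here is a minimal sketch of "compute the positional embedding on CPU, then copy the result to the accelerator". It is not the diff from this PR and not diffusers' `FluxPosEmbed`: the functions `rope_cos_sin` and `pos_embed_via_cpu`, the simplified rotary formula, and the `axes_dim=(16, 56, 56)` split are assumptions chosen for illustration. The target device is `"cpu"` so the sketch runs without an NPU; on Ascend hardware it would presumably be `"npu"`.

```python
import torch


def rope_cos_sin(pos, dim, theta=10000.0):
    """Cos/sin rotary tables for 1-D positions `pos` and an even `dim`."""
    # Inverse frequencies theta^(-2i/dim) for i in [0, dim/2).
    exponents = torch.arange(0, dim, 2, dtype=torch.float32, device=pos.device) / dim
    inv_freq = 1.0 / (theta ** exponents)
    angles = torch.outer(pos.to(torch.float32), inv_freq)  # [seq, dim/2]
    return angles.cos(), angles.sin()


def pos_embed_via_cpu(ids, axes_dim, target_device):
    """Build the tables on CPU, then copy them to `target_device`."""
    ids_cpu = ids.cpu()  # the position ids are small, so this copy is cheap
    cos_parts, sin_parts = [], []
    for axis, dim in enumerate(axes_dim):
        cos, sin = rope_cos_sin(ids_cpu[:, axis], dim)
        cos_parts.append(cos)
        sin_parts.append(sin)
    # Only the finished tables are transferred to the accelerator.
    cos = torch.cat(cos_parts, dim=-1).to(target_device)
    sin = torch.cat(sin_parts, dim=-1).to(target_device)
    return cos, sin


# Usage: "cpu" stands in for "npu" so the sketch runs anywhere.
ids = torch.zeros(8, 3)
ids[:, 1] = torch.arange(8)
cos, sin = pos_embed_via_cpu(ids, axes_dim=(16, 56, 56), target_device="cpu")
print(cos.shape, sin.shape)
```

The design point is that the many small trigonometric and indexing operations stay on the host, and the device only receives two finished tensors.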

Repro Code

FLUX.1-dev

import torch
from diffusers import FluxPipeline

# Load the pipeline in bfloat16 and move it to the Ascend NPU.
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16).to("npu")
prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
    max_sequence_length=512,
    # Fixed CPU seed so runs are comparable.
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux-dev.png")

FLUX.1-Fill-dev

import torch
from diffusers import FluxFillPipeline
from diffusers.utils import load_image

# Input image and the mask marking the region to fill.
image = load_image("https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/cup.png")
mask = load_image("https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/cup_mask.png")

# Load the pipeline in bfloat16 and move it to the Ascend NPU.
pipe = FluxFillPipeline.from_pretrained("black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16).to("npu")
image = pipe(
    prompt="a white paper cup",
    image=image,
    mask_image=mask,
    height=1632,
    width=1232,
    guidance_scale=30,
    num_inference_steps=50,
    max_sequence_length=512,
    # Fixed CPU seed so runs are comparable.
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux-fill-dev.png")
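The PR does not say how the end-to-end latencies were measured. One plausible way to time the repro scripts is a small wall-clock helper like the sketch below. The helper name `time_call`, its parameters, and the stand-in workload are hypothetical; the workload replaces the pipeline call so the example runs without a model or an NPU.

```python
import time


def time_call(fn, warmup=1, repeats=3, sync=None):
    """Return the mean wall-clock seconds of `fn()` over `repeats` runs."""
    # Warm-up runs absorb one-time costs such as compilation and caching.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    elapsed = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for queued device work before stopping the clock
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)


# Stand-in workload so the sketch runs without a model or an NPU.
mean_s = time_call(lambda: sum(range(100_000)), warmup=1, repeats=3)
print(f"mean latency: {mean_s:.4f}s")
```

With the scripts above, `fn` would wrap the `pipe(...)` call. On an accelerator, a synchronization function would be passed as `sync`; on Ascend this is assumed to be `torch.npu.synchronize` from torch_npu, which should be checked against the installed version.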

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

yiyixuxu (Collaborator) left a comment

thanks!

yiyixuxu merged commit 303efd2 into huggingface:main on Oct 28, 2025
10 of 11 checks passed

Reviewers

yiyixuxu approved these changes
