Improve pos embed for Flux.1 inference on Ascend NPU #12534


Merged

yiyixuxu merged 1 commit into huggingface:main from gameofdimension:npu-pos-embed

Oct 28, 2025


+6 −2


@gameofdimension gameofdimension commented Oct 23, 2025

What does this PR do?

Moving the pos_embed computation from the NPU to the CPU gives roughly a 1.4x end-to-end speedup for Flux.1 inference (1.48x for FLUX.1-dev and 1.39x for FLUX.1-Fill-dev in the measurements below).

| model | pos_embed device | e2e latency |
|---|---|---|
| FLUX.1-dev | npu | 31s |
| FLUX.1-dev | cpu | 21s |
| FLUX.1-Fill-dev | npu | 64s |
| FLUX.1-Fill-dev | cpu | 46s |
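As a quick check of the headline figure, the per-model speedup can be recomputed from the latencies in the table. This is only arithmetic on the reported numbers; the dictionary layout and variable names are illustrative.

```python
# Reported end-to-end latencies in seconds: (pos_embed on NPU, pos_embed on CPU).
latencies = {
    "FLUX.1-dev": (31.0, 21.0),
    "FLUX.1-Fill-dev": (64.0, 46.0),
}

# Speedup = old latency / new latency.
speedups = {model: npu / cpu for model, (npu, cpu) in latencies.items()}

for model, speedup in speedups.items():
    print(f"{model}: {speedup:.2f}x")
# FLUX.1-dev: 1.48x
# FLUX.1-Fill-dev: 1.39x
```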

Tested hardware:

Ascend 910B2C
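The idea behind the change, computing the rotary position tables on the CPU and then moving the result to the accelerator, can be sketched roughly as follows. This is a simplified illustration, not the PR's diff and not the diffusers implementation: `rope_1d`, `pos_embed_on_cpu`, the `theta` default, and the `axes_dim=(16, 56, 56)` split are assumptions made for the example.

```python
import torch


def rope_1d(pos: torch.Tensor, dim: int, theta: float = 10000.0):
    """Return (cos, sin) rotary tables of shape [len(pos), dim // 2]."""
    # Frequencies are built in float64 on whatever device `pos` lives on.
    exponents = torch.arange(0, dim, 2, dtype=torch.float64, device=pos.device) / dim
    freqs = torch.outer(pos.to(torch.float64), 1.0 / (theta ** exponents))
    return freqs.cos().float(), freqs.sin().float()


def pos_embed_on_cpu(ids: torch.Tensor, axes_dim, target_device):
    """Compute per-axis rotary tables on the CPU, then move them to target_device."""
    ids_cpu = ids.detach().cpu()  # small tensor, so the transfer is cheap
    cos_parts, sin_parts = [], []
    for axis, dim in enumerate(axes_dim):
        cos, sin = rope_1d(ids_cpu[:, axis], dim)
        cos_parts.append(cos)
        sin_parts.append(sin)
    # Only the finished tables are sent to the accelerator.
    cos = torch.cat(cos_parts, dim=-1).to(target_device)
    sin = torch.cat(sin_parts, dim=-1).to(target_device)
    return cos, sin


# Position ids for a 64x64 latent grid: (unused, row, column).
ids = torch.zeros(4096, 3)
ids[:, 1] = torch.arange(4096) // 64
ids[:, 2] = torch.arange(4096) % 64

# "cpu" keeps the sketch runnable anywhere; on Ascend one would pass "npu".
cos, sin = pos_embed_on_cpu(ids, axes_dim=(16, 56, 56), target_device="cpu")
print(cos.shape, sin.shape)
```

The tables depend only on the position ids, so they are cheap to build once on the host; whether this is faster than building them on the device depends on the hardware and software stack, as the measurements above suggest for the Ascend 910B2C.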

Repro Code

FLUX.1-dev

# Requires the Ascend torch_npu plugin so that the "npu" device is available.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16).to("npu")
prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
    max_sequence_length=512,
    # A CPU generator with a fixed seed keeps the output reproducible.
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux-dev.png")

FLUX.1-Fill-dev

# Requires the Ascend torch_npu plugin so that the "npu" device is available.
import torch
from diffusers import FluxFillPipeline
from diffusers.utils import load_image

image = load_image("https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/cup.png")
mask = load_image("https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/cup_mask.png")
pipe = FluxFillPipeline.from_pretrained("black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16).to("npu")
image = pipe(
    prompt="a white paper cup",
    image=image,
    mask_image=mask,
    height=1632,
    width=1232,
    guidance_scale=30,
    num_inference_steps=50,
    max_sequence_length=512,
    # A CPU generator with a fixed seed keeps the output reproducible.
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("flux-fill-dev.png")
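The PR does not say how the end-to-end latency was measured. One plausible way to time either script above is a small wall-clock helper like the one below; `time_call` and its `sync` parameter are hypothetical names for this sketch. On an accelerator, `sync` would be a device synchronization function (for example `torch.npu.synchronize`, assuming torch_npu exposes it in the environment used) so that asynchronous work is included in the measurement.

```python
import time


def time_call(fn, *args, sync=None, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    if sync is not None:
        sync()  # drain queued device work before starting the clock
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    if sync is not None:
        sync()  # wait for asynchronous device work before stopping the clock
    return result, time.perf_counter() - start


# Stand-in workload so the sketch runs anywhere; with the scripts above one
# would pass the pipeline call (and a device sync function) instead.
result, seconds = time_call(sum, range(1_000_000))
print(f"e2e latency: {seconds:.3f}s")
```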

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you read our philosophy doc (important for complex PRs)?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.


 improve pos embed for ascend npu

30c1be9


yiyixuxu approved these changes

Oct 28, 2025


@yiyixuxu yiyixuxu left a comment


thanks!

@yiyixuxu yiyixuxu merged commit 303efd2 into huggingface:main

Oct 28, 2025

10 of 11 checks passed

2 participants

@gameofdimension @yiyixuxu
