[core] support flash attention through `kernels` #12387


Open

sayakpaul wants to merge 4 commits into main from fa-hub


@sayakpaul (Member) commented Sep 25, 2025

What does this PR do?

Follow-up of #12236.

Testing code:

import torch
from diffusers import FluxPipeline

model_id = "black-forest-labs/FLUX.1-dev"
pipe = FluxPipeline.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to("cuda")

# Route the transformer's attention through the Hub-hosted flash-attn kernel.
pipe.transformer.set_attention_backend("flash_hub")
# Compile the transformer as a single graph; any graph break raises an error.
pipe.transformer.compile(fullgraph=True)

prompt = "A cat holding a sign that says 'hello world'"
# Error out instead of silently recompiling, so the compiled graph must be reused.
with torch._dynamo.config.patch(error_on_recompile=True):
    image = pipe(
        prompt, num_inference_steps=28, guidance_scale=4.0, generator=torch.manual_seed(0)
    ).images[0]
image.save("output.png")

Tip: this works with `torch.compile` using `fullgraph=True`.

I have tested the code on H100 and A100, and it works.

sayakpaul added 2 commits (September 25, 2025 13:05)

c386f22 up
d252c02 support fa (2) through kernels.

sayakpaul requested a review from DN6 (September 25, 2025 08:01)

sayakpaul added the performance label (Sep 25, 2025)

sayakpaul mentioned this pull request (Sep 25, 2025): use kernels to support _flash_hub attention backend #12318 (Closed, 6 tasks)

sayakpaul commented Sep 25, 2025

src/diffusers/models/attention_dispatch.py

# `flash-attn`
FLASH = "flash"
FLASH_VARLEN = "flash_varlen"
FLASH_HUB = "flash_hub"

sayakpaul (Member Author) Sep 25, 2025:

Flash Attention is stable, so, unlike FA3, we don't have to mark this backend as private.
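As context for the reply above: the `_flash_hub` name in #12318 suggests that a leading underscore on a backend value is how experimental backends are marked private. The toy enum below illustrates that convention; it is not the actual diffusers enum, and the class name `BackendName`, the `_flash_3_hub` member, and both helper functions are assumptions made for illustration.

```python
from enum import Enum
from typing import List


class BackendName(str, Enum):
    # Toy subset of backend names; the real enum in attention_dispatch.py has more.
    FLASH = "flash"
    FLASH_VARLEN = "flash_varlen"
    FLASH_HUB = "flash_hub"  # stable, so it gets a public name
    _FLASH_3_HUB = "_flash_3_hub"  # assumed private/experimental marker: leading "_"


def is_private(name: str) -> bool:
    # Look the backend up by value (raises ValueError for unknown names).
    return BackendName(name).value.startswith("_")


def public_backends() -> List[str]:
    # Iteration follows definition order; private values are filtered out.
    return [b.value for b in BackendName if not b.value.startswith("_")]
```

Under this convention, `flash_hub` is listed alongside the other public backends, while the FA3 Hub backend stays reachable by its underscore-prefixed name without being advertised.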

HuggingFaceDocBuilderDev commented Sep 25, 2025

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

MekkCyber reviewed Sep 25, 2025 and left a comment:

Very cool integration 🔥! I just left some nits.

src/diffusers/models/attention_dispatch.py

Comment on lines +85 to +88

fa3_interface_hub = _get_fa3_from_hub()
flash_attn_3_func_hub = fa3_interface_hub.flash_attn_func
fa_interface_hub = _get_fa_from_hub()
flash_attn_func_hub = fa_interface_hub.flash_attn_func

MekkCyber Sep 25, 2025:

Why are we fetching both kernels here?

sayakpaul (Member Author) Sep 25, 2025:

Because of how the APIs for attention backends are designed, and also to support `torch.compile` with fullgraph traceability (when possible).

We will let it grow a bit and, based on feedback, revisit how best to deal with this.

src/diffusers/models/attention_dispatch.py

FLASH = "flash"
FLASH_VARLEN = "flash_varlen"
FLASH_HUB = "flash_hub"
# FLASH_VARLEN_HUB = "flash_varlen_hub" # not supported yet.

MekkCyber Sep 25, 2025:

Is this related to the kernel, or does it just need more time to be integrated?

sayakpaul (Member Author) Sep 25, 2025:

We don't have models that use varlen.

bghira (Contributor) Sep 29, 2025:

@sayakpaul Qwen-Image uses varlen. Also, native fused QKV+MLP attention requires the varlen function.
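For readers unfamiliar with varlen: flash-attn's `flash_attn_varlen_func` takes sequences packed back to back along a single token axis, described by cumulative offsets (`cu_seqlens_q` / `cu_seqlens_k`) and maximum lengths (`max_seqlen_q` / `max_seqlen_k`). A batch of padded, variable-length sequences can be converted into that form from its attention mask. The sketch below does this with plain Python lists as an illustration only; the helper names are hypothetical, it assumes a non-empty batch, and the real kernel expects `int32` CUDA tensors rather than lists.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def mask_to_cu_seqlens(mask: Sequence[Sequence[int]]) -> Tuple[List[int], int]:
    """Convert a 0/1 padding mask (batch x max_len) into (cu_seqlens, max_seqlen)."""
    seqlens = [sum(row) for row in mask]  # number of real tokens per sequence
    cu_seqlens = [0] + list(accumulate(seqlens))  # start offset of each sequence, plus the total
    return cu_seqlens, max(seqlens)


def pack_tokens(tokens: Sequence[Sequence[str]], mask: Sequence[Sequence[int]]) -> List[str]:
    """Drop padding and concatenate all sequences along a single token axis."""
    return [tok for row_t, row_m in zip(tokens, mask) for tok, m in zip(row_t, row_m) if m]


mask = [[1, 1, 1, 0], [1, 1, 0, 0]]
tokens = [["a", "b", "c", "<pad>"], ["d", "e", "<pad>", "<pad>"]]
cu_seqlens, max_seqlen = mask_to_cu_seqlens(mask)
packed = pack_tokens(tokens, mask)
# Sequence i occupies packed[cu_seqlens[i]:cu_seqlens[i + 1]].
```

Packing this way means no attention is computed over padding tokens, which is the usual reason to reach for the varlen entry point when prompt lengths differ within a batch.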

sayakpaul mentioned this pull request (Sep 25, 2025): [tests] Test attention backends #12388 (Open)

sayakpaul added 2 commits (September 26, 2025 11:10)

1b96ed7 up
474b995 Merge branch 'main' into fa-hub

Labels: performance

4 participants: @sayakpaul @HuggingFaceDocBuilderDev @bghira @MekkCyber
