gpt-oss is not working with flash-attention #42736
Open
Initializing a `gpt-oss` model with `attn_implementation="flash_attention_2"` or `"flash_attention_3"` results in silent failures and garbage generation output, as reported in #42533. `gpt-oss` models rely on attention sinks, which are not yet implemented for the flash-attention backends. As suggested, the safest path is to strictly block unsupported attention backends rather than failing silently or assuming a fallback.

@vasqu can you see if this is what you had in mind?
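A minimal sketch of the kind of guard this PR is proposing (the function and constant names below are illustrative, not the actual transformers internals):

```python
# Illustrative sketch only: reject flash-attention backends for gpt-oss
# instead of failing silently and producing garbage output.

UNSUPPORTED_ATTN_IMPLEMENTATIONS = {"flash_attention_2", "flash_attention_3"}


def validate_attn_implementation(attn_implementation: str) -> None:
    """Raise early if the requested backend cannot handle attention sinks."""
    if attn_implementation in UNSUPPORTED_ATTN_IMPLEMENTATIONS:
        raise ValueError(
            f"`{attn_implementation}` is not supported for gpt-oss because it does "
            "not implement attention sinks. Please choose a supported backend such "
            'as `attn_implementation="eager"`.'
        )


# Called during model initialization, before loading weights:
validate_attn_implementation("eager")                # passes
# validate_attn_implementation("flash_attention_2")  # raises ValueError
```

The idea is to raise at initialization time rather than assume a fallback, so users get an explicit error instead of degraded generations.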