diff --git a/CHANGELOG.md b/CHANGELOG.md
index 0950944..acbe477 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,13 +9,16 @@ All notable changes to this project are documented in this file.
 - **Breaking (install):** `gram-newton-schulz` and `quack-kernels` are no longer
 base dependencies. They moved to an optional `dion[gram-newton-schulz]` extra
 (alias `dion[gns]`), and are also excluded from the `dev` and `train` extras.
- This keeps the default install free of the transitive `nvidia-cutlass-dsl==4.4.2`
- pin, which conflicts with Flash-Attention-4 / Blackwell stacks built on cutlass
- `4.5.2`.
+ This keeps the default install free of the heavy Gram Newton-Schulz GPU stack
+ (and its transitive `nvidia-cutlass-dsl` pin).
 
 **Action required:** if you run with `use_gram_newton_schulz=True`, install the
 extra (`pip install "dion[gns] @ git+https://github.com/microsoft/dion.git"`, or
 `pip install -e ".[gns]"` from a clone). Without it, optimizer construction now
 raises a clear `ImportError` at runtime instead of the kernels being silently
- present. Opting in re-introduces the cutlass `4.4.2` pin, so use a separate
- environment from FA4/Blackwell.
+ present.
+
+- Bumped the optional `dion[gns]` extra to `gram-newton-schulz==0.1.5`
+ (`quack-kernels==0.5.0`). This moves its transitive `nvidia-cutlass-dsl` pin from
+ `4.4.2` to `4.5.2`, matching current Flash-Attention-4 / Blackwell stacks, so the
+ extra no longer conflicts with them.
diff --git a/README.md b/README.md
index 3f93415..723500f 100644
--- a/README.md
+++ b/README.md
@@ -50,7 +50,7 @@ Our implementations are available as a `pip` package! Install to use in your pro
 pip install git+https://github.com/microsoft/dion.git
 ```
 
-> The optional Gram Newton-Schulz orthogonalization kernels (enabled with `use_gram_newton_schulz=True`) are not pulled in by the base install. Add them with `pip install "dion[gram-newton-schulz] @ git+https://github.com/microsoft/dion.git"`, or `pip install -e ".[gram-newton-schulz]"` from a clone. Note: this extra pins `nvidia-cutlass-dsl==4.4.2`, which conflicts with Flash-Attention-4 / Blackwell stacks built on cutlass `4.5.2`, so install it in a separate environment if you need both.
+> The optional Gram Newton-Schulz orthogonalization kernels (enabled with `use_gram_newton_schulz=True`) are not pulled in by the base install. Add them with `pip install "dion[gram-newton-schulz] @ git+https://github.com/microsoft/dion.git"`, or `pip install -e ".[gram-newton-schulz]"` from a clone. Note: this extra pins `nvidia-cutlass-dsl==4.5.2`, matching the cutlass version used by current Flash-Attention-4 / Blackwell stacks (the earlier `4.4.2` conflict no longer applies).
 
 Then in your code, you can use:
 
@@ -68,7 +68,7 @@ git clone https://github.com/microsoft/dion.git
 cd dion
 pip install -e .[train]
 ```
-> `train` stays free of the Gram Newton-Schulz kernels (and their `nvidia-cutlass-dsl==4.4.2` pin) so the default training install works on Flash-Attention-4 / Blackwell stacks. To train with `--use_gram_newton_schulz`, use `pip install -e ".[train,gns]"` in a separate environment. Likewise, to develop or test the Gram Newton-Schulz path, install `pip install -e ".[dev,gns]"` — a plain `[dev]` install skips the GNS-specific test cases.
+> `train` stays free of the Gram Newton-Schulz kernels, which remain an opt-in extra. To train with `--use_gram_newton_schulz`, use `pip install -e ".[train,gns]"`; the extra's `nvidia-cutlass-dsl==4.5.2` pin now matches Flash-Attention-4 / Blackwell stacks, so the two no longer conflict. Likewise, to develop or test the Gram Newton-Schulz path, install `pip install -e ".[dev,gns]"` — a plain `[dev]` install skips the GNS-specific test cases.
 
 Download pretokenized FineWeb dataset:
 ```bash
diff --git a/requirements_gns.txt b/requirements_gns.txt
index de046e2..02bf1ec 100644
--- a/requirements_gns.txt
+++ b/requirements_gns.txt
@@ -1,6 +1,6 @@
 # Optional Gram Newton-Schulz orthogonalization kernels (use_gram_newton_schulz=True).
 # quack-kernels is pulled in transitively by gram-newton-schulz; the explicit pin here
 # is for reproducibility and must track whatever quack version the pinned gram-newton-schulz
-# requires (gram-newton-schulz==0.1.4 also transitively pins nvidia-cutlass-dsl==4.4.2).
-gram-newton-schulz==0.1.4
-quack-kernels==0.4.1
+# requires (gram-newton-schulz==0.1.5 also transitively pins nvidia-cutlass-dsl==4.5.2).
+gram-newton-schulz==0.1.5
+quack-kernels==0.5.0