Bump gram-newton-schulz to 0.1.5 (cutlass-dsl 4.5.2) #95

Merged
JohnLangford merged 1 commit into main from jcl/dion-gram-ns-0.1.5
Jun 26, 2026
13 changes: 8 additions & 5 deletions CHANGELOG.md
@@ -9,13 +9,16 @@ All notable changes to this project are documented in this file.
- **Breaking (install):** `gram-newton-schulz` and `quack-kernels` are no longer
base dependencies. They moved to an optional `dion[gram-newton-schulz]` extra
(alias `dion[gns]`), and are also excluded from the `dev` and `train` extras.
-This keeps the default install free of the transitive `nvidia-cutlass-dsl==4.4.2`
-pin, which conflicts with Flash-Attention-4 / Blackwell stacks built on cutlass
-`4.5.2`.
+This keeps the default install free of the heavy Gram Newton-Schulz GPU stack
+(and its transitive `nvidia-cutlass-dsl` pin).

**Action required:** if you run with `use_gram_newton_schulz=True`, install the
extra (`pip install "dion[gns] @ git+https://github.com/microsoft/dion.git"`, or
`pip install -e ".[gns]"` from a clone). Without it, optimizer construction now
raises a clear `ImportError` at runtime instead of the kernels being silently
-present. Opting in re-introduces the cutlass `4.4.2` pin, so use a separate
-environment from FA4/Blackwell.
+present.
+
+- Bumped the optional `dion[gns]` extra to `gram-newton-schulz==0.1.5`
+(`quack-kernels==0.5.0`). This moves its transitive `nvidia-cutlass-dsl` pin from
+`4.4.2` to `4.5.2`, matching current Flash-Attention-4 / Blackwell stacks, so the
+extra no longer conflicts with them.
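
The "clear `ImportError` at runtime" behavior described in the changelog entry above can be pictured with a small optional-import guard. This is a minimal sketch under stated assumptions, not dion's actual code: the helper name `require_optional` and the error wording are hypothetical, and the module name is left as a parameter because the diff does not show which module the optimizer imports. Only the install command in the message is taken from the changelog.

```python
import importlib


def require_optional(module_name: str, extra: str):
    """Import an optional dependency, or raise an ImportError naming the extra.

    Hypothetical helper for illustration only; not part of dion's API.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with an actionable message instead of a bare "No module named ...".
        raise ImportError(
            f"{module_name!r} is not installed but is required for this feature. "
            f'Install the extra with: pip install "dion[{extra}] @ '
            f'git+https://github.com/microsoft/dion.git"'
        ) from exc
```

Calling such a guard only when the feature flag is enabled keeps the base install importable without the GPU kernel stack, which is the trade-off the changelog entry describes.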
4 changes: 2 additions & 2 deletions README.md
@@ -50,7 +50,7 @@ Our implementations are available as a `pip` package! Install to use in your pro
pip install git+https://github.com/microsoft/dion.git
```

-> The optional Gram Newton-Schulz orthogonalization kernels (enabled with `use_gram_newton_schulz=True`) are not pulled in by the base install. Add them with `pip install "dion[gram-newton-schulz] @ git+https://github.com/microsoft/dion.git"`, or `pip install -e ".[gram-newton-schulz]"` from a clone. Note: this extra pins `nvidia-cutlass-dsl==4.4.2`, which conflicts with Flash-Attention-4 / Blackwell stacks built on cutlass `4.5.2`, so install it in a separate environment if you need both.
+> The optional Gram Newton-Schulz orthogonalization kernels (enabled with `use_gram_newton_schulz=True`) are not pulled in by the base install. Add them with `pip install "dion[gram-newton-schulz] @ git+https://github.com/microsoft/dion.git"`, or `pip install -e ".[gram-newton-schulz]"` from a clone. Note: this extra pins `nvidia-cutlass-dsl==4.5.2`, matching the cutlass version used by current Flash-Attention-4 / Blackwell stacks (the earlier `4.4.2` conflict no longer applies).

Then in your code, you can use:

@@ -68,7 +68,7 @@ git clone https://github.com/microsoft/dion.git
cd dion
pip install -e .[train]
```
-> `train` stays free of the Gram Newton-Schulz kernels (and their `nvidia-cutlass-dsl==4.4.2` pin) so the default training install works on Flash-Attention-4 / Blackwell stacks. To train with `--use_gram_newton_schulz`, use `pip install -e ".[train,gns]"` in a separate environment. Likewise, to develop or test the Gram Newton-Schulz path, install `pip install -e ".[dev,gns]"` — a plain `[dev]` install skips the GNS-specific test cases.
+> `train` stays free of the Gram Newton-Schulz kernels, which remain an opt-in extra. To train with `--use_gram_newton_schulz`, use `pip install -e ".[train,gns]"`; the extra's `nvidia-cutlass-dsl==4.5.2` pin now matches Flash-Attention-4 / Blackwell stacks, so the two no longer conflict. Likewise, to develop or test the Gram Newton-Schulz path, install `pip install -e ".[dev,gns]"` — a plain `[dev]` install skips the GNS-specific test cases.

Download pretokenized FineWeb dataset:
```bash
6 changes: 3 additions & 3 deletions requirements_gns.txt
@@ -1,6 +1,6 @@
# Optional Gram Newton-Schulz orthogonalization kernels (use_gram_newton_schulz=True).
# quack-kernels is pulled in transitively by gram-newton-schulz; the explicit pin here
# is for reproducibility and must track whatever quack version the pinned gram-newton-schulz
-# requires (gram-newton-schulz==0.1.4 also transitively pins nvidia-cutlass-dsl==4.4.2).
-gram-newton-schulz==0.1.4
-quack-kernels==0.4.1
+# requires (gram-newton-schulz==0.1.5 also transitively pins nvidia-cutlass-dsl==4.5.2).
+gram-newton-schulz==0.1.5
+quack-kernels==0.5.0
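
Because the explicit `quack-kernels` pin must track whatever the pinned `gram-newton-schulz` requires, it can help to check an environment against the three versions this PR names. The following is a rough sketch, not part of the PR: `EXPECTED_PINS` and `check_pins` are hypothetical names, and the version strings are copied from the new `requirements_gns.txt` and its header comment.

```python
from importlib import metadata

# Versions named in this PR; the dict itself is illustrative, not shipped code.
EXPECTED_PINS = {
    "gram-newton-schulz": "0.1.5",
    "quack-kernels": "0.5.0",
    "nvidia-cutlass-dsl": "4.5.2",
}


def check_pins(expected, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, want in expected.items():
        try:
            have = get_version(name)
        except metadata.PackageNotFoundError:
            have = None  # distribution is not installed in this environment
        if have != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    for name, (want, have) in check_pins(EXPECTED_PINS).items():
        print(f"{name}: expected {want}, found {have or 'not installed'}")
```

An empty result means the environment matches the pins. The `get_version` parameter exists only so the check can be exercised without installing the GPU packages.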