Releases: turboderp-org/exllamav3

0.0.43

14 Jun 16:57

@github-actions github-actions

v0.0.43

c5d9c65

0.0.43 Latest

Fix error when MTP drafting in TP mode
Faster quantization

Full Changelog: v0.0.42...v0.0.43

Assets 52

exllamav3-0.0.43+cu128.torch2.10.0-cp310-cp310-linux_x86_64.whl

sha256:f0964a8debe78d47538b4ae37ca5e5162f5cb09b48ad0531ffdc4400e8c12981

185 MB 2026-06-14T17:32:18Z
exllamav3-0.0.43+cu128.torch2.10.0-cp310-cp310-win_amd64.whl

sha256:312062a54b7aa997fb07d588c7e695d1b46c4f43d0c7522757c13e4c38fd8d0d

169 MB 2026-06-14T17:39:48Z
exllamav3-0.0.43+cu128.torch2.10.0-cp311-cp311-linux_x86_64.whl

sha256:417de48608be60629938fdd2cc5981094b744f0cfa309453a37d8d939cf8e6c0

185 MB 2026-06-14T17:31:54Z
exllamav3-0.0.43+cu128.torch2.10.0-cp311-cp311-win_amd64.whl

sha256:2305f66bfa614f67dc3d9efa5f4f1237abf6eda179dc69920d76dd6a5026eb58

169 MB 2026-06-14T17:37:45Z
exllamav3-0.0.43+cu128.torch2.10.0-cp312-cp312-linux_x86_64.whl

sha256:2b9ac05af7f7c61738651ae0d4007e565fe519e6e3ba2c7e293aacefd7684b42

185 MB 2026-06-14T17:33:44Z
exllamav3-0.0.43+cu128.torch2.10.0-cp312-cp312-win_amd64.whl

sha256:b46da4707268bbfe49ac764e636fa985f9de9207316c9f19f2052c83569dc60e

169 MB 2026-06-14T17:38:35Z
exllamav3-0.0.43+cu128.torch2.10.0-cp313-cp313-linux_x86_64.whl

sha256:0ce569a472c18ce07a2e2ed783b9c73d0ae592ad416d048f93d1bfb220c845d4

185 MB 2026-06-14T17:27:28Z
exllamav3-0.0.43+cu128.torch2.10.0-cp313-cp313-win_amd64.whl

sha256:0807eeb21f614074e6c55a792f881391d45d0604a2445ad6828fa362e2037145

169 MB 2026-06-14T17:37:15Z
exllamav3-0.0.43+cu128.torch2.10.0-cp314-cp314-linux_x86_64.whl

sha256:31694d44a3ec0d104c5d1710945521c4fa9bc41b8b6b9776a8a50ec98c6f6ad6

185 MB 2026-06-14T17:30:59Z
exllamav3-0.0.43+cu128.torch2.10.0-cp314-cp314-win_amd64.whl

sha256:e368d7a07a532837505db8579aeb034724b8fac5abddc30165bfd50cb21eba06

171 MB 2026-06-14T17:38:52Z
Source code (zip)

2026-06-14T16:56:03Z
Source code (tar.gz)

2026-06-14T16:56:03Z
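
Each wheel above is listed with a SHA-256 digest, so a download can be checked before it is installed. The following is a minimal sketch using only the Python standard library. The helper names `sha256_of` and `matches_listed_digest` are illustrative assumptions and are not part of exllamav3.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_digest(path, listed):
    """Compare a file against a digest written as 'sha256:<hex>' or as bare hex."""
    expected = listed.removeprefix("sha256:").strip().lower()  # requires Python 3.9+
    return hmac.compare_digest(sha256_of(path), expected)
```

To use it, pass the path of the downloaded wheel and the `sha256:...` string shown under that wheel's filename, and install only if the function returns `True`.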

0.0.42

12 Jun 22:05

@github-actions github-actions

v0.0.42

595d6c4

0.0.42

Fix MTP drafting when MTP model is not on the target model's output device

Full Changelog: v0.0.41...v0.0.42

Assets 52

0.0.41

12 Jun 18:13

@github-actions github-actions

v0.0.41

30e1800

0.0.41

Add MTP support for Qwen3.5/3.6

Full Changelog: v0.0.40...v0.0.41

Assets 52

0.0.40

06 Jun 18:10

@github-actions github-actions

v0.0.40

40ac7ba

0.0.40

Support Gemma4UnifiedForConditionalGeneration

Full Changelog: v0.0.39...v0.0.40

Assets 52

0.0.39

31 May 22:02

@github-actions github-actions

v0.0.39

3a2a94f

0.0.39

Add Step3p7ForConditionalGeneration

Full Changelog: v0.0.38...v0.0.39

Assets 52

0.0.38

29 May 17:13

@github-actions github-actions

v0.0.38

c18e9b4

0.0.38

Support Lfm2MoeForCausalLM (LFM 2.5)
Fix regression in GDN inference when bsz > 1
Fix issue causing DFlash to break in TP mode when cudaMallocAsync backend was used
QoL improvements

Full Changelog: v0.0.37...v0.0.38

Assets 52

0.0.37

24 May 17:11

@github-actions github-actions

v0.0.37

1d71227

0.0.37

Another small bugfix

Full Changelog: v0.0.36...v0.0.37

Assets 52

0.0.36

24 May 09:36

@github-actions github-actions

v0.0.36

18ea6d2

0.0.36

Fix small SD regression

Full Changelog: v0.0.35...v0.0.36

Assets 52

0.0.35

23 May 23:54

@github-actions github-actions

v0.0.35

c0b20f6

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

0.0.35

Tensor parallel mode for Qwen3.5/3.6
New recurrent state manager avoids dynamic allocation of recurrent states (reduces fragmentation and keeps VRAM overhead constant for Qwen3.5 etc.)
Improved checkpointing decision to make better use of available cache space
Perform reconstruct-GEMM in slices for large layers, greatly reducing VRAM overhead
Fix race condition causing streaming token output to lag behind generation in some situations
Support new tensor keys in Mistral 3.5 Medium
KL-div and perplexity kernels for eval scripts
More tuning
Lots of bugfixes
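
The release notes do not say how the reconstruct-GEMM is sliced, so the following is only a sketch of the general idea, not exllamav3's kernels. It assumes a hypothetical storage format of integer codes with one scale per output column, and slices along the output columns. Only a narrow slice of the weight matrix is dequantized at a time, so the temporary buffer stays small regardless of layer size. All function names are made up for illustration.

```python
def reconstruct(codes, scales, col_start, col_end):
    """Dequantize columns [col_start, col_end) of a weight matrix stored as integer codes."""
    return [[row[j] * scales[j] for j in range(col_start, col_end)] for row in codes]


def matmul(x, w):
    """Plain dense product of x (m x k) and w (k x n)."""
    return [
        [sum(x_row[k] * w[k][j] for k in range(len(w))) for j in range(len(w[0]))]
        for x_row in x
    ]


def sliced_reconstruct_gemm(x, codes, scales, slice_cols):
    """Compute x @ W while reconstructing at most slice_cols columns of W at a time."""
    n = len(scales)
    out = [[] for _ in x]
    for start in range(0, n, slice_cols):
        end = min(start + slice_cols, n)
        # The temporary is k x (end - start) rather than the full k x n matrix.
        w_slice = reconstruct(codes, scales, start, end)
        for out_row, part in zip(out, matmul(x, w_slice)):
            out_row.extend(part)
    return out
```

The result matches a full reconstruct followed by a single product. The saving is in peak temporary memory, which scales with `slice_cols` instead of the full layer width.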

Full Changelog: v0.0.34...v0.0.35

Assets 52

0.0.34

09 May 19:10

@github-actions github-actions

v0.0.34

fa14279

0.0.34

Fix regression causing extra VRAM usage during prefill
Add CUDA 13.2.0 wheels (built with cu132 against torch==2.11.0+cu130)

Full Changelog: v0.0.33...v0.0.34

Assets 52
