Releases: turboderp-org/exllamav3

0.0.43

14 Jun 16:57
@github-actions github-actions

  • Fix error when MTP drafting in TP mode
  • Faster quantization

Full Changelog: v0.0.42...v0.0.43

Assets 52

0.0.42

12 Jun 22:05
@github-actions github-actions

  • Fix MTP drafting when MTP model is not on the target model's output device

Full Changelog: v0.0.41...v0.0.42

0.0.41

12 Jun 18:13
@github-actions github-actions

  • Add MTP support for Qwen3.5/3.6

Full Changelog: v0.0.40...v0.0.41

0.0.40

06 Jun 18:10
@github-actions github-actions

  • Support Gemma4UnifiedForConditionalGeneration

Full Changelog: v0.0.39...v0.0.40

0.0.39

31 May 22:02
@github-actions github-actions

  • Add Step3p7ForConditionalGeneration

Full Changelog: v0.0.38...v0.0.39

0.0.38

29 May 17:13
@github-actions github-actions

  • Support Lfm2MoeForCausalLM (LFM 2.5)
  • Fix regression in GDN inference when bsz > 1
  • Fix issue causing DFlash to break in TP mode when cudaMallocAsync backend was used
  • QoL improvements

Full Changelog: v0.0.37...v0.0.38

0.0.37

24 May 17:11
@github-actions github-actions

  • Another small bugfix

Full Changelog: v0.0.36...v0.0.37

0.0.36

24 May 09:36
@github-actions github-actions

  • Fix small SD regression

Full Changelog: v0.0.35...v0.0.36

0.0.35

23 May 23:54
@github-actions github-actions
c0b20f6

  • Tensor parallel mode for Qwen3.5/3.6
  • New recurrent state manager avoids dynamic allocation of recurrent states (reduces fragmentation and keeps VRAM overhead constant for Qwen3.5 etc.)
  • Improved checkpointing decisions to make better use of available cache space
  • Perform reconstruct-GEMM in slices for large layers, greatly reducing VRAM overhead
  • Fix race condition causing streaming token output to lag behind generation in some situations
  • Support new tensor keys in Mistral 3.5 Medium
  • KL-div and perplexity kernels for eval scripts
  • More tuning
  • Lots of bugfixes

Full Changelog: v0.0.34...v0.0.35

0.0.34

09 May 19:10
@github-actions github-actions

  • Fix regression causing extra VRAM usage during prefill
  • Add CUDA 13.2.0 wheels (built with cu132 against torch==2.11.0+cu130)

Full Changelog: v0.0.33...v0.0.34
