Releases: LostRuins/koboldcpp

koboldcpp-1.106.2

17 Jan 04:33
@LostRuins

MCP for the masses edition

  • NEW: MCP Server and Client Support Added to KoboldCpp - KoboldCpp now supports running an MCP bridge that serves as a direct drop-in replacement for Claude Desktop.
    • KoboldCpp can connect to any HTTP or STDIO MCP server, using an mcp.json config format compatible with Claude Desktop (see the example config after this list).
    • Multiple servers are supported; KoboldCpp will automatically combine their tools and dispatch requests appropriately.
    • Recommended guide for MCP newbies: Here is a simple guide on running a Filesystem MCP Server to let your AI browse files locally on your PC and search the web - https://github.com/LostRuins/koboldcpp/wiki#mcp-tool-calling
    • CAUTION: Running ANY MCP server gives it full access to your system; its third-party scripts will be able to modify and change your files. Be sure to only run servers you trust!
    • The example music playing MCP server used in the screenshot above was this audio-player-mcp
  • Flash Attention is now enabled by default when using the GUI launcher.
  • Improvements to tool parsing (thanks @AdamJ8)
  • API field continue_assistant_turn is now enabled by default in all chat completions (assistant prefill)
  • Increased the maximum length for image interrogation
  • Various StableUI fixes by @Riztard
  • Using the environment variable GGML_VK_VISIBLE_DEVICES externally now always overrides whatever Vulkan device settings are configured in KoboldCpp.
  • Updated Kobold Lite, multiple fixes and improvements
    • NEW: Full settings UI overhaul from @Rose22, the settings menu is now much cleaner and more organized. Feedback welcome!
    • NEW: Added 4 new OLED themes from @Rose22
    • Improved performance when editing massive texts
    • General cleanup and multiple minor adjustments
    • Browser MCP implementation adapted from @ycros simple-mcp-client
  • Merged fixes, model support, and improvements from upstream
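
Example mcp.json (a minimal sketch assuming the Claude Desktop schema mentioned above; the npx command, the @modelcontextprotocol/server-filesystem package, and the folder path are illustrative placeholders, so substitute the servers you actually intend to run):

    {
      "mcpServers": {
        "filesystem": {
          "command": "npx",
          "args": ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/allowed-folder"]
        }
      }
    }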

Hotfix 1.106.1 - Allow overriding selected GPU devices directly with --device (e.g. --device Vulkan0,Vulkan1), updated Lite
Hotfix 1.106.2 - Increase logprobs from 5 to 10, fixed memory usage with embeddings, allow device override to be set in gui (thanks @pi6am)

Important Notice: The CLBlast backend may be removed soon, as it is very outdated and no longer receives any updates, fixes, or improvements. It can be considered superseded by the Vulkan backend. If you have concerns, please join the discussion here.

Download and run the koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file pyinstaller for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try oldpc version instead (Cuda11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build first, for best support. Alternatively, you can try koboldcpp_rocm at YellowRoseCx's fork here if you are a Windows user or download our rolling ROCm binary here if you use Linux.
If you're on a modern MacOS (M-Series) you can use the koboldcpp-mac-arm64 MacOS binary.
Click here for .gguf conversion and quantization tools

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Then, once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001

For more information, be sure to run the program from command line with the --help flag. You can also refer to the readme and the wiki.

Contributors

ycros, AdamJ8, and 3 other contributors

koboldcpp-1.105.4

02 Jan 05:21
@LostRuins

new year edition

  • NEW: Added --gendefaults, which accepts a JSON dictionary where you can specify any API fields to append or overwrite (e.g. step count, temperature, top_k) on incoming payloads. Incoming API payloads will have this modification applied. This can be useful when using frontends that don't behave well, as you will be able to override or correct whatever fields they send to KoboldCpp (see the example launch command after this list).
    • Note: This marks the horde worker with a debug flag if used on AI Horde.
    • --sdgendefaults has been deprecated and merged into this flag
  • Added support for a new "Adaptive-P" sampler by @MrJackSpade, a sampler that allows selecting lower-probability tokens. Recommended to use together with min-P. Configure with the adaptive target and adaptive decay parameters. This sampler may be subject to change in the future.
  • StableUI SDUI: Fixed generation queue stacking, allowed requesting AVI formatted videos (enable in settings first), added a dismiss button, various small tweaks
  • Minor fixes to tool calling
  • Added support for Ovis Image and the new Qwen Image Edit, and added support for TAEHV as a WAN VAE (you can use it with Wan2.2 videos and Qwen Image/Qwen Image Edit; simply enable the "TAE SD" checkbox or use --sdvaeauto, which greatly saves memory). Thanks @wbruna for the sync.
  • Fixed LoRA loading issues with some Qwen Image LoRAs
  • --autofit now allocates some extra space if used with multiple models (image gen, embeddings etc)
  • Improved snapshotting logic with --smartcache for RNN models.
  • Attempted to fix tk scaling on some systems.
  • Renamed KCPP launcher's Tokens tab to Context, moved Flash Attention toggle into hardware tab
  • Updated Kobold Lite, multiple fixes and improvements
    • Added support for using remote HTTP MCP servers for tool calling. KoboldCpp-based MCP may be added at a later date.
  • Merged fixes, model support, and improvements from upstream
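
Example of --gendefaults (a sketch only; the flag accepting an inline JSON dictionary is taken from the description above, while the binary name, model filename and chosen fields are illustrative):

    # Force a lower temperature and top_k onto every incoming payload
    ./koboldcpp-linux-x64 --model mymodel.gguf --gendefaults '{"temperature": 0.7, "top_k": 40}'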

Hotfix 1.105.1 - Allow configuring number of smartcache slots, updated lite + SDUI, handle tool calling images from remote MCP responses.
Hotfix 1.105.2 - Fixed various minor bugs, allow transcribe to be used with an LLM with audio projector.
Hotfix 1.105.3 - Merged fix for CUDA MoE CPU regression
Hotfix 1.105.4 - Merged vulkan glm4.6 fix

Important Notice: The CLBlast backend may be removed soon, as it is very outdated and no longer receives any updates, fixes, or improvements. It can be considered superseded by the Vulkan backend. If you have concerns, please join the discussion here.

Download and run the koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file pyinstaller for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try oldpc version instead (Cuda11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build for best support.
If you're on a modern MacOS (M-Series) you can use the koboldcpp-mac-arm64 MacOS binary.
Click here for .gguf conversion and quantization tools

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Then, once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001

For more information, be sure to run the program from command line with the --help flag. You can also refer to the readme and the wiki.

Contributors

wbruna and MrJackSpade

koboldcpp-1.104

20 Dec 09:07
@LostRuins

calm before the storm edition

  • NEW: Added --smartcache adapted from @Pento95: This is a 2-in-1 dynamic caching solution that intelligently creates KV state snapshots automatically. Read more here. An example launch command appears after this list.
    • This will greatly speed up performance when different contexts are swapped back to back (e.g. hosting on AI Horde or shared instances).
    • Also allows snapshotting when used with a RNN or Hybrid model (e.g. Qwen3Next, RWKV) which avoids having to reprocess everything.
    • Reuses the KV save/load states from admin mode. Max number of KV states increased to 6.
  • NEW: Added --autofit flag which utilizes upstream's "automatic GPU fitting (-fit)" behavior from ggml-org#16653. Note that this flag overwrites all your manual layer configs and tensor overrides and is not guaranteed to work. However, it can provide a better automatic fit in some cases. It will not be accurate if you load multiple models, e.g. image gen.
  • Pipeline parallelism is no longer the default; instead, it's now a flag you can enable with --pipelineparallel. This only affects multi-GPU setups, offering faster speed at the cost of memory usage.
  • Key Improvement - Vision Bugfix: A bug in mrope position handling has been fixed, which improves vision models like Qwen3-VL. You should now see much better visual accuracy in some multimodal models compared to earlier koboldcpp versions. If you previously had issues with hallucinated text or numbers, it should be much better now.
  • Increased default gen amount from 768 to 896.
  • Deprecated obsolete --forceversion flag.
  • Fixed safetensors loading for Z-Image
  • Fixed image importer in SDUI
  • Capped cfg_scale to max 3.0 for Z-Image to avoid blurry gens. If you want to override this, set remove_limits to 1 in your payload or inside --sdgendefaults.
  • Removed cc7.0 as a CUDA build target, Volta (V100) will fall back to PTX from cc6.1
  • Tweaked branding in llama.cpp UI to make it clear it's not llama.cpp
  • Added indentation to .kcpps configs
  • Updated Kobold Lite, multiple fixes and improvements
  • Merged fixes and improvements from upstream
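
Example launch command combining the two new flags (a sketch assuming both behave as simple toggles as described above; the binary and model names are illustrative):

    # Automatic GPU fitting plus dynamic KV snapshot caching
    ./koboldcpp-linux-x64 --model mymodel.gguf --autofit --smartcache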

Important Notice: The CLBlast backend may be removed soon, as it is very outdated and no longer receives any updates, fixes, or improvements. It can be considered superseded by the Vulkan backend. If you have concerns, please join the discussion here.

Download and run the koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file pyinstaller for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try oldpc version instead (Cuda11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build for best support.
If you're on a modern MacOS (M-Series) you can use the koboldcpp-mac-arm64 MacOS binary.
Click here for .gguf conversion and quantization tools

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Then, once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001

For more information, be sure to run the program from command line with the --help flag. You can also refer to the readme and the wiki.

Contributors

Pento95

koboldcpp-1.103

04 Dec 12:29
@LostRuins

  • NEW: Added support for Flux2 and Z-Image Turbo! Another big thanks to @leejet for the sd.cpp implementation and @wbruna for the assistance with testing and merging.
    • To obtain models for Z-Image (most recommended, lightweight):
    • To obtain models for Flux2 (not recommended as this model is huge, so I will link the q2k. Remember to enable CPU offload. Running anything larger requires a very powerful GPU):
      • Get the Flux 2 Image model here
      • Get the Flux 2 VAE here
      • Get the Flux 2 text encoder here, load this as Clip 1
  • NEW: Mistral and Ministral 3 model support has been merged from upstream.
  • Improved "Assistant Continue" in llama.cpp UI mode, now can be used to continue partial turns.
    • We have added prefill support to chat completions if you have /lcpp in your URL (/lcpp/v1/chat/completions); the regular chat completions endpoint is meant to mimic OpenAI and does not do this. Point your frontend to the URL that best fits your use case. We'd like feedback on which of these you prefer and whether the /lcpp behavior would break an existing use case (see the example request after this list).
  • Minor tool calling fix to avoid passing base64 media strings into the tool call.
  • Tweaked resizing behavior of the launcher UI.
  • Added a secondary terminal UI to view the console logging (only for Linux), can be used even when not launched from CLI. Launch this auxiliary terminal from the Extras tab.
  • AutoGuess Template fixes for GPT-OSS and Kimi
  • Fixed a bug with --showgui mode being saved into some configs
  • Updated Kobold Lite, multiple fixes and improvements
  • Merged fixes and improvements from upstream
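
Example prefill request against the /lcpp route (a sketch assuming a standard OpenAI-style chat completions payload; the prompt and prefill text are illustrative, and the trailing assistant message is what gets continued rather than restarted):

    curl http://localhost:5001/lcpp/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
            "messages": [
              {"role": "user", "content": "Write a haiku about rain."},
              {"role": "assistant", "content": "Silver threads falling"}
            ]
          }'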

Download and run the koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file pyinstaller for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try oldpc version instead (Cuda11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build for best support.
If you're on a modern MacOS (M-Series) you can use the koboldcpp-mac-arm64 MacOS binary.
Click here for .gguf conversion and quantization tools

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Then, once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001

For more information, be sure to run the program from command line with the --help flag. You can also refer to the readme and the wiki.

Contributors

wbruna and leejet

koboldcpp-1.102.3

26 Nov 13:03
@LostRuins

cold november rain edition

  • New: Now bundles the llama.cpp UI into KoboldCpp, as an extra option for those who prefer it. Access it at http://localhost:5001/lcpp
    • The llama.cpp UI is designed strongly for assistant use-cases and provides a ChatGPT like interface, with support for importing documents like .pdf files. It can be accessed in parallel to the usual KoboldAI Lite UI (which is recommended for roleplay/story writing) and does not take up any additional resources while not in use.
  • New: Massive universal tool calling improvement from @Rose22, with the new format KoboldCpp is now even better at calling tools and using multiple tools in sequence correctly. Works automatically with all tool calling capable frontends (OpenWebUI / SillyTavern etc) in chat completions mode and may work on models that normally do not support tool calling (in the correct format).
  • New: Added support for jinja2 templates via /v1/chat/completions, for those who have been asking for it. There are 3 modes (see the example launch commands after this list):
    • Current Default: Uses KoboldCpp ChatAdapter templates and the KoboldCpp universal toolcalling module (current behavior, most recommended).
    • Using --jinja: Uses the jinja2 template from the GGUF in chat completions mode for normal messages, and uses the KoboldCpp universal toolcalling module. Use this only if you love jinja. There are GGUF models on Hugging Face which explicitly mention that --jinja must be used to get normal results; this does not apply to KoboldCpp, as our regular modes cover these cases.
    • Using --jinja_tools: Uses the jinja2 template from the GGUF in chat completions mode for all messages and tools. Not recommended in general. In this mode the model and frontend are responsible for compatibility.
  • Synced and updated Image Generation to latest stable-diffusion.cpp, big thanks to @wbruna. Please report any issues you encounter.
  • Updated Google Colab notebook with easier default selectable presets, thanks @henk717
  • Allow GUI launcher window to be resized slightly larger horizontally, in case some text gets cut off.
  • Fixed a divide by zero error with audio projectors
  • Added Vulkan support for whisper.
  • Case-insensitive filename search when selecting chat completion adapters
  • Fixed an old bug that caused mirostat to swap parameters. To get the same result as before, swap values for tau and eta.
  • Added a debug command --testmemory to check what values auto GPU detection retrieves (not needed for most)
  • Now serves KoboldAI Lite UI gzipped to browsers that can support it, for faster UI loading.
  • Added sampler support for smoothing curve
  • Updated Kobold Lite, multiple fixes and improvements
    • Web Link-sharing now defaults to dpaste.com as dpaste.org has shut down
    • Added option to save and load custom scenarios in a Scenario Library (like stories but do not contain most settings)
    • Allow single-turn deletion and editing in classic theme instruct mode (click on the icon)
    • Better turn chunking and repacking after editing a message
  • Merged new model support, fixes and improvements from upstream
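
Example launch commands for the three template modes (a sketch; the binary and model names are illustrative, the flags are those listed above):

    # Current default: KoboldCpp ChatAdapter templates + universal toolcalling
    ./koboldcpp-linux-x64 --model mymodel.gguf
    # Use the GGUF's jinja2 template for normal messages only
    ./koboldcpp-linux-x64 --model mymodel.gguf --jinja
    # Use the GGUF's jinja2 template for all messages and tools
    ./koboldcpp-linux-x64 --model mymodel.gguf --jinja_tools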

Hotfix 1.102.2 - Try to fix some issues with flash attention, fixed media attachments in jinja mode
Hotfix 1.102.3 - Merged Qwen3Next support. Note that you need to use batch size 512 or less.

Separately, our docker image has been updated with a newer, faster Vulkan driver for some AMD GPUs. If you use our docker image, a manual docker pull is recommended, as these drivers are not always covered by the automatic updates.

Download and run the koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file pyinstaller for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try oldpc version instead (Cuda11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build for best support.
If you're on a modern MacOS (M-Series) you can use the koboldcpp-mac-arm64 MacOS binary.
Click here for .gguf conversion and quantization tools

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Then, once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001

For more information, be sure to run the program from command line with the --help flag. You can also refer to the readme and the wiki.

Contributors

wbruna, henk717, and Rose22

koboldcpp-1.101.1

01 Nov 09:13
@LostRuins

very spooky edition

  • Support for Qwen3-VL is merged - For a quick test, get the Qwen3-VL-2B-Instruct model here and the mmproj here. Larger versions exist, but this will work well enough for simple tasks (see the example launch command after this list).
  • Added Qwen Image and Qwen Image Edit - Support is now officially available for Qwen Image generation models. These have much better prompt adherence than SDXL or even Flux. Here's how to set up Qwen Image Edit:
  • Added aliases for the OpenAI compatible endpoints without /v1/ prefix.
  • Supports using multiple --overridekv, split by commas.
  • Renamed --blasbatchsize to just --batchsize (old name will still work)
  • Made the GPU layer count preview in the GUI more accurate; no more +2 extra layers.
  • Added experimental support for fractional scaling in the GUI launcher for Wayland on GNOME. You're still recommended to use KDE or disable fractional scaling for better results.
  • Image generation precision fixes and fallbacks. SDUI also now supports copy with right click on the image preview.
  • Added selection for image generation scheduler
  • Added support for logprobs streaming in openai chat completions API (sent at end)
  • Added VITS api server compatibility endpoint
  • PyInstaller upgraded from 5.11 to 5.12 to fix a crashing bug
  • Added Horde worker Job stats by @xzuyn
  • Updated Kobold Lite, multiple fixes and improvements
    • New: Added branching support! You can now create ST style "branches" in the same story, allowing you to explore multiple alternate possibilities without requiring multiple save files. You can create and delete branches at any point in your story and swap between them at will.
    • Better inline markdown and code rendering
    • Better turn history segmenting after leaving edit mode, also improved AutoRole turn packing
    • Improve trim sentences behavior, improve autoscroll behavior, improve mobile detection
    • Added ccv3 tavern card support
    • Aborted gens will now request logprobs if enabled
  • Merged new model support, fixes and improvements from upstream, including some Vulkan speedups from occam
  • NOTE: Qwen3Next support is NOT merged yet. It is still undergoing development upstream, follow it here: ggml-org#16095
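
Example launch commands for Qwen3-VL and for comma-separated --overridekv (a sketch; the filenames and the metadata keys being overridden are illustrative assumptions, not recommendations):

    # Load the text model together with its vision projector
    ./koboldcpp-linux-x64 --model Qwen3-VL-2B-Instruct-Q4_K_M.gguf --mmproj Qwen3-VL-2B-mmproj-F16.gguf
    # Multiple --overridekv entries can now be split by commas
    ./koboldcpp-linux-x64 --model mymodel.gguf --overridekv tokenizer.ggml.add_bos_token=bool:false,tokenizer.ggml.add_eos_token=bool:false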

Hotfix 1.101.1 - Fixed a regression with rowsplit, fixed issue loading very old mmproj files, fixed a crash with qwen image edit.

Starting at 1.101.1 we have upgraded the bundled ROCm library of our ROCm Linux binary to 7.1, which will have an impact on which GPUs are supported. You should now be able to use KoboldCpp on your 9000-series GPU on Linux without having to compile from source. If your system's driver was capable of running the last ROCm release, updating drivers is not required; it will automatically use ROCm 7.1 even if you have an older ROCm installed.

Download and run the koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file pyinstaller for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try oldpc version instead (Cuda11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build first, for best support. Alternatively, you can try koboldcpp_rocm at YellowRoseCx's fork here if you are a Windows user or download our rolling ROCm binary here if you use Linux.
If you're on a modern MacOS (M-Series) you can use the koboldcpp-mac-arm64 MacOS binary.
Click here for .gguf conversion and quantization tools

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Then, once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001

For more information, be sure to run the program from command line with the --help flag. You can also refer to the readme and the wiki.

Contributors

henk717 and xzuyn

koboldcpp-1.100.1

12 Oct 02:32
@LostRuins

I-can't-believe-it's-not-version-2.0-edition


  • NEW: WAN Video Generation has been added to KoboldCpp! - You can now generate short videos in KoboldCpp using the WAN model. Special thanks to @leejet for the sd.cpp implementation, and @wbruna for help merging and QoL fixes.
    • Note: WAN requires a LOT of VRAM to run. If you run out of memory, try generating fewer frames and using a lower resolution. Especially on Vulkan, the VAE buffer size may be too large, use --sdvaecpu to run VAE on CPU instead. For comparison, 30 frames (2 seconds) of a 384x576 video will still require about 16GB VRAM even with VAE on CPU and CPU offloading enabled. You can also generate a single frame in which case it will behave like a normal image generation model.
    • Obtain the WAN2.2 14B rapid mega AIO model here. This is the most versatile option and can do both T2V and I2V. I do not recommend using the 1.3B WAN2.1 or the 5B WAN2.2; they both produce rather poor results. If you really don't care about quality, you can use the small 1.3B from here.
    • Next, you will need the correct VAE and UMT5-XXL; note that some WAN models use different ones, so if you're bringing your own, do check it. Reference links are here.
    • Load them all via the GUI launcher or by using --sdvae, --sdmodel and --sdt5xxl (see the example launch command after this list)
    • Launch KoboldCpp and open SDUI at http://localhost:5001/sdui. I recommend starting with something small like 15 frames of a 384x384 video with 20 steps. Be prepared to wait a few minutes. The video will be rendered and saved to SDUI when done!
    • It's recommended to use --sdoffloadcpu and --sdvaecpu if you don't have enough VRAM. The VAE buffer can really be huge.
  • Added additional toggle flags for image generation:
    • --sdoffloadcpu - Allows image generation weights to be dynamically loaded/unloaded to RAM when not in use, e.g. during VAE decoding.
    • --sdvaecpu - Performs VAE decoding on CPU using RAM instead.
    • --sdclipgpu - Performs CLIP/T5 decoding on GPU instead (the new default is CPU)
  • Updated StableUI to support animations/videos. If you want to perform I2V (Image-To-Video), you can do so in the txt2img panel.
  • Renamed --sdclipl to --sdclip1, and --sdclipg to --sdclip2. These flags are now used whenever there is a vision encoder to be used (e.g. WAN's clip_vision if applicable).
  • Disable TAESD if not applicable.
  • Moved all .embd resource files into a separate directory for improved organization. Also extracted out image generation vocabs into their own files.
  • Moved lowvram CUDA option into a new flag --lowvram (same as -nkvo), which can be used in both CUDA and Vulkan to avoid offloading the KV. Note: This is slow and not generally recommended.
  • Fixed Kimi template, added Granite 4 template.
  • Enabled building for CUDA13 in the CMake, however it's untested and no binaries will be provided, also fixed Vulkan noext compiles.
  • Fixed q4_0 repacking incoherence on CPU only, which started in v1.98.
  • Fixed FastForwarding issues due to misidentified hybrid/rnn models, which should not happen anymore.
  • Added --sdgendefaults to allow setting some default image generation parameters.
  • On admin config reload, reset nonexistent fields in config to default values instead of keeping the old value.
  • Updated Kobold Lite, multiple fixes and improvements
    • Set default filenames based on slot's name when downloading from saved slot.
    • Added dry_penalty_last_n from @joybod which decouples dry range from rep pen range.
    • LaTeX rendering fixes, autoscroll fixes, various small tweaks
  • Merged new model support including GLM4.6 and Granite 4, fixes and improvements from upstream
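
Example launch command for WAN video generation (a sketch based on the flags described above; the filenames are illustrative placeholders for the model, VAE, and UMT5-XXL files you downloaded):

    ./koboldcpp-linux-x64 --sdmodel wan2.2-14b-rapid-mega-aio.safetensors \
      --sdvae wan_vae.safetensors --sdt5xxl umt5-xxl-encoder.safetensors \
      --sdoffloadcpu --sdvaecpu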

Hotfix 1.100.1 - Fixed a regression with flash attention on oldcpu builds, fixed kokoro regression.

Download and run the koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file pyinstaller for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try oldpc version instead (Cuda11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build first, for best support. Alternatively, you can try koboldcpp_rocm at YellowRoseCx's fork here if you are a Windows user or download our rolling ROCm binary here if you use Linux.
If you're on a modern MacOS (M-Series) you can use the koboldcpp-mac-arm64 MacOS binary.
Click here for .gguf conversion and quantization tools

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Then, once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001

For more information, be sure to run the program from command line with the --help flag. You can also refer to the readme and the wiki.

Contributors

wbruna, joybod, and leejet

koboldcpp-1.99.4

21 Sep 06:20
@LostRuins

a darker shade of blue edition

  • NEW: The bundled KoboldAI Lite UI has received a substantial design overhaul in an effort to make it look more modern and polished. The default color scheme has been changed; however, the old color scheme is still available (set the 'Nostalgia' color scheme in advanced settings). A few extra custom color schemes have also been added (thanks Lakius, TwistedShadows, toastypigeon, @PeterPeet). Please report any UI bugs you encounter.
  • QoL Change: Added aliases for llama.cpp command-line flags. To reduce the learning curve for llama.cpp users, the following llama.cpp compatibility flags have been added: -m,-t,--ctx-size,-c,--gpu-layers,--n-gpu-layers,-ngl,--tensor-split,-ts,--main-gpu,-mg,--batch-size,-b,--threads-batch,--no-context-shift,--mlock,-p,--no-mmproj-offload,--model-draft,-md,--draft-max,--draft-n,--gpu-layers-draft,--n-gpu-layers-draft,-ngld,--flash-attn,-fa,--n-cpu-moe,-ncmoe,--override-kv,--override-tensor,-ot,--no-mmap. They should behave as you'd expect from llama.cpp (see the example launch command after this list).
  • Renamed --promptlimit to --genlimit, now applies to API requests as well, can be set in the UI launcher.
  • Added a new parameter --ratelimit that will apply per-IP based rate limiting (to help prevent abuse of public instances).
  • Fixed Automatic VRAM detection for rocm and vulkan backends on AMD systems (thanks @lone-cloud)
  • Hide API info display if running in CLI mode.
  • (Struck out) Flash attention is now checked by default when using the GUI launcher. (Reverted in 1.99.1 by popular demand)
  • Try fix some embedding models using too much memory.
  • (Struck out) Standardize model file download locations to the koboldcpp executable's directory. This should help solve issues about non-writable system paths when launching from a different working directory. If you prefer the old behavior, please send some feedback, but I think standardizing it is better than adding special exceptions for some directory paths. (Reverted in 1.99.2, with some exceptions)
  • Add psutil to conda environment. Please report if this breaks any setups.
  • Added /v1/audio/voices endpoint, fixed dia wrong voice mapping
  • Updated Kobold Lite, multiple fixes and improvements
    • UI design rework, as mentioned above
    • Fixes for markdown renderer
    • Added a popup to allow enabling TTS or image generation if it's disabled but available.
    • Added new scenario "Aletheia"
    • Increased default context size and amount generated
    • Fix for GPT-OSS instruct format.
    • Smarter automatic detection for "Enter Sends" default based on platform. Toggle moved into advanced settings.
    • Fix for Palemoon browser compatibility
    • Reworked the best-practices recommendation for think tags - it now provides Think/NoThink instruct tags for each instruct sequence. You are now recommended to explicitly select the correct Think/NoThink instruct tags instead of using the <think> forced/prevented prefill. This will provide better results for preventing reasoning than simply injecting a blank <think></think>, since some models require specialized reasoning trace formats.
    • For example, to prevent thinking in GLM-Air, you're simply recommended to set the instruct tag to GLM-4.5 Non-Thinking and leave "Insert Thinking" as "Normal" instead of manually messing with the tag injections. This ensures the correct postfix tags for each format are used.
    • By default, KoboldCppAutomatic template permits thinking in models that use it.
  • Merged new model support, fixes and improvements from upstream
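
Example launch command using the llama.cpp-style aliases (a sketch; every flag shown is taken from the alias list above, and the model name is illustrative):

    # Equivalent to the usual KoboldCpp flags, but in llama.cpp spelling
    ./koboldcpp-linux-x64 -m mymodel.gguf -c 8192 -ngl 99 -fa --n-cpu-moe 10 -t 8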

Hotfix 1.99.1 - Fix for chroma, revert FA default off, revert ggml-org#16056, fixed rocm compile issues.

Hotfix 1.99.2 - Reverted the download file path changes on request from @henk717 for most cases. Fixed rocm VRAM detection.

Hotfix 1.99.3 and Hotfix 1.99.4 - Fixed aria2 downloading and try to fix kokoro

Download and run the koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file pyinstaller for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try oldpc version instead (Cuda11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build first, for best support. Alternatively, you can try koboldcpp_rocm at YellowRoseCx's fork here if you are a Windows user or download our rolling ROCm binary here if you use Linux.
If you're on a modern MacOS (M-Series) you can use the koboldcpp-mac-arm64 MacOS binary.
Click here for .gguf conversion and quantization tools

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Then, once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001

For more information, be sure to run the program from command line with the --help flag. You can also refer to the readme and the wiki.

Contributors

lone-cloud, henk717, and PeterPeet

koboldcpp-1.98.1

24 Aug 07:25
@LostRuins

Kokobold edition

  • NEW: TTS.cpp model support has been integrated into KoboldCpp, providing access to new Text-To-Speech models - The TTS.cpp project (repo here) was developed by @mmwillet, and a modified version has now been added into KoboldCpp, bringing support for 3 new Text-To-Speech models: Kokoro, Parler and Dia.
    • Of the above models, Kokoro is the most recommended for general use.
    • These use the GGML library in KoboldCpp, although the new ops are CPU only, so Kokoro provides the best speed relative to its size. You can expect speeds of 2x realtime for Kokoro (fastest), 0.5x realtime for Parler, and 0.1x realtime for Dia (slowest).
    • To use, simply download the GGUF model and load it in the 'Audio' tab as a TTS model (see the example launch command after this list). Note: WavTokenizer is not required for these models. Please use the no_espeak versions; KoboldCpp has custom IPA mappings for English and espeak is not supported.
    • KoboldAI Lite provides automatic mapping for the speaker voices. If you wish to use a custom voice for Kokoro, the supported voices are af_alloy, af_aoede, af_bella, af_heart, af_jessica, af_kore, af_nicole, af_nova, af_river, af_sarah, af_sky, am_adam, am_echo, am_eric, am_fenrir, am_liam, am_michael, am_onyx, am_puck, am_santa, bf_alice, bf_emma, bf_isabella, bf_lily, bm_daniel, bm_fable, bm_george, bm_lewis. Only English speech is properly supported.
  • Thanks to @wbruna, image generation has been updated and received multiple improvements:
    • Added separate flash attention and conv2d toggles for image generation --sdflashattention and --sdconvdirect
    • Added ability to use q8 for Image Generation model quantization, in addition to existing q4. --sdquant now accepts a parameter [0/1/2] that specifies quantization level, similar to --quantkv
  • Added --overridenativecontext flag which allows you to easily override the expected trained context of a model when determining automatic RoPE scaling. If you didn't get that, you don't need this feature.
  • Seed-OSS support is merged, including instruct templates for thinking and non-thinking modes.
  • Further improvements to tool calling and audio transcription handling
  • Fixed Stable Diffusion 3.5 loading issue
  • Embedding models now default to the lower of current model max context and trained context. Should help with Qwen3 embedding models. This can be adjusted with --embeddingsmaxctx override.
  • Improve server identifier header for better compatibility with some libraries
  • Termux android_install.sh script can now launch existing downloaded models
  • Minor chat adapter fixes, including Kimi.
  • Added alias for --tensorsplit
  • Benchmark CSV formatting fix.
  • Updated Kobold Lite, multiple fixes and improvements
    • Scenario picker can now load any adventure or chat scenario in Instruct mode.
    • Slightly increased default amount to generate.
    • Improved file saving behavior, try to remember previously used filename.
    • Improved KaTeX rendering and handle additional cases
    • Improved streaming UI for code block streaming at the start of any turn.
    • Added setting to embed generated TTS audio into the context as part of the AI's turn.
    • Minor formatting fixes
    • Added Vision 👁️ and Auditory 🦻 support indicators for inline multimodal media content.
    • Added Seed-OSS instruct templates. Note that Thinking regex must be set manually for this model by changing the think tag.
    • Overhaul narration and media adding system, allow TTS to be manually added with Add File.
  • Merged new model support, fixes and improvements from upstream
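
Example launch command for Kokoro TTS (a sketch with an important assumption: it presumes the existing --ttsmodel flag is the command-line equivalent of the Audio tab selection described above, and the filenames are illustrative placeholders for a no_espeak Kokoro GGUF and your text model):

    ./koboldcpp-linux-x64 --model mymodel.gguf --ttsmodel kokoro-no-espeak.gguf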

Hotfix 1.98.1 - Fixed Kokoro for better accuracy and quality, added 4096 as a --blasbatchsize option, fixed Windows 7 functionality, fixed flash attention issues, synced some new updates from upstream.

Download and run the koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file pyinstaller for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try oldpc version instead (Cuda11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build first, for best support. Alternatively, you can try koboldcpp_rocm at YellowRoseCx's fork here if you are a Windows user or download our rolling ROCm binary here if you use Linux.
If you're on a modern MacOS (M-Series) you can use the koboldcpp-mac-arm64 MacOS binary.
Click here for .gguf conversion and quantization tools

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Then, once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001

For more information, be sure to run the program from command line with the --help flag. You can also refer to the readme and the wiki.

Contributors

wbruna and mmwillet

koboldcpp-1.97.4

06 Aug 14:10
@LostRuins

  • Merged support for GLM4.5 family of models
  • Merged support for GPT-OSS models (note that this model performs poorly if OpenAI instruct templates are not obeyed. To use it in raw story mode, append <|start|>assistant<|channel|>final<|message|> to memory)
  • Merged support for Voxtral (Voxtral Small 24B is better than Voxtral Mini 3B, but both are not great. See ggml-org#14862 (comment))
  • Added /ping stub endpoint to permit usage on Runpod serverless.
  • Allow MoE layers to be easily kept on CPU with the --moecpu (layercount) flag. Using this flag without a number will keep all MoE layers on CPU (see the example launch commands after this list).
  • Clearer indication of support for each multimodal modality Vision/Audio
  • Increased max length of terminal prints allowed in debugmode.
  • Do not attempt context shifting for any mrope models.
  • Adjusted some adapter instruct templates, tweaked mistral template.
  • Handle empty objects returned by tool calls, and remove misinterpretation of the tool calls instruct tag within ChatML autoguess.
  • Allow multiple tool calls to be chained, and allow them to be triggered by any role.
  • WebSearch: fixed URL parameter parsing
  • Increased regex stack size limit for MSVC builds (fix for mistral models).
  • Updated Kobold Lite, multiple fixes and improvements
    • Added 2 more save slots
    • Added a (+/-) modifier field for Adventure mode rolls
    • Fixed deleting wrong image if multiple selected images are identical.
    • Button to insert textDB separator
    • Improved mid-streaming rendering
    • Slightly lowered default rep pen
    • Simplified Mistral template, added GPT-OSS Harmony template
  • Merged new model support, fixes and improvements from upstream
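
Example launch commands for --moecpu (a sketch; the model filename and layer count are illustrative):

    # Keep the first 20 MoE layers on CPU, everything else on GPU
    ./koboldcpp-linux-x64 --model glm-4.5-air.gguf --gpulayers 99 --moecpu 20
    # Keep all MoE layers on CPU
    ./koboldcpp-linux-x64 --model glm-4.5-air.gguf --gpulayers 99 --moecpu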

Hotfix 1.97.1 - More template fixes, now shows generated token's ID in debugmode terminal log, fixed flux loading speed regression, Vulkan BSOD fixed.
Hotfix 1.97.2 - Fix CLBlast regression, limit vulkan bsod fix to nvidia only, updated lite, merged upstream fixes.
Hotfix 1.97.3 - Fix a regression with GPT-OSS that resulted in incoherence
Hotfix 1.97.4 - Fixed OldPC CUDA builds when flash attention was not used. This broke after 1.95 and is now fixed.

Download and run the koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file pyinstaller for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try oldpc version instead (Cuda11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build first, for best support. Alternatively, you can try koboldcpp_rocm at YellowRoseCx's fork here if you are a Windows user or download our rolling ROCm binary here if you use Linux.
If you're on a modern MacOS (M-Series) you can use the koboldcpp-mac-arm64 MacOS binary.
Click here for .gguf conversion and quantization tools

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Then, once loaded, you can connect like this (or use the full KoboldAI client):
http://localhost:5001

For more information, be sure to run the program from command line with the --help flag. You can also refer to the readme and the wiki.
