IgnisCore

Korean documentation

IgnisCore is an experimental local LLM inference engine written in C#/.NET and Vulkan Compute. It focuses on running Gemma 4 GGUF models on Windows with a fully local GPU pipeline: model loading, tokenization, prefill/decode, FlashAttention, Cooperative Matrix acceleration, and TurboQuant KV-cache compression experiments.

Status: active research and engineering prototype. APIs, kernels, and model compatibility can change quickly.

Highlights

  • C# / .NET 10 implementation with NativeAOT-friendly project settings.
  • Vulkan Compute backend through Silk.NET Vulkan.
  • Gemma 4 GGUF loading with Q8_0-oriented optimized paths.
  • FlashAttention and NVIDIA Cooperative Matrix 2 prefill paths.
  • TurboQuant KV-cache compression experiments for long-context VRAM efficiency.
  • Interactive chat, single-prompt mode, benchmark mode, and system-prompt support.
  • 8GB-friendly Gemma 4 E2B Q8_0 launcher and 12GB-oriented Gemma 4 E4B Q8_0 launcher.
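To give a feel for why KV-cache compression matters at long context, here is a rough back-of-the-envelope sketch in Python. It is not IgnisCore code. The layer count, head count, and head dimension are placeholders rather than Gemma 4's real configuration, the 4-bit figure is a hypothetical compressed width that ignores any per-block metadata, and the formula assumes every layer caches every token.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2.0):
    """Approximate KV-cache size: one K and one V vector per layer, KV head, and token."""
    return int(2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem)


# Placeholder shape, NOT Gemma 4's actual configuration.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128)

fp16 = kv_cache_bytes(seq_len=4096, **shape)                            # 2 bytes per element
compressed = kv_cache_bytes(seq_len=4096, bytes_per_elem=0.5, **shape)  # hypothetical 4-bit

print(f"fp16: {fp16 / 2**20:.0f} MiB, 4-bit: {compressed / 2**20:.0f} MiB")
# prints "fp16: 512 MiB, 4-bit: 128 MiB"
```

The cache grows linearly with sequence length, so doubling the context doubles both figures.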

Requirements

  • Windows.
  • .NET 10 SDK.
  • Vulkan 1.3-capable GPU and driver.
  • Vulkan SDK is recommended for shader development.
  • A Hugging Face account with access to the gated Gemma repositories, needed when downloading model metadata or weights.

Optional local Hugging Face token:

# .env
HF_TOKEN=hf_your_token_here

The .env file is intentionally ignored by Git.
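For scripts that wrap IgnisCore, a token stored this way can be read with a few lines of Python. This is a minimal sketch of one common convention (process environment first, then the .env file). How IgnisCore itself resolves HF_TOKEN is not described here, and the helper names are hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_hf_token(path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    return os.environ.get("HF_TOKEN") or load_env_file(path).get("HF_TOKEN")
```

Calling resolve_hf_token() from the repository root returns the token string, or None when neither source defines it.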

Quick start

Clone and build:

git clone https://github.com/dimohy/IgnisCore.git
cd IgnisCore
dotnet build .\src\IgnisCore.csproj -c Release

Run the 8GB-friendly model launcher:

.\run-chat-gemma4-e2b-it-q8-8g.ps1

Run the larger 12GB-oriented model launcher:

.\run-chat-gemma4-e4b-it-q8-12g.ps1

Both launchers forward extra arguments to IgnisCore, so you can override settings:

.\run-chat-gemma4-e2b-it-q8-8g.ps1 --prompt "Who are you?" --max-tokens 64
.\run-chat-gemma4-e2b-it-q8-8g.ps1 --max-seq-len 4096

Downloaded models are stored under models/, which is ignored by Git.

CLI examples

Show help:

dotnet run -c Release --project .\src\IgnisCore.csproj -- --help

Download/verify a known model without running inference:

dotnet run -c Release --project .\src\IgnisCore.csproj -- --model gemma-4-e2b-it --gguf-type q8_0 --download-only

Run a single prompt:

dotnet run -c Release --project .\src\IgnisCore.csproj -- --model gemma-4-e2b-it --gguf-type q8_0 --prompt "Introduce IgnisCore" --max-tokens 128
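When driving these commands from another tool, it can help to assemble the argument list programmatically so that prompts containing spaces or quotes are passed intact. The Python sketch below only uses flags that appear in this README. The build_command helper is hypothetical and not part of IgnisCore.

```python
PROJECT = r".\src\IgnisCore.csproj"  # path as used in the commands above


def build_command(model, gguf_type="q8_0", prompt=None, max_tokens=None,
                  download_only=False):
    """Return the dotnet argument list for one IgnisCore invocation."""
    cmd = ["dotnet", "run", "-c", "Release", "--project", PROJECT, "--",
           "--model", model, "--gguf-type", gguf_type]
    if download_only:
        cmd.append("--download-only")
    if prompt is not None:
        cmd += ["--prompt", prompt]
    if max_tokens is not None:
        cmd += ["--max-tokens", str(max_tokens)]
    return cmd


cmd = build_command("gemma-4-e2b-it", prompt="Introduce IgnisCore", max_tokens=128)
print(cmd)
```

Passing the resulting list to subprocess.run(cmd, check=True) from the repository root would launch the same single-prompt run as the command above, without going through shell quoting.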

Run a synthetic benchmark:

dotnet run -c Release --project .\src\IgnisCore.csproj -- --model gemma-4-e4b-it --gguf-type q8_0 --benchmark --bench-pp 512 --bench-tg 64
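The flag names suggest the llama-bench convention, where "pp" is the number of prompt-processing (prefill) tokens and "tg" the number of generated (decode) tokens. That reading is an assumption, since the README does not define them. Under it, the two throughput figures can be derived from token counts and wall-clock time as in this Python sketch. The timings are made-up placeholders, not measured IgnisCore results.

```python
def tokens_per_second(n_tokens, elapsed_seconds):
    """Throughput for one benchmark phase."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return n_tokens / elapsed_seconds


# Hypothetical timings for illustration only.
pp_tps = tokens_per_second(512, 0.25)  # prefill phase, 512 prompt tokens
tg_tps = tokens_per_second(64, 2.0)    # decode phase, 64 generated tokens

print(f"prefill: {pp_tps:.1f} tok/s, decode: {tg_tps:.1f} tok/s")
# prints "prefill: 2048.0 tok/s, decode: 32.0 tok/s"
```

Prefill processes the whole prompt in parallel while decode emits one token at a time, so the two numbers are normally reported separately.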

Known model aliases

Alias          | Weight repository           | Metadata repository   | Default GGUF | Suggested GPU
gemma-4-e2b-it | unsloth/gemma-4-E2B-it-GGUF | google/gemma-4-E2B-it | q8_0         | 8GB+
gemma-4-e4b-it | unsloth/gemma-4-E4B-it-GGUF | google/gemma-4-e4b-it | q8_0         | 12GB+
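The alias table can be mirrored as a small lookup structure, for example in a wrapper script that picks a model based on available VRAM. This Python sketch copies the rows above verbatim. The type and function names are hypothetical and do not reflect how IgnisCore stores its aliases internally.

```python
from typing import NamedTuple


class KnownModel(NamedTuple):
    weight_repo: str
    metadata_repo: str
    default_gguf: str
    suggested_vram_gb: int


# Values copied from the table above.
KNOWN_MODELS = {
    "gemma-4-e2b-it": KnownModel(
        "unsloth/gemma-4-E2B-it-GGUF", "google/gemma-4-E2B-it", "q8_0", 8),
    "gemma-4-e4b-it": KnownModel(
        "unsloth/gemma-4-E4B-it-GGUF", "google/gemma-4-e4b-it", "q8_0", 12),
}


def resolve_model(alias):
    """Look up an alias case-insensitively, with a clear error for unknown names."""
    try:
        return KNOWN_MODELS[alias.lower()]
    except KeyError:
        raise ValueError(f"unknown model alias: {alias!r}") from None


def largest_model_for(vram_gb):
    """Return the alias with the highest suggested VRAM that still fits, or None."""
    fitting = [(m.suggested_vram_gb, alias)
               for alias, m in KNOWN_MODELS.items()
               if m.suggested_vram_gb <= vram_gb]
    return max(fitting)[1] if fitting else None
```

For instance, largest_model_for(8) returns "gemma-4-e2b-it", while a value below 8 returns None.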

Repository layout

Path                              | Purpose
src/                              | IgnisCore C# project
src/Engine/                       | Transformer, chat, sampling, and vision pipeline orchestration
src/Gpu/                          | Vulkan context, buffer management, and tensor operations
src/Model/                        | GGUF/SafeTensors/tokenizer/config/model download support
src/Shaders/                      | GLSL compute shaders and embedded SPIR-V artifacts
src/TurboQuant/                   | TurboQuant KV-cache compression components
run-chat-gemma4-e2b-it-q8-8g.ps1  | 8GB-friendly Gemma 4 E2B Q8_0 chat launcher
run-chat-gemma4-e4b-it-q8-12g.ps1 | Gemma 4 E4B Q8_0 chat launcher for larger GPUs

Notes

  • IgnisCore is optimized around GGUF Q8_0 paths today. Other quantization names may exist upstream but are not necessarily supported by the current kernels.
  • Cooperative Matrix paths require compatible NVIDIA Vulkan driver/device support. Use --no-coopmat when diagnosing portability issues.
  • Model files are large and are not committed to this repository.
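For readers unfamiliar with the Q8_0 format mentioned above: in GGUF, a Q8_0 tensor is stored as blocks of 32 signed 8-bit values preceded by one fp16 scale, and each weight is recovered as scale times quant. The Python sketch below is a CPU reference for that layout only. It is not IgnisCore's implementation, which runs in compute shaders on the GPU.

```python
import struct

QK8_0 = 32               # quantized values per block
BLOCK_BYTES = 2 + QK8_0  # one little-endian fp16 scale followed by 32 int8 quants


def dequantize_q8_0(data):
    """Expand raw Q8_0 block bytes into a flat list of floats."""
    if len(data) % BLOCK_BYTES:
        raise ValueError("data length is not a multiple of the Q8_0 block size")
    out = []
    for scale, *quants in struct.iter_unpack("<e32b", data):
        out.extend(scale * q for q in quants)
    return out


# One synthetic block: scale 0.5, quants -16..15.
block = struct.pack("<e32b", 0.5, *range(-16, 16))
values = dequantize_q8_0(block)
print(values[0], values[-1], len(values))  # prints "-8.0 7.5 32"
```

At 34 bytes per 32 weights, Q8_0 costs about 8.5 bits per weight.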

License

Apache-2.0. See LICENSE.
