BitNet Core

Sujan Mishra edited this page Jun 26, 2025 · 1 revision

BitNet Core (bitnet-core)

A pure Rust, streaming-friendly core engine for BitNet models, focused on high-performance inference, quantization, and kernel dispatch. It includes all performance-critical logic, model definitions, and backend implementations for both CPU (SIMD) and GPU (WGSL).


Purpose

  • Serve as the backend engine for BitNet inference (and planned training)
  • Provide modular, extensible components for model architecture, quantization, and kernel dispatch
  • Support both CPU (SIMD) and GPU (WGSL) backends
  • Enable streaming-friendly, per-block model loading and execution

Main Modules

  • model.rs: Pure Rust Transformer model architecture (no burn dependency)
  • attention.rs, feed_forward.rs, rms_norm.rs: Core model submodules (pure Rust)
  • bitnet_linear.rs: BitLinear quantized layer, packing, and quantization utilities
  • kernels/: CPU/GPU kernel implementations (WGSL, SIMD)
  • settings.rs: Inference and generation settings
  • embedding.rs: Embedding layer
  • tokenizer.rs: Tokenizer and chat template logic
  • error.rs: Error types and handling
  • gui/: (Optional) Core-level visualization and debugging UI for developers (feature-gated)
  • training.rs, visualization.rs: (Planned) Training and logging/metrics hooks

Architecture

  • Pure Rust, burn-free: All core logic is implemented in Rust, with no dependency on the burn framework for inference
  • Streaming-friendly: Model weights are loaded per-block, supporting large models and efficient memory usage
  • Quantized & packed: Uses ternary quantization and efficient packing for weights and activations
  • GPU kernel integration: Includes WGSL kernels for high-performance inference on modern GPUs
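
To make "quantized & packed" concrete, here is a minimal sketch of ternary quantization followed by 2-bit packing. It assumes absmean scaling (the scheme described for BitNet b1.58) and a made-up encoding of 16 weights per u32; the function names and the bit layout are illustrative and may not match what bitnet_linear.rs or the kernels actually use.

```rust
/// Quantize weights to {-1, 0, +1} using absmean scaling.
/// Assumption: scale = mean(|w|); the crate's real scheme may differ.
pub fn quantize_ternary(weights: &[f32]) -> (Vec<i8>, f32) {
    let n = weights.len().max(1) as f32;
    let mean_abs = weights.iter().map(|w| w.abs()).sum::<f32>() / n;
    // Guard against an all-zero tensor so the division below stays finite.
    let scale = mean_abs.max(1e-8);
    let q = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-1.0, 1.0) as i8)
        .collect();
    (q, scale)
}

/// Pack 16 ternary values into each u32, two bits per value.
/// Hypothetical encoding: -1 -> 0b00, 0 -> 0b01, +1 -> 0b10.
pub fn pack_ternary(values: &[i8]) -> Vec<u32> {
    values
        .chunks(16)
        .map(|chunk| {
            chunk.iter().enumerate().fold(0u32, |acc, (i, &v)| {
                debug_assert!((-1..=1).contains(&v));
                acc | (((v + 1) as u32) << (2 * i))
            })
        })
        .collect()
}

/// Inverse of `pack_ternary`; `len` is the number of values originally packed.
pub fn unpack_ternary(packed: &[u32], len: usize) -> Vec<i8> {
    (0..len)
        .map(|i| {
            let code = (packed[i / 16] >> (2 * (i % 16))) & 0b11;
            code as i8 - 1
        })
        .collect()
}
```

With this layout a weight tensor takes 2 bits per element plus one f32 scale, which is what makes per-block loading of large models cheap in memory.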

How to Use

Add to your Cargo.toml:

[dependencies]
bitnet-core = { path = "../bitnet-core" }

Then in your code:

use bitnet_core::model::Transformer;
// ...

Features

  • Modular, extensible design
  • Optional GPU and core-gui features (feature flags)
  • Designed for correctness, performance, and portability
  • Streaming-friendly model loading and execution
  • Robust error handling and test coverage
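
As an illustration of what streaming-friendly, per-block loading can look like, the sketch below reads one block at a time from any `std::io::Read` source, so only a single block is resident in memory while it is processed. The length-prefixed file layout and the names `BlockReader` and `for_each_block` are assumptions made for this example, not the crate's actual format or API.

```rust
use std::io::{self, Read};

/// Iterates over blocks stored as repeated [u32 little-endian length][payload].
/// This layout is hypothetical and only serves the example.
pub struct BlockReader<R: Read> {
    inner: R,
}

impl<R: Read> BlockReader<R> {
    pub fn new(inner: R) -> Self {
        Self { inner }
    }
}

impl<R: Read> Iterator for BlockReader<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut len_buf = [0u8; 4];
        match self.inner.read_exact(&mut len_buf) {
            Ok(()) => {}
            // End of stream. Note that a truncated header is also treated
            // as end of stream in this sketch.
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return None,
            Err(e) => return Some(Err(e)),
        }
        let len = u32::from_le_bytes(len_buf) as usize;
        let mut payload = vec![0u8; len];
        // A truncated payload surfaces as an error rather than a short block.
        Some(self.inner.read_exact(&mut payload).map(|()| payload))
    }
}

/// Apply `apply` to each block in turn; each block is dropped before the
/// next one is read. Returns the number of blocks processed.
pub fn for_each_block<R: Read>(
    reader: R,
    mut apply: impl FnMut(usize, &[u8]),
) -> io::Result<usize> {
    let mut count = 0;
    for block in BlockReader::new(reader) {
        let block = block?;
        apply(count, &block);
        count += 1;
    }
    Ok(count)
}
```

In a real pipeline the callback would decode the block into a transformer layer's weights and run that layer before the next block is fetched.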

Kernel & Quantization

  • WGSL GPU kernel: See src/kernels/bitnet_kernel.wgsl for the main ternary matmul kernel
  • Packing utilities: See src/kernels.rs for pure Rust packing and scale calculation
  • Quantization: Scalar and SIMD quantization utilities for activations and weights
  • Tested against scalar reference: All kernels are validated against pure Rust reference implementations
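
The validation idea in the last bullet can be sketched as follows: a plain scalar matrix-vector product over ternary weights serves as the reference, and a second implementation that reads 2-bit packed weights must reproduce it exactly. The packing layout (16 weights per u32, code = weight + 1) and all function names are assumptions for illustration; the real kernels live in src/kernels.rs and src/kernels/bitnet_kernel.wgsl.

```rust
/// Scalar reference: y[r] = sum over c of w[r][c] * x[c], with w in {-1, 0, +1}.
/// `w` is row-major with `rows * cols` entries; `x` holds int8 activations.
pub fn matvec_reference(w: &[i8], x: &[i8], rows: usize, cols: usize) -> Vec<i32> {
    (0..rows)
        .map(|r| {
            (0..cols)
                .map(|c| w[r * cols + c] as i32 * x[c] as i32)
                .sum::<i32>()
        })
        .collect()
}

/// Pack each row separately, 16 weights per u32 (hypothetical layout).
/// Codes: -1 -> 0, 0 -> 1, +1 -> 2.
pub fn pack_rows(w: &[i8], rows: usize, cols: usize) -> Vec<u32> {
    let words_per_row = (cols + 15) / 16;
    let mut packed = vec![0u32; rows * words_per_row];
    for r in 0..rows {
        for c in 0..cols {
            let code = (w[r * cols + c] + 1) as u32;
            packed[r * words_per_row + c / 16] |= code << (2 * (c % 16));
        }
    }
    packed
}

/// Same product computed directly from the packed weights.
pub fn matvec_packed(packed: &[u32], x: &[i8], rows: usize, cols: usize) -> Vec<i32> {
    let words_per_row = (cols + 15) / 16;
    (0..rows)
        .map(|r| {
            let mut acc = 0i32;
            // Only the first `cols` slots are read, so padding bits are ignored.
            for c in 0..cols {
                let code = (packed[r * words_per_row + c / 16] >> (2 * (c % 16))) & 0b11;
                // Ternary weights need no multiply: add, subtract, or skip.
                match code {
                    2 => acc += x[c] as i32,
                    0 => acc -= x[c] as i32,
                    _ => {}
                }
            }
            acc
        })
        .collect()
}
```

Because both paths use integer arithmetic, a test can demand bit-exact equality; comparing a GPU kernel that applies floating-point scales would instead need a tolerance.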

Test Coverage

  • Unit tests for packing, quantization, and kernel correctness
  • Direct wgpu kernel launch tests (no burn dependency)
  • End-to-end model pipeline validation (see tests/pipeline_validation.rs)
  • Streaming and per-block model loading tests
  • Optional Stress Test: A long-running stress test (stress_test_maximum_dimension_support) is available but ignored by default. To run it, set the RUN_STRESS_TESTS environment variable:
    • PowerShell:
      $env:RUN_STRESS_TESTS="1"; cargo test --package bitnet-core --test kernel_tests -- --nocapture
    • Linux/macOS:
      RUN_STRESS_TESTS=1 cargo test --package bitnet-core --test kernel_tests -- --nocapture
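
One way such environment-variable gating could be implemented is shown below. This is a sketch, not the code in kernel_tests: the helper `stress_tests_enabled` is hypothetical, and treating an empty value or "0" as disabled is an assumption.

```rust
/// Decide whether the stress test should run, given the raw value of
/// RUN_STRESS_TESTS (None when the variable is unset).
/// Assumption: empty and "0" count as disabled.
pub fn stress_tests_enabled(value: Option<&str>) -> bool {
    matches!(value, Some(v) if !v.is_empty() && v != "0")
}

#[test]
fn stress_test_maximum_dimension_support() {
    let flag = std::env::var("RUN_STRESS_TESTS").ok();
    if !stress_tests_enabled(flag.as_deref()) {
        // Return early so the default `cargo test` run stays fast.
        eprintln!("skipping: set RUN_STRESS_TESTS=1 to run the stress test");
        return;
    }
    // The long-running body would go here.
}
```

Checking the variable inside the test body keeps the commands above unchanged; a test marked `#[ignore]` would instead need `-- --ignored` on the command line.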

Implementation Notes

  • See the project plan for architecture and validation strategies
  • Use feature flags to enable GPU or core-gui modules
  • For kernel and quantization details, see code comments in src/kernels.rs and src/kernels/bitnet_kernel.wgsl
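
For readers unfamiliar with Cargo feature gating, a minimal sketch of compile-time backend selection follows. The feature name "gpu" and the `Backend` type are assumptions for illustration; check the crate's Cargo.toml for the actual feature names.

```rust
/// Backends the engine could dispatch to (illustrative type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Cpu,
    Gpu,
}

/// Choose a default backend at compile time.
/// `cfg!` expands to a boolean constant, so the unused branch is
/// trivially optimized away.
pub fn default_backend() -> Backend {
    if cfg!(feature = "gpu") {
        // Only taken when built with a (hypothetical) `gpu` feature enabled.
        Backend::Gpu
    } else {
        Backend::Cpu
    }
}
```

If the feature were indeed named `gpu`, a downstream crate would enable it with `bitnet-core = { path = "../bitnet-core", features = ["gpu"] }`. Modules that pull in extra dependencies are usually gated with `#[cfg(feature = "...")]` on the `mod` declaration so they are not compiled at all when the feature is off.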

For questions or contributions, see the main project README or open an issue.
