paiml/batuta

batuta

Orchestration framework for the Sovereign AI Stack — privacy-preserving ML infrastructure in pure Rust

Overview

Batuta coordinates the Sovereign AI Stack, a comprehensive pure-Rust ecosystem for organizations requiring complete control over their ML infrastructure. The stack enables privacy-preserving inference, model management, and data processing without external cloud dependencies.

Key Capabilities

  • Privacy Tiers: Sovereign (local-only), Private (VPC), Standard (cloud-enabled)
  • Model Security: Ed25519 signatures, ChaCha20-Poly1305 encryption, BLAKE3 content addressing
  • API Compatibility: OpenAI-compatible endpoints for drop-in replacement
  • Observability: Prometheus metrics, distributed tracing, A/B testing
  • Cost Control: Circuit breakers with configurable daily budgets

Installation

cargo install batuta

Or add to your Cargo.toml:

[dependencies]
batuta = "0.4"

Quick Start

# Analyze project structure and dependencies
batuta analyze --languages --dependencies --tdg
# Query the Sovereign AI Stack
batuta oracle "How do I serve a Llama model locally?"
# Model registry operations
batuta pacha pull llama3-8b-q4
batuta pacha sign model.gguf --identity alice@example.com
batuta pacha verify model.gguf
# Encrypt models for distribution
batuta pacha encrypt model.gguf --password-env MODEL_KEY
batuta pacha decrypt model.gguf.enc --password-env MODEL_KEY

Usage

Project Analysis

# Full project analysis with TDG scoring
batuta analyze --languages --dependencies --tdg .
# Language detection only
batuta analyze --languages .
# Output formats: text (default), json, markdown
batuta analyze --format json .

Oracle Queries

# Natural language queries about the Sovereign AI Stack
batuta oracle "How do I train a random forest model?"
# RAG-based documentation search (requires indexing first)
batuta oracle --rag-index # Index stack documentation
batuta oracle --rag "tokenization" # Search indexed docs
# Interactive oracle mode
batuta oracle --interactive

Stack Management

# Check stack component versions
batuta stack versions
# Quality matrix for all components
batuta stack quality
# Dependency health check
batuta stack check

Demo

Live Demo: paiml.github.io/batuta | API Docs

Example Output (batuta analyze --tdg):

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 📊 Technical Debt Gradient Analysis
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Project: my-project
 Language: Rust (confidence: 98%)
 Metrics:
   Cyclomatic Complexity: 4.2 avg (good)
   Test Coverage: 87% (A-)
   Documentation: 92% (A)
   Dependency Health: 95% (A+)
 TDG Score: 91.5/100 (A)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Stack Components

Batuta orchestrates a layered architecture of pure-Rust components:

┌─────────────────────────────────────────────────────────────┐
│                        batuta v0.4.8                        │
│                    (Orchestration Layer)                    │
├───────────────────────┬─────────────────────────────────────┤
│     realizar v0.5     │             pacha v0.2              │
│  (Inference Engine)   │          (Model Registry)           │
├───────────────────────┼─────────────────┬───────────────────┤
│    aprender v0.24     │  entrenar v0.5  │  alimentar v0.2   │
│    (ML Algorithms)    │   (Training)    │  (Data Loading)   │
├───────────────────────┼─────────────────┼───────────────────┤
│     trueno v0.11      │  repartir v2.0  │   renacer v0.9    │
│  (SIMD/GPU Compute)   │  (Distributed)  │ (Syscall Tracing) │
└───────────────────────┴─────────────────┴───────────────────┘

Core Components

| Component | Version | Description |
|-----------|---------|-------------|
| trueno | 0.11 | SIMD/GPU compute primitives (AVX2/AVX-512/NEON, wgpu) |
| aprender | 0.24 | ML algorithms: regression, trees, clustering, NAS |
| entrenar | 0.5 | Training: autograd, LoRA/QLoRA, quantization |
| realizar | 0.5 | Inference engine for GGUF/SafeTensors models |
| pacha | 0.2 | Model registry with signatures, encryption, lineage |
| repartir | 2.0 | Distributed compute (CPU/GPU/Remote executors) |
| renacer | 0.9 | Syscall tracing with semantic validation |
| batuta | 0.4 | Stack orchestration, drift detection, CLI |

Extended Ecosystem

| Component | Version | Description |
|-----------|---------|-------------|
| trueno-db | 0.3 | GPU-accelerated analytics database |
| trueno-graph | 0.1 | Graph database for code analysis |
| trueno-rag | 0.1 | RAG pipeline (chunking, BM25+vector, RRF) |
| trueno-viz | 0.1 | Terminal/PNG visualization |
| alimentar | 0.2 | Zero-copy Parquet/Arrow data loading |
| whisper-apr | 0.1 | Pure Rust Whisper ASR (WASM-first) |
| jugar | 0.1 | Game engine (ECS, physics, AI, WASM) |
| simular | 0.3 | Simulation engine (Monte Carlo, physics) |
| bashrs | 6.53 | Shell-to-Rust transpiler and linter |
| presentar | 0.3 | Terminal presentation framework |
| pmat | 2.213 | Project quality analysis toolkit |
Commands

batuta analyze

Analyze project structure, languages, and dependencies:

batuta analyze --languages --dependencies --tdg
# Output:
# Primary language: Python
# Dependencies: pip (42 packages), ML frameworks detected
# TDG Score: 73.2/100 (B)
# Recommended: Use Aprender for ML, Realizar for inference

batuta oracle

Query the stack for component recommendations:

# Natural language queries
batuta oracle "Train random forest on 1M samples"
# List all components
batuta oracle --list
# Component details
batuta oracle --show realizar
# Interactive mode
batuta oracle --interactive

batuta pacha

Model registry operations:

# Pull models from registry
batuta pacha pull llama3-8b-q4
# Generate signing keys
batuta pacha keygen --identity alice@example.com
# Sign models for distribution
batuta pacha sign model.gguf --identity alice@example.com
# Verify model signatures
batuta pacha verify model.gguf
# Encrypt models at rest
batuta pacha encrypt model.gguf --password-env MODEL_KEY
# Decrypt for inference
batuta pacha decrypt model.gguf.enc --password-env MODEL_KEY

batuta content

Generate structured content with quality constraints:

# Available content types
batuta content types
# Generate book chapter prompt
batuta content emit --type bch --title "Error Handling" --audience "developers"
# Validate content quality
batuta content validate --type bch chapter.md

batuta stack

Manage the Sovereign AI Stack ecosystem:

# Check stack component versions
batuta stack versions
# Detect version drift across published crates
batuta stack drift
# Generate fix commands for drift issues
batuta stack drift --fix --workspace ~/src
# Check which crates need publishing
batuta stack publish-status
# Quality gate for CI/pre-commit
batuta stack gate

Automatic Drift Detection: Batuta blocks all commands if published stack crates are using outdated versions of other stack crates. Use --unsafe-skip-drift-check to bypass in emergencies.
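
To make the drift rule concrete, here is a minimal sketch of one way such a check could work. It is not Batuta's implementation: the `parse_version` and `detect_drift` functions, the `Drift` struct, and the rule that only a lower major.minor pair counts as drift are all assumptions made for illustration, using only the standard library.

```rust
use std::collections::BTreeMap;

/// Parse "0.11", "0.4.8", or "v0.4.8" into (major, minor, patch).
/// Missing components default to 0; a non-numeric component yields None.
fn parse_version(s: &str) -> Option<(u64, u64, u64)> {
    let mut parts = s.trim().trim_start_matches('v').split('.');
    let major = parts.next()?.parse::<u64>().ok()?;
    let minor = parts.next().unwrap_or("0").parse::<u64>().ok()?;
    let patch = parts.next().unwrap_or("0").parse::<u64>().ok()?;
    Some((major, minor, patch))
}

/// One drift finding (hypothetical shape, for illustration only).
#[derive(Debug, PartialEq, Eq)]
struct Drift {
    krate: String,
    dependency: String,
    required: String,
    latest: String,
}

/// `latest` maps each stack crate to its newest published version.
/// `deps` lists (crate, dependency, required version) triples.
/// Assumption: a dependency has drifted when its required major.minor
/// is lower than the latest published major.minor.
fn detect_drift(latest: &BTreeMap<&str, &str>, deps: &[(&str, &str, &str)]) -> Vec<Drift> {
    let mut found = Vec::new();
    for &(krate, dependency, required) in deps {
        // Dependencies that are not stack crates are not checked.
        let latest_version = match latest.get(dependency) {
            Some(v) => *v,
            None => continue,
        };
        // Unparseable versions are skipped rather than reported.
        if let (Some(req), Some(cur)) = (parse_version(required), parse_version(latest_version)) {
            if (req.0, req.1) < (cur.0, cur.1) {
                found.push(Drift {
                    krate: krate.to_string(),
                    dependency: dependency.to_string(),
                    required: required.to_string(),
                    latest: latest_version.to_string(),
                });
            }
        }
    }
    found
}
```

With trueno at 0.11 as the latest version, a hypothetical crate that still requires trueno 0.10 would be reported, while a crate requiring 0.11.2 would not. A tool that blocks commands on drift would then refuse to proceed whenever the returned list is non-empty.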

Privacy Tiers

The stack enforces data sovereignty through configurable privacy tiers:

| Tier | Behavior | Use Case |
|------|----------|----------|
| Sovereign | Blocks ALL external API calls | Healthcare, Government |
| Private | VPC/dedicated endpoints only | Financial services |
| Standard | Public APIs allowed | General deployment |

use batuta::serve::{BackendSelector, PrivacyTier};
let selector = BackendSelector::new()
 .with_privacy(PrivacyTier::Sovereign);
// Returns only local backends: Realizar, Ollama, LlamaCpp
let backends = selector.recommend();
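
The filtering behind a tier can be pictured as an ordering on where a backend runs. The sketch below is a self-contained illustration and not Batuta's source: the `Locality` enum, the `Backend` struct, the `allowed_backends` function, and the two non-local backend names are hypothetical, and it assumes that the Private tier also permits local backends.

```rust
/// Where a backend executes, ordered from most to least private.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Locality {
    Local,  // same machine, no network egress
    Vpc,    // VPC or dedicated endpoint
    Public, // public cloud API
}

/// Illustrative stand-in for the tiers described in the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PrivacyTier {
    Sovereign,
    Private,
    Standard,
}

impl PrivacyTier {
    /// The least private locality that this tier still permits.
    fn max_locality(self) -> Locality {
        match self {
            PrivacyTier::Sovereign => Locality::Local,
            PrivacyTier::Private => Locality::Vpc,
            PrivacyTier::Standard => Locality::Public,
        }
    }
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Backend {
    name: &'static str,
    locality: Locality,
}

/// Keep only the backends whose locality is within the tier's limit.
fn allowed_backends(tier: PrivacyTier, all: &[Backend]) -> Vec<&'static str> {
    all.iter()
        .filter(|b| b.locality <= tier.max_locality())
        .map(|b| b.name)
        .collect()
}

/// Sample data: the three local names come from the comment above;
/// "vpc-endpoint" and "public-api" are placeholders.
fn example_backends() -> Vec<Backend> {
    vec![
        Backend { name: "Realizar", locality: Locality::Local },
        Backend { name: "Ollama", locality: Locality::Local },
        Backend { name: "LlamaCpp", locality: Locality::Local },
        Backend { name: "vpc-endpoint", locality: Locality::Vpc },
        Backend { name: "public-api", locality: Locality::Public },
    ]
}
```

Calling `allowed_backends(PrivacyTier::Sovereign, &example_backends())` keeps only the three local entries. Encoding the tier as an upper bound means a backend that is not explicitly classified as local can never be selected under Sovereign by accident.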

Model Security

Digital Signatures (Ed25519)

Verify model integrity before loading:

use pacha::signing::{SigningKey, sign_model, verify_model};
let signing_key = SigningKey::generate();
let signature = sign_model(&model_data, &signing_key)?;
// Verification fails if model tampered
verify_model(&model_data, &signature)?;

Encryption at Rest (ChaCha20-Poly1305)

Protect models during distribution:

use pacha::crypto::{encrypt_model, decrypt_model};
let encrypted = encrypt_model(&model_data, "password")?;
let decrypted = decrypt_model(&encrypted, "password")?;

Design Principles

Batuta applies Toyota Production System principles:

| Principle | Application |
|-----------|-------------|
| Jidoka | Automatic failover with context preservation |
| Poka-Yoke | Privacy tiers prevent data leakage |
| Heijunka | Spillover routing for load leveling |
| Muda | Cost circuit breakers prevent waste |
| Kaizen | Continuous metrics and optimization |
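
As an illustration of the cost circuit breaker idea, the following sketch tracks spend against a daily budget and rejects requests that would exceed it. This is a simplified model under stated assumptions, not Batuta's API: the `CostBreaker` type, the `Decision` enum, the use of integer cents, and the caller-supplied day counter are all hypothetical.

```rust
/// Outcome of asking the breaker whether a request may proceed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Decision {
    Allow,
    Reject,
}

/// Tracks spend per day in integer cents to avoid floating-point drift.
struct CostBreaker {
    daily_budget_cents: u64,
    spent_cents: u64,
    day: u64, // day index supplied by the caller, e.g. days since some epoch
}

impl CostBreaker {
    fn new(daily_budget_cents: u64) -> Self {
        CostBreaker { daily_budget_cents, spent_cents: 0, day: 0 }
    }

    /// Record the cost and allow the request if it fits in today's budget;
    /// otherwise reject it and leave the recorded spend unchanged.
    fn try_spend(&mut self, today: u64, cost_cents: u64) -> Decision {
        // A new day resets the running total.
        if today != self.day {
            self.day = today;
            self.spent_cents = 0;
        }
        if self.spent_cents.saturating_add(cost_cents) > self.daily_budget_cents {
            Decision::Reject
        } else {
            self.spent_cents += cost_cents;
            Decision::Allow
        }
    }

    /// Budget still available today.
    fn remaining_cents(&self) -> u64 {
        self.daily_budget_cents - self.spent_cents
    }
}
```

With a budget of 100 cents, a 60-cent request is allowed, a following 50-cent request is rejected, and a 40-cent request still fits. Passing the day in as a parameter keeps the sketch deterministic; a real service would derive it from the system clock.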

Development

# Clone repository
git clone https://github.com/paiml/batuta.git
cd batuta
# Build
cargo build --release
# Run tests
cargo test
# Build documentation
mdbook build book

Contributing

Contributions are welcome! Please follow these guidelines:

  1. Fork the repository and create your branch from main
  2. Run tests before submitting: cargo test --all-features
  3. Run lints: cargo clippy --all-targets --all-features -- -D warnings
  4. Format code: cargo fmt --all
  5. Update documentation for any API changes
  6. Submit a pull request with a clear description

See our CI workflow for the full test suite.

License

MIT License — see LICENSE for details.

Batuta — Orchestrating sovereign AI infrastructure.
