rootkiller6788/LLMSched

LLMSched

A tiny LLM runtime microcore for KV cache, token budget, and batch scheduling.

LLMSched is not a full inference engine. It is a scheduler simulator that captures the core resource control plane of LLM serving: who gets KV cache, how many tokens to allocate, when to batch, and when to reject.
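As a rough illustration of the "how many tokens to allocate" and "when to reject" decisions, here is a minimal sketch of an admission check. The type and method names (`TokenBudget`, `Admission`, `admit`, `release`) are assumptions made for this example and are not taken from the LLMSched source.

```rust
/// Hypothetical outcome of an admission check (illustrative, not LLMSched's API).
#[derive(Debug, PartialEq)]
pub enum Admission {
    /// The request is admitted with this many tokens reserved.
    Admit(u64),
    /// The request exceeds the per-request cap or the remaining global budget.
    Reject,
}

/// Assumed design: one global pool of tokens plus a cap per request.
pub struct TokenBudget {
    global_remaining: u64,
    per_request_cap: u64,
}

impl TokenBudget {
    pub fn new(global: u64, per_request_cap: u64) -> Self {
        Self { global_remaining: global, per_request_cap }
    }

    /// Reserve `prompt + max_new` tokens, or reject without changing state.
    pub fn admit(&mut self, prompt: u64, max_new: u64) -> Admission {
        // checked_add guards against u64 overflow on corrupt or hostile input.
        let Some(needed) = prompt.checked_add(max_new) else {
            return Admission::Reject;
        };
        if needed > self.per_request_cap || needed > self.global_remaining {
            return Admission::Reject;
        }
        self.global_remaining -= needed;
        Admission::Admit(needed)
    }

    /// Return tokens to the global pool when a request finishes.
    pub fn release(&mut self, tokens: u64) {
        self.global_remaining = self.global_remaining.saturating_add(tokens);
    }

    /// Tokens still available in the global pool.
    pub fn remaining(&self) -> u64 {
        self.global_remaining
    }
}
```

With a global budget of 100 and a per-request cap of 60, `admit(30, 20)` reserves 50 tokens, while a later `admit(40, 20)` is rejected because only 50 tokens remain.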

Power

Inference resource scheduling authority: whoever controls token allocation, KV cache leases, and batch merging holds the key to inference performance.

First-Edition Scope

No real model inference. Pure simulator:

requests.jsonl
 ↓
token budget
 ↓
KV cache allocation
 ↓
batch scheduling
 ↓
mock decode step
 ↓
trace.jsonl + scheduling report

Quick Start

cargo run -- run --requests examples/requests.jsonl

Key Concepts

  • Token Budget — global and per-request token allocation with overflow protection
  • KV Cache Lease — allocate/free/evict semantics for GPU memory pages
  • Batch Scheduler — priority queue + deadline-aware continuous batching
  • Telemetry — structured trace output (JSONL) for downstream training
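To make the allocate/free/evict semantics of a KV cache lease concrete, here is a minimal sketch of a page pool. Everything in it is an assumption for illustration: the names `KvPool`, `allocate`, `touch`, `evict_lru`, and `free` are hypothetical, and least-recently-used eviction is a policy chosen for this example rather than one the README specifies. The pool only counts pages; no GPU memory is involved.

```rust
use std::collections::HashMap;

/// One lease: how many pages a request holds and when it was last used.
struct Lease {
    pages: usize,
    last_used: u64,
}

/// Hypothetical page pool for KV cache leases (not LLMSched's actual API).
pub struct KvPool {
    free_pages: usize,
    leases: HashMap<u64, Lease>, // request id -> lease
    clock: u64,                  // logical time, used to find the LRU lease
}

impl KvPool {
    pub fn new(total_pages: usize) -> Self {
        Self { free_pages: total_pages, leases: HashMap::new(), clock: 0 }
    }

    /// Lease `pages` pages to `req`, evicting least recently used leases if needed.
    /// Returns the evicted request ids, or `None` if `req` already holds a lease
    /// or the request could not fit even after evicting everything.
    pub fn allocate(&mut self, req: u64, pages: usize) -> Option<Vec<u64>> {
        if self.leases.contains_key(&req) {
            return None;
        }
        let reclaimable: usize = self.leases.values().map(|l| l.pages).sum();
        if pages > self.free_pages + reclaimable {
            return None;
        }
        let mut evicted = Vec::new();
        while self.free_pages < pages {
            evicted.push(self.evict_lru()?);
        }
        self.free_pages -= pages;
        self.clock += 1;
        self.leases.insert(req, Lease { pages, last_used: self.clock });
        Some(evicted)
    }

    /// Mark a lease as recently used so it is evicted later.
    pub fn touch(&mut self, req: u64) {
        self.clock += 1;
        if let Some(lease) = self.leases.get_mut(&req) {
            lease.last_used = self.clock;
        }
    }

    /// Evict the least recently used lease and return its request id.
    pub fn evict_lru(&mut self) -> Option<u64> {
        let victim = self
            .leases
            .iter()
            .min_by_key(|(_, l)| l.last_used)
            .map(|(id, _)| *id)?;
        self.free(victim);
        Some(victim)
    }

    /// Release a lease; returns the number of pages given back to the pool.
    pub fn free(&mut self, req: u64) -> usize {
        let pages = self.leases.remove(&req).map_or(0, |l| l.pages);
        self.free_pages += pages;
        pages
    }

    /// Pages currently not leased to any request.
    pub fn available(&self) -> usize {
        self.free_pages
    }
}
```

Returning the evicted request ids from `allocate` lets the caller record each eviction in the trace or requeue the affected requests.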

Protocols

| Input                      | Output                           |
|----------------------------|----------------------------------|
| plan.json (from Apeinx-IR) | trace.jsonl (to ApexTrain-Core)  |
| requests.jsonl             | scheduling report (Markdown)     |

Tech Stack

| Layer              | Choice            |
|--------------------|-------------------|
| Control logic      | Rust              |
| CLI                | clap              |
| Config             | TOML              |
| Trace format       | JSONL             |
| Underlying kernels | C ABI → KernelLab |
| Logging            | tracing           |

Project Structure

llmsched-core/
├── crates/llmsched-core/src/ # Core: queue, budget, kv, scheduler, telemetry
├── crates/llmsched-cli/src/ # CLI entry point
├── ffi/ # KernelLab C ABI bindings
├── examples/requests.jsonl # Sample workload
└── reports/ # Output trace + report

Relationship to Other Apeinx Projects

KernelLab → (C ABI) → LLMSched → (trace.jsonl) → ApexTrain-Core
                          ↑
         Apeinx-IR → (plan.json)

License

TBD
