CodaCipher/iterabeast




Synth DataGen Engine
High-performance, UI-driven synthetic data generation for AI fine-tuning.



📡 OVERVIEW

IteraBeast is a modern, cyber-aesthetic web application designed to simplify and accelerate the process of generating large-scale synthetic datasets (.jsonl) for training and fine-tuning LLMs.

Engineered for AI Researchers, Data Scientists, and LLM Fine-tuners, this tool bridges the gap between raw prompt engineering and production-grade dataset creation. It transforms the chaotic task of data synthesis into a streamlined, visually immersive operation: perfect for building RAG pipelines, fine-tuning adapters (LoRA/QLoRA), or generating evaluation benchmarks.

By leveraging a dual-node architecture (FastAPI backend + React frontend), it allows developers to batch-generate diverse, context-aware conversational data across multiple LLM providers simultaneously.


⚑ CORE CAPABILITIES

🧬 Multi-Provider Node Matrix

Seamlessly integrates local (Ollama) and cloud nodes (Groq, OpenRouter, DeepInfra) into a unified generation grid. Switch providers instantly without breaking the workflow.

🔄 Advanced Distribution Routing

Features workload-balancing strategies (Sequential, Round-Robin, and Hybrid) that maximize throughput and reduce the chance of hitting provider API rate limits.
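
As a rough illustration of how such routing can work, the sketch below builds a request-to-provider plan for two of the named strategies. It is not IteraBeast's actual routing code: the function names are hypothetical, and "Sequential" is assumed here to mean that each provider handles one contiguous block of requests. Hybrid is left out because the README does not define it.

```python
from itertools import cycle, islice
from typing import List


def round_robin_plan(providers: List[str], n_requests: int) -> List[str]:
    """Assign requests to providers in rotation: A, B, C, A, B, ..."""
    if not providers:
        raise ValueError("at least one provider is required")
    return list(islice(cycle(providers), n_requests))


def sequential_plan(providers: List[str], n_requests: int) -> List[str]:
    """Assign contiguous blocks (assumed meaning of 'Sequential'):
    the first provider takes its whole share, then the second, and so on."""
    if not providers:
        raise ValueError("at least one provider is required")
    base, extra = divmod(n_requests, len(providers))
    plan: List[str] = []
    for i, name in enumerate(providers):
        # The first `extra` providers absorb the remainder, one request each.
        plan.extend([name] * (base + (1 if i < extra else 0)))
    return plan


if __name__ == "__main__":
    nodes = ["ollama", "groq", "openrouter"]  # provider labels from the README
    print(round_robin_plan(nodes, 5))
    print(sequential_plan(nodes, 5))
```

Round-robin spreads consecutive requests across providers, which tends to keep any single provider further from its rate limit; block assignment keeps each provider's output contiguous in the resulting file.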

🛡️ Strict JSONL & Schema Enforcement

Implements a rigorous Post-Generation Validation Layer that checks every record for valid JSONL syntax. The engine automatically sanitizes output and escapes forbidden characters, so training pipelines can ingest the file without parse failures.
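
A minimal sketch of what a post-generation JSONL check could look like, using only the Python standard library. This is an assumption about the general technique, not the project's validation layer; the function name `validate_jsonl` and the `prompt`/`response` field names in the usage example are hypothetical.

```python
import json
from typing import Iterable, List, Tuple


def validate_jsonl(lines: Iterable[str]) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Return (clean_lines, errors).

    Each clean line is one JSON object re-serialized onto a single line;
    each error is (1-based line number, message).
    """
    clean: List[str] = []
    errors: List[Tuple[int, str]] = []
    for lineno, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # blank lines are skipped, not treated as errors
        try:
            record = json.loads(text)
        except json.JSONDecodeError as exc:
            errors.append((lineno, exc.msg))
            continue
        if not isinstance(record, dict):
            errors.append((lineno, "record is not a JSON object"))
            continue
        # json.dumps escapes newlines and other control characters inside
        # strings, so every record stays on exactly one physical line.
        clean.append(json.dumps(record, ensure_ascii=False))
    return clean, errors


if __name__ == "__main__":
    sample = [
        '{"prompt": "hi", "response": "line one\\nline two"}',
        "{bad json}",
        "[1, 2]",
    ]
    good, bad = validate_jsonl(sample)
    print(len(good), "valid record(s)")
    for lineno, message in bad:
        print(f"line {lineno}: {message}")
```

Re-serializing each parsed record, rather than passing the model's raw text through, is what keeps stray newlines or unescaped characters out of the output file.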

🗃️ Direct Stream Architecture

Bypasses memory bottlenecks by streaming generated .jsonl chunks directly to your local disk via the File System Access API, so massive datasets can be written without holding the full output in browser memory.

🧠 Semantic Variation Injection

Reduces the risk of models overfitting to repetitive samples by using MiniLM embeddings to analyze and inject dynamic context. The system autonomously alters sentence structures to keep the data distribution high-entropy and semantically diverse.
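
One common way to use sentence embeddings for diversity is to drop samples whose vectors are nearly identical to ones already kept. The sketch below shows that idea with plain Python lists standing in for MiniLM vectors. It illustrates the general technique only and is not taken from IteraBeast; `select_diverse` and the `0.9` threshold are hypothetical choices.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_diverse(embeddings: List[Sequence[float]], threshold: float = 0.9) -> List[int]:
    """Greedily keep the indices of samples whose embedding has cosine
    similarity below `threshold` to every sample kept so far."""
    kept: List[int] = []
    for i, vec in enumerate(embeddings):
        if all(cosine(vec, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    # Toy 2-D vectors; real MiniLM embeddings have several hundred dimensions.
    vectors = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
    print(select_diverse(vectors))
```

In this toy input the second vector points in almost the same direction as the first, so it is filtered out, while the third is orthogonal and is kept.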

🎨 UNSTABLE_CORE Interface

A reactive, hardware-accelerated UI with real-time cost/token telemetry and interchangeable themes (MAGI / UNSTABLE_CORE), designed for high-velocity data operations.


[Figure: Multi-Provider Integration & Node Configuration]

[Figure: Semantic Variation System & Distribution Routing]

🛠️ QUICK START

1. Backend Service (FastAPI)

cd backend
python -m venv .venv
# Activate virtual environment:
.venv\Scripts\activate # Windows (Command Prompt)
# .\.venv\Scripts\Activate.ps1 # Windows (PowerShell)
# source .venv/bin/activate # Linux/Mac
pip install -r requirements.txt
python main.py

API runs on http://localhost:8000

2. Frontend Client (React)

cd frontend
npm install
npm run dev

Interface accessible at http://localhost:5173


ARCHITECTURE

IteraBeast/
├── backend/                 # Async Server Node
│   ├── main.py              # API Endpoints & Generators
│   └── requirements.txt     # Dependencies
├── frontend/                # Client UI Node
│   ├── src/
│   │   ├── components/      # Interface Elements & Terminal
│   │   ├── App.jsx          # State & Execution Logic
│   │   └── index.css        # Styling & Animations
│   └── package.json
└── README.md

⚙️ REQUIREMENTS

  • Python 3.9+
  • Node.js 18+
  • Chromium-based browser (Chrome/Edge) recommended for full FileSystemWritableFileStream support.


[ SYSTEM_STATUS: OPERATIONAL ] | [ CAPACITY: OPTIMAL ]

CodaCipher

END_OF_LINE_SEQUENCE
