Synth DataGen Engine
High-performance, UI-driven synthetic data generation for AI fine-tuning.
IteraBeast is a modern, cyber-aesthetic web application designed to simplify and accelerate the process of generating large-scale synthetic datasets (.jsonl) for training and fine-tuning LLMs.
Engineered for AI Researchers, Data Scientists, and LLM Fine-tuners, this tool bridges the gap between raw prompt engineering and production-grade dataset creation. It transforms the chaotic task of data synthesis into a streamlined, visually immersive operation, well suited to building RAG pipelines, fine-tuning adapters (LoRA/QLoRA), or generating evaluation benchmarks.
By leveraging a dual-node architecture (FastAPI backend + React frontend), it allows developers to batch-generate diverse, context-aware conversational data across multiple LLM providers simultaneously.
Seamlessly integrates local (Ollama) and cloud nodes (Groq, OpenRouter, DeepInfra) into a unified generation grid. Switch providers instantly without breaking the workflow.
Features workload balancing strategies (Sequential, Round-Robin, and Hybrid) that spread requests across providers to maximize throughput and reduce the chance of hitting API rate limits.
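The README does not spell out how each strategy assigns work, so the following is only a minimal sketch of two of them under stated assumptions: Round-Robin rotates providers per request, and Sequential drains a per-provider quota before moving on. The function names, the `quota_per_provider` parameter, and the lowercase provider labels are hypothetical and not taken from the IteraBeast code; the Hybrid strategy is not attempted here.

```python
from itertools import cycle, islice
from typing import List


def round_robin(providers: List[str], n_requests: int) -> List[str]:
    """Assign request i to provider i mod len(providers)."""
    if not providers:
        raise ValueError("at least one provider is required")
    return list(islice(cycle(providers), n_requests))


def sequential(providers: List[str], n_requests: int,
               quota_per_provider: int) -> List[str]:
    """Use each provider until its (assumed) quota is spent, then the next."""
    assignments: List[str] = []
    for name in providers:
        remaining = n_requests - len(assignments)
        if remaining <= 0:
            break
        assignments.extend([name] * min(quota_per_provider, remaining))
    if len(assignments) < n_requests:
        raise ValueError("not enough combined quota for all requests")
    return assignments


if __name__ == "__main__":
    nodes = ["ollama", "groq", "openrouter"]  # hypothetical labels
    print(round_robin(nodes, 5))
    print(sequential(nodes, 5, quota_per_provider=2))
```

Rotating per request keeps any single provider's request rate at roughly 1/N of the total, which is the usual reason to prefer it when rate limits are the bottleneck.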
Implements a rigorous Post-Generation Validation Layer that guarantees 100% valid JSONL syntax. The engine automatically sanitizes output and escapes forbidden characters, ensuring zero-fail ingestion for training pipelines.
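As an illustration of what such a validation pass could look like, here is a minimal sketch using only Python's standard `json` module. It is not the engine's actual layer: the helper names `to_jsonl_lines` and `invalid_lines`, and the assumption that every record is a JSON object, are hypothetical.

```python
import json
from typing import Iterable, Iterator, List


def to_jsonl_lines(records: Iterable[dict]) -> Iterator[str]:
    """Serialize each record to one physical line of JSON.

    json.dumps escapes quotes, backslashes, and control characters
    (including newlines) inside strings, so a record cannot spill
    across lines.
    """
    for record in records:
        line = json.dumps(record)
        json.loads(line)  # round-trip check before the line is emitted
        yield line


def invalid_lines(text: str) -> List[int]:
    """Return 1-based numbers of lines that are not a valid JSON object."""
    bad: List[int] = []
    for number, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # ignore blank lines such as a trailing newline
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            bad.append(number)
            continue
        if not isinstance(parsed, dict):
            bad.append(number)  # assumption: each record must be an object
    return bad


if __name__ == "__main__":
    sample = [{"prompt": "line one\nline two", "response": 'say "hi"'}]
    print("\n".join(to_jsonl_lines(sample)))
    print(invalid_lines('{"a": 1}\n{broken\n[1, 2]\n'))
```

Running `invalid_lines` over a finished file before training is a cheap way to confirm that every line parses independently.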
Bypasses memory bottlenecks by streaming generated .jsonl chunks directly to your local SSD via the File System Access API, so large datasets never have to be held in browser memory.
Prevents dataset overfitting by using MiniLM embeddings to analyze and inject dynamic context. The system autonomously alters sentence structures to ensure high-entropy, semantically diverse data distribution.
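The README does not describe how the embeddings are used internally. One common way to enforce semantic diversity is to drop samples whose embedding is too close to one already kept. The sketch below shows that idea only; it assumes the embedding vectors have already been computed elsewhere (for example by a MiniLM model), and the `select_diverse` name and the `threshold` value are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_diverse(embeddings: Sequence[Sequence[float]],
                   threshold: float = 0.9) -> List[int]:
    """Greedily keep indices whose similarity to every kept item is below threshold."""
    kept: List[int] = []
    for index, vector in enumerate(embeddings):
        if all(cosine(vector, embeddings[k]) < threshold for k in kept):
            kept.append(index)
    return kept


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real sentence embeddings.
    vectors = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
    print(select_diverse(vectors))
```

The greedy pass is quadratic in the number of kept samples, which is fine for a sketch but would need an approximate nearest-neighbor index at larger scale.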
A reactive, hardware-accelerated UI with real-time cost/token telemetry and interchangeable themes (MAGI / UNSTABLE_CORE), designed for high-velocity data operations.
Multi-Provider Integration & Node Configuration
Semantic Variation System & Distribution Routing
cd backend
python -m venv .venv
# Activate virtual environment:
.venv\Scripts\activate            # Windows (Command Prompt)
# .\.venv\Scripts\Activate.ps1    # Windows (PowerShell)
# source .venv/bin/activate       # Linux/Mac
pip install -r requirements.txt
python main.py
API runs on http://localhost:8000
cd frontend
npm install
npm run dev
Interface accessible at http://localhost:5173
IteraBeast/
├── backend/                  # Async Server Node
│   ├── main.py               # API Endpoints & Generators
│   └── requirements.txt      # Dependencies
├── frontend/                 # Client UI Node
│   ├── src/
│   │   ├── components/       # Interface Elements & Terminal
│   │   ├── App.jsx           # State & Execution Logic
│   │   └── index.css         # Styling & Animations
│   └── package.json
└── README.md
- Python 3.9+
- Node.js 18+
- Chromium-based browser (Chrome/Edge) recommended for full FileSystemWritableFileStream support.