Releases: CodaCipher/iterabeast
IteraBeast v0.4
Synth DataGen Engine
High-performance, UI-driven synthetic data generation for AI fine-tuning.
📡 OVERVIEW
IteraBeast is a modern, cyber-aesthetic web application designed to simplify and accelerate the process of generating large-scale synthetic datasets (.jsonl) for training and fine-tuning LLMs.
Engineered for AI Researchers, Data Scientists, and LLM Fine-tuners, this tool bridges the gap between raw prompt engineering and production-grade dataset creation. It transforms the chaotic task of data synthesis into a streamlined, visually immersive operation—perfect for building RAG pipelines, fine-tuning adapters (LoRA/QLoRA), or generating evaluation benchmarks.
By leveraging a dual-node architecture (FastAPI backend + React frontend), it allows developers to batch-generate diverse, context-aware conversational data across multiple LLM providers simultaneously.
⚡ CORE CAPABILITIES
🧬 Multi-Provider Node Matrix
Seamlessly integrates local (Ollama) and cloud nodes (Groq, OpenRouter, DeepInfra) into a unified generation grid. Switch providers instantly without breaking the workflow.
🔄 Advanced Distribution Routing
Features Sequential, Round-Robin, and Hybrid workload-distribution strategies that spread requests across providers to maximize throughput and reduce the chance of hitting API rate limits.
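The strategy names are not defined further here, so the following is only a minimal sketch of one plausible reading: Sequential fills each provider's share before moving to the next, and Round-Robin alternates providers job by job. The function names and provider labels are hypothetical and are not taken from the IteraBeast code; Hybrid is left out because its behavior is not described.

```python
from itertools import cycle
from typing import Iterator, List


def sequential(providers: List[str], total: int) -> Iterator[str]:
    """Assumed meaning: all of provider A's jobs, then B's, then C's."""
    base, extra = divmod(total, len(providers))
    for index, name in enumerate(providers):
        # The first `extra` providers take one additional job each.
        for _ in range(base + (1 if index < extra else 0)):
            yield name


def round_robin(providers: List[str], total: int) -> Iterator[str]:
    """Assumed meaning: alternate providers job by job (A, B, C, A, ...)."""
    ring = cycle(providers)
    for _ in range(total):
        yield next(ring)


if __name__ == "__main__":
    nodes = ["ollama", "groq", "openrouter"]  # labels only, no API calls
    print(list(sequential(nodes, 5)))
    print(list(round_robin(nodes, 5)))
```

Under this reading, Round-Robin spaces out consecutive calls to the same provider, which is what helps with per-provider rate limits, while Sequential keeps each provider's output contiguous.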
🛡️ Strict JSONL & Schema Enforcement
Implements a rigorous Post-Generation Validation Layer that guarantees 100% valid JSONL syntax. The engine automatically sanitizes output and escapes forbidden characters, ensuring zero-fail ingestion for training pipelines.
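As a rough illustration of what a post-generation validation pass can look like, the sketch below parses each line, rejects anything that is not a JSON object, and re-serializes the survivors so that every record occupies exactly one line. It uses only Python's standard json module; the function name validate_jsonl and the "records must be objects" rule are assumptions for the example, not the project's actual schema.

```python
import json
from typing import Iterable, List, Tuple


def validate_jsonl(lines: Iterable[str]) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Return (clean_lines, rejects); each reject is (line_number, reason)."""
    clean: List[str] = []
    rejects: List[Tuple[int, str]] = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # blank lines carry no record
        try:
            record = json.loads(text)
        except json.JSONDecodeError as exc:
            rejects.append((number, exc.msg))
            continue
        if not isinstance(record, dict):
            rejects.append((number, "record is not a JSON object"))
            continue
        # Re-serializing escapes control characters and embedded newlines,
        # so each record stays on a single physical line.
        clean.append(json.dumps(record, ensure_ascii=False))
    return clean, rejects


if __name__ == "__main__":
    good, bad = validate_jsonl(['{"prompt": "hi"}', "{broken", "[1, 2]"])
    print(good)
    print(bad)
```

Collecting rejects with their line numbers, rather than silently dropping them, makes it possible to regenerate only the failed samples.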
🗃️ Direct Stream Architecture
Bypasses memory bottlenecks by streaming generated .jsonl chunks directly to local disk via the File System Access API, so large datasets never have to be held in browser memory.
🧠 Semantic Variation Injection
Prevents dataset overfitting by using MiniLM embeddings to analyze and inject dynamic context. The system autonomously alters sentence structures to ensure high-entropy, semantically diverse data distribution.
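One simple way embeddings can be used to keep a dataset diverse is a near-duplicate filter: embed each sample and drop it if it is too close to one already kept. The sketch below shows that technique only; it is not the project's injection mechanism. To stay dependency-free it uses a toy word-count embedding (toy_embed) as a stand-in, and the 0.8 threshold is an arbitrary assumption. A real sentence-embedding model such as MiniLM could be passed in through the embed parameter, provided it returns vectors in the same dictionary form.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Stand-in embedding (word counts); NOT a MiniLM model."""
    return dict(Counter(text.lower().split()))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keep_diverse(
    samples: List[str],
    embed: Callable[[str], Vector] = toy_embed,
    threshold: float = 0.8,  # assumed cut-off, tune per dataset
) -> List[str]:
    """Keep a sample only if it is not too similar to any kept sample."""
    kept: List[str] = []
    vectors: List[Vector] = []
    for sample in samples:
        vector = embed(sample)
        if all(cosine(vector, seen) < threshold for seen in vectors):
            kept.append(sample)
            vectors.append(vector)
    return kept


if __name__ == "__main__":
    print(keep_diverse([
        "how do I reset my password",
        "how do I reset my password please",
        "what is the refund policy",
    ]))
```

The pairwise comparison is quadratic in the number of kept samples, which is acceptable for a sketch but would need an approximate nearest-neighbor index for large datasets.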
🎨 UNSTABLE_CORE Interface
A reactive, hardware-accelerated UI with real-time cost/token telemetry and interchangeable themes (MAGI / UNSTABLE_CORE), designed for high-velocity data operations.
Screenshots: Multi-Provider Integration & Node Configuration; Semantic Variation System & Distribution Routing
🛠️ QUICK START
1. Backend Service (FastAPI)
cd backend
python -m venv .venv
# Activate virtual environment:
.venv\Scripts\activate          # Windows (Command Prompt)
# .\.venv\Scripts\Activate.ps1  # Windows (PowerShell)
# source .venv/bin/activate     # Linux/Mac
pip install -r requirements.txt
python main.py
API runs on http://localhost:8000
2. Frontend Client (React)
cd frontend
npm install
npm run dev
Interface accessible at http://localhost:5173
📁 ARCHITECTURE
IteraBeast/
├── backend/ # Async Server Node
│ ├── main.py # API Endpoints & Generators
│ └── requirements.txt # Dependencies
├── frontend/ # Client UI Node
│ ├── src/
│ │ ├── components/ # Interface Elements & Terminal
│ │ ├── App.jsx # State & Execution Logic
│ │ └── index.css # Styling & Animations
│ └── package.json
└── README.md
⚙️ REQUIREMENTS
- Python 3.9+
- Node.js 18+
- Chromium-based browser (Chrome/Edge) recommended for full FileSystemWritableFileStream support.