AgentOS

Production-Grade AI Agent Orchestration System

A multi-layer orchestration platform that intelligently routes customer queries through intent classification, safety validation, bilingual support, and automated tool execution — with a full-stack real-time dashboard.

Screenshots

Test Console

Chat with a live agent — see intent classification, chosen action, confidence score, and response debug in real time

Agent Builder

Create and manage multiple agent configurations — each with its own safety mode, confidence threshold, and system prompt

Execution Logs

Every request is logged with intent, action, safety status, and latency — filterable by intent, action, and status

What It Does

AgentOS sits between a user's message and your backend tools. Every message goes through a 9-stage orchestration pipeline:

Message → Validate → Sanitize → Load Agent Config (DB)
 → Load FAQs + Tools (DB) → Classify Intent → Safety Check
 → Execute Action (answer / tool / escalate) → Hallucination Guard → Log

It supports 10 intent types, 2 tools (order lookup + ticket creation), 3 safety levels, and bilingual English + Hinglish input — all configurable from the dashboard without touching code.
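
The stage order above can be pictured as a list of functions that each receive and return a shared context. This is only a minimal sketch: `PipelineContext`, `Stage`, `STAGE_NAMES`, and `runPipeline` are hypothetical names, not taken from `app/api/run/route.ts`, and each placeholder stage only records that it ran.

```typescript
// Hypothetical context threaded through the stages; the field names are illustrative.
interface PipelineContext {
  message: string;
  trace: string[]; // names of the stages that have run, in order
}

interface Stage {
  name: string;
  run: (ctx: PipelineContext) => PipelineContext;
}

// The nine stages from the pipeline description, in order.
const STAGE_NAMES: string[] = [
  "validate",
  "sanitize",
  "loadAgentConfig",
  "loadFaqsAndTools",
  "classifyIntent",
  "safetyCheck",
  "executeAction",
  "hallucinationGuard",
  "log",
];

// Placeholder stages: a real stage would validate input, query the DB, classify, etc.
const stages: Stage[] = STAGE_NAMES.map((name) => ({
  name,
  run: (ctx: PipelineContext): PipelineContext => ({
    ...ctx,
    trace: [...ctx.trace, name],
  }),
}));

// Run every stage in order, passing the context from one to the next.
function runPipeline(message: string): PipelineContext {
  const initial: PipelineContext = { message, trace: [] };
  return stages.reduce((ctx, stage) => stage.run(ctx), initial);
}
```

Keeping the stages in one ordered list makes the sequence easy to audit, and a stage can be replaced without touching the runner.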

Orchestration Pipeline

flowchart TD
 A[User Message] --> B[Validate & Sanitize]
 B --> C[Load Agent Config from DB]
 C --> D[Load FAQs + ToolConfig from DB]
 D --> E[Intent Classification\n10 intents · English + Hinglish]
 E --> F[Safety Check\nStrict / Balanced / Permissive]
 F --> G{Action Decision}
 G -- answer --> H[FAQ Match + Template Response]
 G -- tool_call --> I[Tool Execution with Retry\norder_lookup · create_ticket]
 G -- escalate --> J[Escalation Response]
 H --> K[Hallucination Guard]
 I --> K
 J --> K
 K --> L[Async DB Log]
 L --> M[JSON Response]

Intent Classification (10 Types)

Intent	Example Queries	Action
`greeting`	"Hello", "Namaste"	answer
`order_status`	"Where is my order ORD-123?", "Mera order kaha hai?"	tool_call
`refund_request`	"I want a refund", "Paisa wapas chahiye"	tool_call
`complaint`	"Product arrived broken", "Saman kharab aaya"	escalate
`ticket_creation`	"Open a support ticket", "Naya ticket khol do"	tool_call
`faq_query`	"What is your return policy?", "Customer service number kya hai?"	answer
`shipping_query`	"When will my package arrive?", "Kab tak aa jayega?"	answer
`product_query`	"Do you have iPhone 15?", "Kya iPhone available hai?"	escalate
`payment_query`	"Payment failed but money deducted"	escalate
`abusive_language`	"Ye bakwas service hai"	escalate
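
The table above can be read as a lookup from intent to default action. The sketch below encodes it as a typed record; the names `Intent`, `Action`, `DEFAULT_ACTION`, and `defaultActionFor` are illustrative and may not match `lib/intent-router.ts`.

```typescript
type Intent =
  | "greeting"
  | "order_status"
  | "refund_request"
  | "complaint"
  | "ticket_creation"
  | "faq_query"
  | "shipping_query"
  | "product_query"
  | "payment_query"
  | "abusive_language";

type Action = "answer" | "tool_call" | "escalate";

// One entry per row of the intent table. Record<Intent, Action> makes the
// compiler reject the map if an intent is missing.
const DEFAULT_ACTION: Record<Intent, Action> = {
  greeting: "answer",
  order_status: "tool_call",
  refund_request: "tool_call",
  complaint: "escalate",
  ticket_creation: "tool_call",
  faq_query: "answer",
  shipping_query: "answer",
  product_query: "escalate",
  payment_query: "escalate",
  abusive_language: "escalate",
};

function defaultActionFor(intent: Intent): Action {
  return DEFAULT_ACTION[intent];
}
```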

Evaluation Results (Ablation Study)

Tested on 32 queries — 16 English + 16 Hinglish — across 3 confidence thresholds:

Threshold	Intent Accuracy	Escalation Rate	Tool Success Rate	Avg Latency
0.60	84.4%	50.0%	41.7%	35ms
0.70	84.4%	81.3%	40.0%	21ms
0.80	84.4%	81.3%	66.7%	20ms

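Key insight: With a higher threshold, low-confidence tool calls escalate instead of attempting execution. The escalation rate rises from 50.0% at 0.60 to 81.3% at 0.70 and 0.80, and the tool success rate is highest at 0.80 (66.7%). The gain is not monotonic: tool success dips slightly, from 41.7% to 40.0%, at 0.70. Intent accuracy is identical at every threshold because the threshold is applied after classification and does not change the predicted intent.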
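
The threshold behavior described above can be sketched as a small gate applied after classification. This is an illustration only: `gateAction` is a hypothetical helper, and it assumes that only `tool_call` actions are downgraded, which the README does not state explicitly.

```typescript
type Action = "answer" | "tool_call" | "escalate";

// Assumption: a proposed tool call whose classification confidence is below the
// threshold is escalated instead of executed. Other actions pass through unchanged.
function gateAction(proposed: Action, confidence: number, threshold: number): Action {
  if (proposed === "tool_call" && confidence < threshold) {
    return "escalate";
  }
  return proposed;
}
```

For example, a tool call classified with confidence 0.65 would run at a threshold of 0.60 but escalate at 0.70, while the predicted intent itself is the same in both cases.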

Hallucination Guard: The guard fires in strict mode (isHallucination: true, confidence 0.9). In balanced mode (default), template responses are intentionally permitted — this is by design, not a gap.
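
A rough sketch of that strict-mode behavior follows. The `ResponseSource` values and the `checkGrounding` helper are assumptions made for illustration; the actual logic in `lib/safety-guard.ts` may differ, and this sketch does not model any checks that balanced or permissive mode might perform.

```typescript
type SafetyMode = "strict" | "balanced" | "permissive";

// Hypothetical labels for where a response came from.
type ResponseSource = "faq" | "tool" | "template" | "reasoning";

interface HallucinationCheck {
  isHallucination: boolean;
  confidence?: number;
}

// In strict mode, only FAQ-backed or tool-backed responses count as grounded.
// Anything else is flagged with the 0.9 confidence reported above.
function checkGrounding(source: ResponseSource, mode: SafetyMode): HallucinationCheck {
  const grounded = source === "faq" || source === "tool";
  if (mode === "strict" && !grounded) {
    return { isHallucination: true, confidence: 0.9 };
  }
  return { isHallucination: false };
}
```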

Safety System

Three configurable safety levels per agent:

Level	Blocks	Allows
`strict`	High + Medium risk · Any reasoning-source response	Only FAQ-backed or tool-backed responses
`balanced`	High risk only	Template responses + FAQ + tool
`permissive`	Risk score > 0.8	Everything else
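
The blocking rules in the table can be expressed as one decision function. The sketch assumes each message has already been given a risk level and a numeric risk score; the `RiskAssessment` shape and `isBlocked` name are hypothetical.

```typescript
type SafetyLevel = "strict" | "balanced" | "permissive";
type RiskLevel = "low" | "medium" | "high";

// Hypothetical output of the risk detectors: a coarse level plus a score in [0, 1].
interface RiskAssessment {
  level: RiskLevel;
  score: number;
}

// Mirrors the "Blocks" column: strict blocks high and medium risk, balanced
// blocks high risk only, and permissive blocks only scores above 0.8.
function isBlocked(safety: SafetyLevel, risk: RiskAssessment): boolean {
  switch (safety) {
    case "strict":
      return risk.level === "high" || risk.level === "medium";
    case "balanced":
      return risk.level === "high";
    case "permissive":
      return risk.score > 0.8;
  }
}
```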

Detection: PII (SSN, email, credit card), SQL injection, command injection, template injection, spam, abusive language (English + Hinglish).
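
Detection of this kind is often pattern-based. The sketch below shows the general idea with a few simplified regular expressions; these patterns and the `detectRisks` helper are assumptions for illustration, not the project's actual rules, and they would produce false positives and negatives on real traffic.

```typescript
interface Detector {
  label: string;
  pattern: RegExp;
}

// Deliberately simplified patterns. None uses the global flag, so RegExp.test
// keeps no state between calls.
const DETECTORS: Detector[] = [
  { label: "email", pattern: /[\w.+-]+@[\w-]+\.[\w.-]+/ },
  { label: "ssn", pattern: /\b\d{3}-\d{2}-\d{4}\b/ },
  { label: "sql_injection", pattern: /\b(union\s+select|drop\s+table)\b/i },
  { label: "template_injection", pattern: /\{\{.*?\}\}/ },
];

// Return the label of every detector whose pattern matches the message.
function detectRisks(message: string): string[] {
  return DETECTORS.filter((d) => d.pattern.test(message)).map((d) => d.label);
}
```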

Quick Start

Prerequisites

Node.js 18+
pnpm (or npm)

Setup

# 1. Install dependencies
pnpm install
# 2. Generate Prisma client + create database
npx prisma generate
npx prisma db push
# 3. Seed default agent, FAQs, and tool config
node prisma/seed.js
# 4. Copy environment file
cp .env.example .env.local
# 5. Start dev server
pnpm run dev

Open http://localhost:3000

Environment

# .env.local
DATABASE_URL="file:./dev.db"

Run Evaluation

# Start dev server first, then in a second terminal:
pnpm run eval

The eval script:

✅ Preflight checks the server is responsive (3 retries)
🤖 Auto-loads the active agent ID from /api/agents
📊 Runs 32 queries × 3 thresholds (ablation)
🛡️ Sends a strict-mode hallucination guard probe
💾 Saves eval/results.json + eval/results.csv
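
The preflight step amounts to retrying a health check a few times before giving up. Here is a minimal sketch of such a helper. `withRetries` is a hypothetical name, "3 retries" is interpreted as three attempts in total, and the real `scripts/eval.js` is plain JavaScript and may be structured differently.

```typescript
// Run an async task up to `attempts` times, waiting `delayMs` between failures.
// Resolves with the first successful result, or rejects with the last error.
async function withRetries<T>(
  task: () => Promise<T>,
  attempts: number = 3,
  delayMs: number = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt < attempts) {
        await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

A preflight could then wrap a request to the dev server, for example a `fetch` of `http://localhost:3000/api/agents` that throws when the response is not OK, and abort the evaluation if all attempts fail.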

API Reference

`POST /api/run`

Main orchestration endpoint — runs a message through the full 9-stage pipeline.

Request:

{
 "message": "Where is my order ORD-12345?",
 "agentId": "your-agent-id",
 "sessionId": "optional-session-id",
 "confidenceThreshold": 0.7
}

Response:

{
 "success": true,
 "agentId": "...",
 "intentClassification": {
 "intent": "order_status",
 "confidence": 0.91,
 "language": "english"
 },
 "actionDecision": "tool_call",
 "toolInvocation": {
 "toolId": "order_lookup",
 "result": { "success": true, "data": { "status": "In Transit" } }
 },
 "finalResponse": "Your order ORD-12345 is currently In Transit...",
 "hallucinationCheck": { "isHallucination": false },
 "latencyBreakdown": {
 "intentClassification": 1,
 "safetyCheck": 0,
 "toolExecution": 102,
 "total": 105
 }
}

Override Headers:

X-Session-ID — session tracking
X-Safety-Mode: strict | balanced | permissive — override agent safety level per-request
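
A client call might be assembled as follows. This is a sketch under assumptions: `buildRunRequest` is a hypothetical helper, the `Content-Type` header is assumed to be required, and sending the session ID both in the body and in `X-Session-ID` is a choice made here for illustration.

```typescript
type SafetyMode = "strict" | "balanced" | "permissive";

// Mirrors the request fields documented above; only message and agentId
// are treated as required in this sketch.
interface RunRequestBody {
  message: string;
  agentId: string;
  sessionId?: string;
  confidenceThreshold?: number;
}

interface RunRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build the URL and fetch options for POST /api/run without sending anything.
function buildRunRequest(
  baseUrl: string,
  body: RunRequestBody,
  safetyMode?: SafetyMode,
): RunRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (body.sessionId) headers["X-Session-ID"] = body.sessionId;
  if (safetyMode) headers["X-Safety-Mode"] = safetyMode;
  return {
    url: `${baseUrl}/api/run`,
    init: { method: "POST", headers, body: JSON.stringify(body) },
  };
}
```

The result can be passed directly to `fetch(url, init)`. Keeping request construction separate from the network call makes the override headers easy to unit test.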

Project Structure

AgentOS/
├── app/
│ ├── api/
│ │ ├── run/route.ts ← Main 9-stage orchestrator
│ │ ├── agents/ ← Agent CRUD
│ │ ├── agent-config/ ← Agent settings upsert
│ │ ├── tools/ ← ToolConfig API
│ │ └── knowledge/ ← FAQ management
│ └── (dashboard)/
│ ├── test-console/ ← Live agent chat UI
│ ├── agent-builder/ ← Agent config UI
│ ├── tools-knowledge/ ← FAQ + tool management
│ └── logs/ ← Execution log viewer
├── lib/
│ ├── intent-router.ts ← Intent classification (10 types)
│ ├── safety-guard.ts ← Safety + hallucination detection
│ ├── tools.ts ← Tool implementations
│ └── db-logger.ts ← Queue-based async logger
├── prisma/
│ ├── schema.prisma ← Data models
│ └── seed.js ← Default data seeder
├── scripts/
│ └── eval.js ← Ablation evaluation pipeline
├── eval/
│ ├── dataset.json ← 32-query bilingual test set
│ ├── results.json ← Latest eval output
│ └── results.csv ← CSV export
└── docs/images/ ← Screenshots

Tech Stack

Layer	Technology
Framework	Next.js 16 (App Router)
Language	TypeScript 5
Database	SQLite via Prisma ORM
Validation	Zod
UI Components	shadcn/ui + Radix UI + Tailwind CSS
Logging	Custom queue-based async DB logger
Evaluation	Node.js ablation script
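
As an illustration of what a queue-based async logger can look like, the sketch below buffers entries in memory and writes them in batches through an injected writer. `LogEntry`, `LogQueue`, and the writer signature are hypothetical; the actual `lib/db-logger.ts` may work differently.

```typescript
// Hypothetical log record; the real schema lives in prisma/schema.prisma.
interface LogEntry {
  intent: string;
  action: string;
  latencyMs: number;
}

type BatchWriter = (batch: LogEntry[]) => Promise<void>;

class LogQueue {
  private pending: LogEntry[] = [];
  private readonly write: BatchWriter;

  constructor(write: BatchWriter) {
    this.write = write;
  }

  // Synchronous and cheap, so the request path never waits on the database.
  enqueue(entry: LogEntry): void {
    this.pending.push(entry);
  }

  get size(): number {
    return this.pending.length;
  }

  // Write everything buffered so far. On failure the batch is put back at the
  // front of the queue so a later flush can retry it.
  async flush(): Promise<number> {
    if (this.pending.length === 0) return 0;
    const batch = this.pending;
    this.pending = [];
    try {
      await this.write(batch);
      return batch.length;
    } catch (err) {
      this.pending = [...batch, ...this.pending];
      throw err;
    }
  }
}
```

In a server, `flush` would typically be called on a timer or after the response is sent, with the writer delegating to a Prisma `createMany` call or similar.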

For full technical documentation including problem statement, data models, and design decisions — see PROJECT.md
