Harsh-1165/AgentOS

AgentOS

Production-Grade AI Agent Orchestration System

A multi-layer orchestration platform that intelligently routes customer queries through intent classification, safety validation, bilingual support, and automated tool execution — with a full-stack real-time dashboard.


Screenshots

Test Console

Chat with a live agent — see intent classification, chosen action, confidence score, and response debug in real time

Agent Builder

Create and manage multiple agent configurations — each with its own safety mode, confidence threshold, and system prompt

Execution Logs

Every request is logged with intent, action, safety status, and latency — filterable by intent, action, and status


What It Does

AgentOS sits between a user's message and your backend tools. Every message goes through a 9-stage orchestration pipeline:

Message → Validate → Sanitize → Load Agent Config (DB)
 → Load FAQs + Tools (DB) → Classify Intent → Safety Check
 → Execute Action (answer / tool / escalate) → Hallucination Guard → Log

It supports 10 intent types, 2 tools (order lookup + ticket creation), 3 safety levels, and bilingual English + Hinglish input — all configurable from the dashboard without touching code.
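
The stage order above can be sketched as a simple sequential runner. This is a minimal illustration, not the code in app/api/run/route.ts: the names `Stage`, `PipelineContext`, `Handler`, and `runPipeline` are hypothetical, and only the order of the nine stages is taken from this README.

```typescript
// Hypothetical names throughout; only the stage order comes from the README.
type Stage =
  | "validate"
  | "sanitize"
  | "loadAgentConfig"
  | "loadFaqsAndTools"
  | "classifyIntent"
  | "safetyCheck"
  | "executeAction"
  | "hallucinationGuard"
  | "log";

const STAGES: readonly Stage[] = [
  "validate",
  "sanitize",
  "loadAgentConfig",
  "loadFaqsAndTools",
  "classifyIntent",
  "safetyCheck",
  "executeAction",
  "hallucinationGuard",
  "log",
];

interface PipelineContext {
  message: string;
  trace: Stage[]; // stages visited so far, in order
}

type Handler = (ctx: PipelineContext) => PipelineContext;

// Runs every stage in order; stages without a handler are recorded but do nothing.
function runPipeline(
  message: string,
  handlers: Partial<Record<Stage, Handler>>
): PipelineContext {
  let ctx: PipelineContext = { message, trace: [] };
  for (const stage of STAGES) {
    const handler = handlers[stage];
    if (handler) ctx = handler(ctx);
    ctx = { ...ctx, trace: [...ctx.trace, stage] };
  }
  return ctx;
}
```

Keeping the order in a single array makes it easy to see (and test) that safety checks always run before any action executes.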


Orchestration Pipeline

flowchart TD
 A[User Message] --> B[Validate & Sanitize]
 B --> C[Load Agent Config from DB]
 C --> D[Load FAQs + ToolConfig from DB]
 D --> E[Intent Classification\n10 intents · English + Hinglish]
 E --> F[Safety Check\nStrict / Balanced / Permissive]
 F --> G{Action Decision}
 G -- answer --> H[FAQ Match + Template Response]
 G -- tool_call --> I[Tool Execution with Retry\norder_lookup · create_ticket]
 G -- escalate --> J[Escalation Response]
 H --> K[Hallucination Guard]
 I --> K
 J --> K
 K --> L[Async DB Log]
 L --> M[JSON Response]

Intent Classification (10 Types)

| Intent | Example Queries | Action |
|---|---|---|
| greeting | "Hello", "Namaste" | answer |
| order_status | "Where is my order ORD-123?", "Mera order kaha hai?" | tool_call |
| refund_request | "I want a refund", "Paisa wapas chahiye" | tool_call |
| complaint | "Product arrived broken", "Saman kharab aaya" | escalate |
| ticket_creation | "Open a support ticket", "Naya ticket khol do" | tool_call |
| faq_query | "What is your return policy?", "Customer service number kya hai?" | answer |
| shipping_query | "When will my package arrive?", "Kab tak aa jayega?" | answer |
| product_query | "Do you have iPhone 15?", "Kya iPhone available hai?" | escalate |
| payment_query | "Payment failed but money deducted" | escalate |
| abusive_language | "Ye bakwas service hai" | escalate |
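
The intent-to-action mapping listed above can be written as a typed lookup. This is a sketch for illustration only: the type names and the `DEFAULT_ACTION` constant are assumptions, not identifiers from lib/intent-router.ts. The intent names and actions are copied from the list.

```typescript
type Intent =
  | "greeting"
  | "order_status"
  | "refund_request"
  | "complaint"
  | "ticket_creation"
  | "faq_query"
  | "shipping_query"
  | "product_query"
  | "payment_query"
  | "abusive_language";

type Action = "answer" | "tool_call" | "escalate";

// Hypothetical constant: default action per intent, as listed in this README.
const DEFAULT_ACTION: Record<Intent, Action> = {
  greeting: "answer",
  order_status: "tool_call",
  refund_request: "tool_call",
  complaint: "escalate",
  ticket_creation: "tool_call",
  faq_query: "answer",
  shipping_query: "answer",
  product_query: "escalate",
  payment_query: "escalate",
  abusive_language: "escalate",
};
```

Typing the map as `Record<Intent, Action>` means the compiler rejects a new intent that has no action assigned.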

Evaluation Results (Ablation Study)

Tested on 32 queries — 16 English + 16 Hinglish — across 3 confidence thresholds:

| Threshold | Intent Accuracy | Escalation Rate | Tool Success Rate | Avg Latency |
|---|---|---|---|---|
| 0.60 | 84.4% | 50.0% | 41.7% | 35 ms |
| 0.70 | 84.4% | 81.3% | 40.0% | 21 ms |
| 0.80 | 84.4% | 81.3% | 66.7% | 20 ms |

Key insight: With a higher threshold, low-confidence tool calls escalate instead of attempting execution, so the tool success rate rises to 66.7% at 0.80 (from 41.7% at 0.60 and 40.0% at 0.70). Intent accuracy is the same at every threshold because classification confidence is threshold-independent.
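
The gating behaviour described here can be sketched in a few lines. This is an assumption-laden illustration: the function name `applyThreshold` is hypothetical, and it assumes that only tool calls are gated and that the comparison is strictly "less than". The README does not state either detail.

```typescript
type Action = "answer" | "tool_call" | "escalate";

// Hypothetical helper: downgrade a low-confidence tool call to an escalation.
// Assumption: "answer" and "escalate" decisions are left unchanged.
function applyThreshold(
  action: Action,
  confidence: number,
  threshold: number
): Action {
  if (action === "tool_call" && confidence < threshold) {
    return "escalate";
  }
  return action;
}
```

For example, a tool call classified at confidence 0.65 would execute under a 0.60 threshold but escalate under 0.70.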

Hallucination Guard: The guard fires in strict mode (isHallucination: true, confidence 0.9). In balanced mode (default), template responses are intentionally permitted — this is by design, not a gap.
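
A minimal sketch of this mode-dependent behaviour, assuming each response carries a label for where it came from. The `ResponseSource` labels and the `checkHallucination` function are hypothetical, and how balanced mode treats reasoning-source responses is not stated in this README, so the sketch simply lets them through.

```typescript
type SafetyMode = "strict" | "balanced" | "permissive";
type ResponseSource = "faq" | "tool" | "template" | "reasoning";

interface HallucinationCheck {
  isHallucination: boolean;
  confidence?: number;
}

// Hypothetical guard: in strict mode, anything not backed by an FAQ or a tool
// result is flagged with the confidence value quoted in the README (0.9).
function checkHallucination(
  source: ResponseSource,
  mode: SafetyMode
): HallucinationCheck {
  const grounded = source === "faq" || source === "tool";
  if (mode === "strict" && !grounded) {
    return { isHallucination: true, confidence: 0.9 };
  }
  // Assumption: balanced and permissive modes do not flag any source.
  return { isHallucination: false };
}
```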


Safety System

Three configurable safety levels per agent:

| Level | Blocks | Allows |
|---|---|---|
| strict | High + Medium risk · Any reasoning-source response | Only FAQ-backed or tool-backed responses |
| balanced | High risk only | Template responses + FAQ + tool |
| permissive | Risk score > 0.8 | Everything else |
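
The blocking rules for the three levels can be sketched as one decision function. The `RiskAssessment` shape and the name `isBlocked` are assumptions made for illustration. The README gives the rules per level but not how risk is represented internally.

```typescript
type SafetyMode = "strict" | "balanced" | "permissive";
type RiskLevel = "low" | "medium" | "high";

// Hypothetical shape: a categorical level plus a numeric score in [0, 1].
interface RiskAssessment {
  level: RiskLevel;
  score: number;
}

function isBlocked(mode: SafetyMode, risk: RiskAssessment): boolean {
  switch (mode) {
    case "strict":
      return risk.level === "high" || risk.level === "medium";
    case "balanced":
      return risk.level === "high";
    case "permissive":
      return risk.score > 0.8;
  }
}
```

The strict-mode rule about reasoning-source responses is a separate check on the response, so it is not modelled here.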

Detection: PII (SSN, email, credit card), SQL injection, command injection, template injection, spam, abusive language (English + Hinglish).
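
Pattern-based detection of this kind might look like the sketch below. The regular expressions are deliberately simplified assumptions chosen for readability, not the rules used in lib/safety-guard.ts, and the labels and the `detectThreats` function are hypothetical. Spam and abusive-language detection are omitted.

```typescript
// Illustrative patterns only: simplified assumptions, not the project's rules.
const DETECTORS: ReadonlyArray<{ label: string; pattern: RegExp }> = [
  { label: "pii_ssn", pattern: /\b\d{3}-\d{2}-\d{4}\b/ },
  { label: "pii_email", pattern: /[^\s@]+@[^\s@]+\.[^\s@]+/ },
  // 13 to 16 digits, optionally separated by spaces or hyphens
  { label: "pii_credit_card", pattern: /\b(?:\d[ -]?){13,16}\b/ },
  { label: "sql_injection", pattern: /\b(?:union\s+select|drop\s+table)\b/i },
  { label: "command_injection", pattern: /[;&|]\s*(?:rm|curl|wget)\s/i },
  { label: "template_injection", pattern: /\{\{[^}]*\}\}|\$\{[^}]*\}/ },
];

// Returns the label of every detector whose pattern matches the text.
function detectThreats(text: string): string[] {
  return DETECTORS.filter(({ pattern }) => pattern.test(text)).map(
    ({ label }) => label
  );
}
```

None of the patterns use the `g` flag, so `RegExp.prototype.test` stays stateless across calls.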


Quick Start

Prerequisites

  • Node.js 18+
  • pnpm (or npm)

Setup

# 1. Install dependencies
pnpm install
# 2. Generate Prisma client + create database
npx prisma generate
npx prisma db push
# 3. Seed default agent, FAQs, and tool config
node prisma/seed.js
# 4. Copy environment file
cp .env.example .env.local
# 5. Start dev server
pnpm run dev

Open http://localhost:3000

Environment

# .env.local
DATABASE_URL="file:./dev.db"

Run Evaluation

# Start dev server first, then in a second terminal:
pnpm run eval

The eval script:

  1. ✅ Preflight checks the server is responsive (3 retries)
  2. 🤖 Auto-loads the active agent ID from /api/agents
  3. 📊 Runs 32 queries × 3 thresholds (ablation)
  4. 🛡️ Sends a strict-mode hallucination guard probe
  5. 💾 Saves eval/results.json + eval/results.csv
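
To make the reported metrics concrete, here is one way the per-threshold summary could be computed from raw results. This is a sketch under stated assumptions: the `EvalRow` shape and `summarize` function are hypothetical, and the metric definitions (escalation rate over all queries, tool success rate over attempted tool calls) are guesses, since scripts/eval.js is not shown here.

```typescript
type Action = "answer" | "tool_call" | "escalate";

// Hypothetical per-query record produced by one ablation run.
interface EvalRow {
  expectedIntent: string;
  predictedIntent: string;
  action: Action;
  toolSucceeded?: boolean;
  latencyMs: number;
}

interface EvalSummary {
  intentAccuracy: number; // percent, one decimal place
  escalationRate: number; // percent of all queries (assumed definition)
  toolSuccessRate: number; // percent of attempted tool calls (assumed definition)
  avgLatencyMs: number;
}

// Percentage rounded to one decimal place; 0 when the denominator is 0.
const pct = (num: number, den: number): number =>
  den === 0 ? 0 : Math.round((num / den) * 1000) / 10;

function summarize(rows: EvalRow[]): EvalSummary {
  const correct = rows.filter((r) => r.predictedIntent === r.expectedIntent).length;
  const escalated = rows.filter((r) => r.action === "escalate").length;
  const toolCalls = rows.filter((r) => r.action === "tool_call");
  const toolOk = toolCalls.filter((r) => r.toolSucceeded === true).length;
  const totalLatency = rows.reduce((sum, r) => sum + r.latencyMs, 0);
  return {
    intentAccuracy: pct(correct, rows.length),
    escalationRate: pct(escalated, rows.length),
    toolSuccessRate: pct(toolOk, toolCalls.length),
    avgLatencyMs: rows.length === 0 ? 0 : Math.round(totalLatency / rows.length),
  };
}
```

As a sanity check on the arithmetic, 27 correct predictions out of 32 is 84.375%, which rounds to 84.4%.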

API Reference

POST /api/run

Main orchestration endpoint — runs a message through the full 9-stage pipeline.

Request:

{
 "message": "Where is my order ORD-12345?",
 "agentId": "your-agent-id",
 "sessionId": "optional-session-id",
 "confidenceThreshold": 0.7
}

Response:

{
 "success": true,
 "agentId": "...",
 "intentClassification": {
 "intent": "order_status",
 "confidence": 0.91,
 "language": "english"
 },
 "actionDecision": "tool_call",
 "toolInvocation": {
 "toolId": "order_lookup",
 "result": { "success": true, "data": { "status": "In Transit" } }
 },
 "finalResponse": "Your order ORD-12345 is currently In Transit...",
 "hallucinationCheck": { "isHallucination": false },
 "latencyBreakdown": {
 "intentClassification": 1,
 "safetyCheck": 0,
 "toolExecution": 102,
 "total": 105
 }
}

Override Headers:

  • X-Session-ID — session tracking
  • X-Safety-Mode: strict | balanced | permissive — override agent safety level per-request
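
A client call to this endpoint, including the override headers, could be assembled as sketched below. The helper `buildRunRequest` and its option names are hypothetical. The endpoint path, body fields, and header names are taken from this section.

```typescript
type SafetyMode = "strict" | "balanced" | "permissive";

interface RunRequestBody {
  message: string;
  agentId: string;
  sessionId?: string;
  confidenceThreshold?: number;
}

interface RunOptions {
  sessionId?: string; // sent as X-Session-ID
  safetyMode?: SafetyMode; // sent as X-Safety-Mode
}

// Hypothetical helper: builds the URL and fetch options for POST /api/run.
function buildRunRequest(
  baseUrl: string,
  body: RunRequestBody,
  opts: RunOptions = {}
) {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
  };
  if (opts.sessionId) headers["X-Session-ID"] = opts.sessionId;
  if (opts.safetyMode) headers["X-Safety-Mode"] = opts.safetyMode;
  return {
    url: `${baseUrl}/api/run`,
    init: { method: "POST", headers, body: JSON.stringify(body) },
  };
}
```

With the dev server running, the result can be passed straight to `fetch`, for example `const { url, init } = buildRunRequest("http://localhost:3000", { message: "Where is my order ORD-12345?", agentId: "your-agent-id" }, { safetyMode: "strict" });` followed by `await fetch(url, init)`.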

Project Structure

AgentOS/
├── app/
│ ├── api/
│ │ ├── run/route.ts ← Main 9-stage orchestrator
│ │ ├── agents/ ← Agent CRUD
│ │ ├── agent-config/ ← Agent settings upsert
│ │ ├── tools/ ← ToolConfig API
│ │ └── knowledge/ ← FAQ management
│ └── (dashboard)/
│ ├── test-console/ ← Live agent chat UI
│ ├── agent-builder/ ← Agent config UI
│ ├── tools-knowledge/ ← FAQ + tool management
│ └── logs/ ← Execution log viewer
├── lib/
│ ├── intent-router.ts ← Intent classification (10 types)
│ ├── safety-guard.ts ← Safety + hallucination detection
│ ├── tools.ts ← Tool implementations
│ └── db-logger.ts ← Queue-based async logger
├── prisma/
│ ├── schema.prisma ← Data models
│ └── seed.js ← Default data seeder
├── scripts/
│ └── eval.js ← Ablation evaluation pipeline
├── eval/
│ ├── dataset.json ← 32-query bilingual test set
│ ├── results.json ← Latest eval output
│ └── results.csv ← CSV export
└── docs/images/ ← Screenshots

Tech Stack

| Layer | Technology |
|---|---|
| Framework | Next.js 16 (App Router) |
| Language | TypeScript 5 |
| Database | SQLite via Prisma ORM |
| Validation | Zod |
| UI Components | shadcn/ui + Radix UI + Tailwind CSS |
| Logging | Custom queue-based async DB logger |
| Evaluation | Node.js ablation script |
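
The idea behind a queue-based async logger is that the request path only appends to an in-memory queue, while writes happen later in batches. The sketch below illustrates that idea only. `QueueLogger`, `LogEntry`, and the `Sink` callback are hypothetical and do not describe lib/db-logger.ts. In practice the sink would wrap a database write.

```typescript
// Hypothetical log record; the real schema lives in prisma/schema.prisma.
interface LogEntry {
  intent: string;
  action: string;
  latencyMs: number;
}

// The sink persists one batch, e.g. by calling a database client.
type Sink = (batch: LogEntry[]) => Promise<void>;

class QueueLogger {
  private queue: LogEntry[] = [];
  private readonly sink: Sink;
  private readonly batchSize: number;

  constructor(sink: Sink, batchSize = 10) {
    this.sink = sink;
    this.batchSize = batchSize;
  }

  // Cheap and synchronous: the request handler never waits on the database.
  enqueue(entry: LogEntry): void {
    this.queue.push(entry);
  }

  get pending(): number {
    return this.queue.length;
  }

  // Drains the queue in batches and returns how many entries were written.
  async flush(): Promise<number> {
    let written = 0;
    while (this.queue.length > 0) {
      const batch = this.queue.splice(0, this.batchSize);
      await this.sink(batch);
      written += batch.length;
    }
    return written;
  }
}
```

A real implementation would also need a flush trigger (a timer or a size limit) and a policy for failed writes, neither of which is shown.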

For full technical documentation including problem statement, data models, and design decisions — see PROJECT.md

About

Production-grade Safe GenAI Agent Orchestrator with intent routing, hallucination guard, tool orchestration, evaluation pipeline, and multi-screen Next.js dashboard.
