Human-in-the-loop execution for LLM agents
Guardrails for LLMs: detect and block hallucinated tool calls to improve safety and reliability.
🛡️ Safe AI Agents through Action Classifier
🛡️ Open-source safety guardrail for AI agent tool calls. <2ms, zero dependencies.
The missing safety layer for AI Agents. Adaptive High-Friction Guardrails (Time-locks, Biometrics) for critical operations to prevent catastrophic errors.
A runtime authorization layer for LLM tool calls: policy, approval, and audit logs.
Runtime detector for reward hacking and misalignment in LLM agents (89.7% F1 on 5,391 trajectories).
ETHICS.md: a statement of ethical principles for AI agents. Drop it in your repo root.
Safety-first agentic toolkit: 10 packages for collapse detection, governance, and reproducible runs.
A2A version of Agent Action Guard: Safe AI Agents through Action Classifier
An open-source engineering blueprint for defining and designing the core capabilities, boundaries, and ethics of any AI agent.
A hierarchical AI safety architecture with asymmetric supervisory control.
Energy-based legality-gating SDK for AI reasoning. Predicts, repairs, and audits collapse before it happens; reduces hallucinations and provides numeric audit logs.
Canonical texts and implementation primitives for the Safe Superintelligence Framework (v1.2.1): Constitution, Minimum Rescue Protocol, system prompt, decision matrix.
A security-first control plane for autonomous AI code agents: sandboxed execution, hash grounding, diff validation, verification, and full auditability.
A protocol engine for governing AI agent workflows through gated checkpoints and immutable audit trails.
Semantic differential protection layer for AI agents. The semantic analogue of differential protection (RCD) in electrical systems.
Prompt Injection Firewall for AI agents. 113 detection patterns, 14 threat categories, zero dependencies. Protects against fake authority, command injection, memory poisoning, skill malware, and more.
Production-ready safety framework preventing identity fusion, synthetic intimacy, and unbounded behavior in AI agent systems. Machine-readable contracts and verse-lang primitives for immediate deployment.
🛡️ Safeguard AI agents from harmful actions with A2A-Agent-Action-Guard, ensuring safe tool usage through effective action classification.
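Many of the projects above share the same core pattern: intercept a proposed tool call, decide under a policy whether it may run, escalate anything risky to a human, and record the outcome in an audit log. The sketch below is a generic, hypothetical illustration of that pattern in Python; the ToolCall and ApprovalGate names are assumptions made for this example and are not taken from any repository listed here.

```python
"""Minimal human-in-the-loop gate for LLM tool calls (illustrative sketch only)."""
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ToolCall:
    name: str         # tool the agent wants to invoke
    arguments: dict   # arguments proposed by the model


@dataclass
class ApprovalGate:
    # Tools allowed to run without human review.
    auto_allow: set = field(default_factory=lambda: {"search", "read_file"})
    audit_log: list = field(default_factory=list)

    def review(self, call: ToolCall) -> bool:
        """Return True if the call may execute; ask a human otherwise."""
        if call.name in self.auto_allow:
            approved = True
        else:
            answer = input(f"Allow tool '{call.name}' with {call.arguments}? [y/N] ")
            approved = answer.strip().lower() == "y"
        # Record every decision, approved or not, for later audit.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "tool": call.name,
            "arguments": call.arguments,
            "approved": approved,
        })
        return approved


if __name__ == "__main__":
    gate = ApprovalGate()
    call = ToolCall(name="delete_file", arguments={"path": "/tmp/report.txt"})
    if gate.review(call):
        print("Tool call approved; execute it here.")
    else:
        print("Tool call blocked; returned to the agent for revision.")
```

Real implementations replace the input() prompt with higher-friction checks such as time-locks, biometrics, or an out-of-band approval queue, and persist the audit log instead of keeping it in memory.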