A curated list of tools, frameworks, standards, and resources for governing autonomous AI agents, covering safety, trust, identity, observability, and compliance across the agent lifecycle.
AI agents now hold real reach: email, CRMs, databases, financial systems. Guardrails at the content layer probably hold, but enterprises need to run on proof, not probability. This list tracks the tools and practices for making agents safe, auditable, and trustworthy in production.
- Governance Frameworks
- End-to-End Governance: Software and Hardware
- Policy as Code
- LLM Safety & Guardrails
- Agent Frameworks with Governance Features
- Agent Identity & Attestation
- Observability & Monitoring
- Security Testing
- Standards & Specifications
- Research Papers
- Industry Reports & Guidance
- Conferences & Communities
Dedicated platforms and control planes for governing AI agent behavior, enforcing policies, and maintaining trust at runtime.
- Agent Governance Toolkit (AGT) - Production governance layer for autonomous agents with a policy enforcement kernel (<0.1ms p99), execution rings (Ring 0-3), cryptographic Merkle audit logs, and integrations across LangChain, CrewAI, AutoGen, Google ADK, and more. Python + .NET + Rust. Now stewarded by the Agentic AI Foundation. Provides the software governance layer that integrates with hardware-attested enforcement via cMCP. β 4000+
- Agent Hypervisor - Runtime supervisor for AI agents with execution rings, saga compensation, joint liability, and kill-switch controls.
- Agent OS - Kernel-level governance with POSIX-inspired primitives, policy interception, and OWASP Agentic Top 10 coverage. Integrates with LangChain, CrewAI, AutoGen, PydanticAI, and smolagents.
- Agent SRE - Site reliability engineering for AI agents: SLOs, error budgets, chaos testing, circuit breakers, and cascading failure detection.
- AgentGate - Open-source pre-execution authorization PDP for autonomous AI agents. Scores trust across 4 dimensions (Identity, Delegation, Purpose Alignment, Behavioral), detects 24h kill chain patterns, and generates Merkle-chained audit trails before each action executes. MIT licensed, drop-in with LangGraph and LangChain.
- AgentMesh - Zero-trust networking for AI agents with DID identity, behavioral trust scoring, delegation chains, and MCP governance proxy.
- Coral Server - Agent coordination and trust server enabling safe multi-agent collaboration with structured communication protocols.
- Cordum - Agent control plane providing governance, lifecycle management, and policy enforcement for autonomous agents.
- Gate22 - MCP gateway with role-based access control, audit logging, and fine-grained permission management for tool access.
- IBM mcp-context-forge - Enterprise MCP gateway with context-aware guardrails, request routing, and compliance controls.
- Invariant Guardrails - Rule-based guardrails engine with policy-as-code, trace analysis, and real-time intervention for agentic applications.
- LiteLLM - Unified LLM gateway with spend tracking, rate limiting, guardrails, and access controls across 100+ LLM providers.
- Regulus - EU & UK compliance plane for Google ADK encoding 10 regulations (EU AI Act, GDPR, DORA, NIS2, EHDS, UK GDPR, FCA SYSC, PRA SS1/23, PRA SS2/21, NHS DSPT) and 6 governance frameworks as runtime ADK
BasePluginprofiles; emits hash-chained audit envelopes with GRC adapters (ServiceNow IRM, OneTrust, MetricStream). - ScopeBlind protect-mcp - Security gateway for MCP servers with Cedar policy enforcement (AWS Cedar via WASM), Ed25519-signed decision receipts, issuer-blind spending authority (VOPRF), and multi-agent swarm tracking. Merged into AGT.
- TrinityGuard - Multi-agent safety framework with three-layer defense for detecting and preventing unsafe agent behaviors.
Tools and platforms that combine software policy enforcement with hardware-attested execution. The software layer (Cedar policy, audit logs, agent identity) is measured into a Trusted Execution Environment so that the governance claims cannot be forged by a privileged operator, a compromised runtime, or a supply chain attack. The result is a compliance artifact that any third party can verify offline without trusting the operator.
The stack: AGT enforces Cedar policies at the software layer. cMCP carries those policies into the TEE. TRACE records the attested outcome. Agent Manifest binds all ten deployment artifacts into a hardware-sealed identity document. Opaque Systems provides the managed TEE runtime.
- Agent Manifest - Signs all 10 agent deployment artifacts (system prompt, policy bundle, model identity, tool schemas, RAG corpus, memory baseline, decision trace, A2A delegation chain, build provenance, HITL approvals) into a single tamper-evident record. Hardware attestation via TPM, AMD SEV-SNP, Intel TDX, or Opaque Managed Runtime. Four conformance levels mapped to EU AI Act Art. 13-15, DORA, HIPAA, and FedRAMP. 197 conformance tests. Python + TypeScript. Developer preview, launching June 23 2026.
- cMCP (Confidential MCP Gateway) - MCP gateway running inside a TEE. The Cedar policy bundle is measured into hardware attestation before any code runs; the signing key never leaves the enclave. Every tool call produces a signed GatewayClaim bound to the hardware measurement. Supports TPM, AMD SEV-SNP, Intel TDX, NVIDIA H100 Confidential Compute, and Opaque Managed Runtime. Developer preview, launching June 23 2026.
- Opaque Systems - Managed TEE runtime providing the highest-assurance hardware backend for cMCP, TRACE, and Agent Manifest. Handles TEE provisioning, attestation key lifecycle, and enclave deployment. Produces TRACE records signed by Opaque's hardware root of trust. Commercial.
- TRACE (Trust Runtime Attestation and Compliance Evidence) - Open attestation standard and Python SDK for agentic AI governance. Each agent run produces a signed EAT/JWT trust record binding model identity, policy version, TEE hardware evidence, and tool call transcript into a single artifact verifiable offline without calling the operator. Built on IETF RATS (RFC 9334), EAT (RFC 9711), SCITT, SLSA, and SPIFFE. Spec v0.1, launching June 23 2026.
Language-level tools for expressing, validating, and enforcing authorization policies applicable to agent capability bounds, tool access, and data permissions.
- Casbin - Cross-language authorization library supporting ACL, RBAC, and ABAC models. Available in Go, Python, Java, and more.
- Cedar - Amazon's policy language for fine-grained, type-safe access control. Used as the policy engine in the Agent Governance Toolkit. Fast, formally verified, and human-readable.
- Open Policy Agent (OPA) - CNCF general-purpose policy engine. Decouples policy decisions from application logic using the Rego language. Widely deployed for Kubernetes and API authorization.
- SpiceDB - Google Zanzibar-inspired database for fine-grained, relationship-based authorization. Useful for cross-agent and multi-tenant permission modeling.
Input/output filtering, content safety, and prompt protection for LLM-powered agents.
- ai-evaluation - Open-source LLM evaluation framework with 50+ metrics, LLM-as-Judge, and guardrail scanners (jailbreak, PII, injection).
- Arthur Shield - Firewall for LLMs that detects hallucinations, toxicity, PII leakage, and prompt injection in real time.
- Guardrails AI - Framework for structural, type, and quality guarantees on LLM outputs. Guardrails Hub provides community validators.
- Hyperion - Framework for evaluating and improving robustness of LLM-based agents against adversarial attacks.
- Lakera Guard - Real-time API for detecting prompt injections, data leakage, toxic content, and other LLM security threats.
- LLM Guard - Input and output scanners covering toxicity, PII, prompt injection, invisible text, and code detection.
- Meta Llama Guard - Safety classifier models for filtering unsafe LLM inputs and outputs. Part of Meta's Purple Llama safety suite.
- NVIDIA NeMo Guardrails - Open-source toolkit for adding programmable guardrails to LLM-based conversational systems using Colang.
- Rebuff - Prompt injection detection using multi-layer defense: heuristics, LLM analysis, and canary tokens.
- Vigil - LLM security scanner for detecting prompt injections using embedding similarity, heuristics, and canary tokens.
Agent development frameworks that include governance, safety, or policy hooks.
- AgentScope - Multi-agent platform with fault tolerance, agent-level monitoring, and configurable message validation.
- AutoGen - Multi-agent conversation framework with human oversight, code execution sandboxing, and conversation policies.
- CrewAI - Multi-agent orchestration with role-based agents, task delegation, and configurable guardrails.
- Dify - LLM app development platform with content moderation, rate limiting, and annotation logging.
- Google Agent Development Kit (ADK) - Google's agent framework with safety callbacks, evaluation tools, and multi-agent session management.
- Haystack - End-to-end NLP framework with pipeline-based architecture supporting content filtering and validation.
- LangChain - Composable framework with callbacks, tracing, and moderation chains for LLM applications.
- LangGraph - Stateful, multi-actor agent framework with human-in-the-loop and persistence built in.
- LlamaIndex - Data framework with observability callbacks, evaluation modules, and structured output guarantees.
- OpenAI Agents SDK - Official OpenAI SDK with built-in guardrails, input/output validation, and handoff controls.
- PydanticAI - Agent framework with type-safe tool definitions, structured outputs, and dependency injection.
- Semantic Kernel - Microsoft's AI orchestration SDK with plugin permission models, function filtering, and responsible AI hooks.
- smolagents - Hugging Face's lightweight agent library with sandboxed code execution and security controls.
Protocols and tools for establishing cryptographic identity, trust, and verifiable provenance for AI agents. For hardware-attested agent identity and compliance records, see End-to-End Governance: Software and Hardware.
- Agent Card / AI Card - Specification for machine-readable agent capability and policy metadata, enabling discovery and trust decisions.
- Nobulex - Bilateral receipt primitive for tamper-evident agent audit trails: two Ed25519 signatures per action (pre- and post-execution), hash-chained via JCS canonicalization (RFC 8785). The receipt-signing approach is merged into AGT. MIT licensed.
- SPIFFE/SVID - Secure Production Identity Framework for Everyone. Cryptographic workload identity applicable to agent-to-agent authentication.
- TWZRD Agent Intel - On-chain trust scoring and reputation verification for AI agent wallets on Solana. Queries verifiable on-chain wallet history to authorize agent-to-agent actions before execution; issues signed, x402-gated trust receipts for immutable audit trails.
- W3C Decentralized Identifiers (DIDs) - W3C standard for decentralized, self-sovereign identifiers applicable to durable agent identity without centralized registries.
Platforms for tracing, monitoring, evaluating, and debugging AI agent behavior.
- AgentOps - Agent observability SDK with session replay, LLM cost tracking, compliance monitoring, and failure detection.
- Arize / Phoenix - Open-source AI observability with LLM tracing, evaluation, retrieval analysis, and experiment tracking.
- Braintrust - Evaluation and monitoring platform with logging, scoring, and experiment comparison.
- Datadog LLM Observability - Enterprise monitoring with trace clustering, cost attribution, and quality scoring.
- Future AGI - Open-source self-hostable end-to-end agent engineering platform with tracing, evals, guardrails, and gateway.
- Helicone - Open-source LLM observability with request logging, cost tracking, caching, and rate limiting.
- Jaeger - Open-source distributed tracing for monitoring agent workflows and debugging latency across services.
- Langfuse - Open-source LLM engineering platform with tracing, prompt management, evaluations, and cost tracking.
- LangSmith - LangChain's platform for debugging, testing, evaluating, and monitoring LLM applications.
- MLflow - Open-source ML lifecycle platform with experiment tracking, model registry, and LLM evaluation tools.
- Prometheus + Grafana - Industry-standard metrics and visualization. Foundation for custom agent SLO dashboards.
- traceAI - Open-source OpenTelemetry-native tracing for LLM and agent apps with 50+ framework integrations.
- Weights & Biases - ML experiment tracking with LLM tracing, evaluation pipelines, and model monitoring.
Scanners, red-teaming tools, and frameworks for testing the security of AI agents and LLMs.
- Counterfit - Azure's tool for assessing ML model security through adversarial attacks.
- CyberSecEval - Meta's benchmark suite for LLM cybersecurity risks including insecure code generation and prompt extraction.
- Garak - LLM vulnerability scanner from NVIDIA. Probes for hallucination, data leakage, prompt injection, toxicity, and more.
- HouYi - Prompt injection attack framework for testing LLM-integrated application security boundaries.
- PyRIT - Microsoft's Python Risk Identification Toolkit for red-teaming generative AI systems with automated attack strategies.
- Snyk Agent Scan - Security scanner for AI agents and MCP servers. Detects vulnerabilities in tool configurations, permissions, and data flows.
Protocols, specifications, and regulatory frameworks relevant to agent governance and interoperability.
- Agent-to-Agent Protocol (A2A) - Google-led open protocol for inter-agent communication, task delegation, and capability discovery.
- CoSAI Risk Map v1 - Coalition for Secure AI's component-level risk taxonomy for agentic AI systems: 23 components, 35 controls, 36 risks (including 6 agentic-specific), 10 personas, 8 lifecycle stages. Cross-walks to MITRE ATLAS, NIST AI RMF, STRIDE, OWASP LLM Top 10, ISO 22989, and EU AI Act. Launched June 2026.
- CSA Agentic Trust Framework - Cloud Security Alliance / MassiveScale framework defining 5 Core Elements (Identity, Behavior, Data Governance, Segmentation, Incident Response) with 25 requirements across a 4-tier maturity model (Intern, Junior, Senior, Principal). Public Review Draft v0.9.1 (April 2026). Two conformance tiers: ATF Compatible / ATF Certified.
- CSA MCP Security Resource Center - Cloud Security Alliance community project for securing MCP servers and AI agents: hardening guides, audit database, and vulnerability database.
- CSA Non-Human Identity & Agentic AI Governance v1 - Cloud Security Alliance whitepaper recommending a 6-field NHI registry (Identity, Owning Team, Business Purpose, Systems Accessed, Privilege Scope, Expiration/Review Date) for managing non-human and agentic identities. Published May 2026.
- EU AI Act - EU regulation classifying AI systems by risk with requirements for transparency, human oversight, and governance.
- Model Context Protocol (MCP) - Open protocol connecting LLMs to tools and data sources with standardized server/client architecture.
- NIST AI 600-1 (GenAI Profile) - NIST profile of the AI Risk Management Framework focused on generative AI, enumerating 12 risk categories (CBRN, Confabulation, Dangerous Content, Data Privacy, Environmental Impacts, Harmful Bias, Human-AI Configuration, Information Integrity, Information Security, Intellectual Property, Obscene Content, Value Chain). 200+ suggested actions across Govern/Map/Measure/Manage functions. Published July 2024.
- NIST AI Risk Management Framework - NIST framework for managing AI risk including governance and accountability.
- NIST IR 8596 (Cyber AI Profile) - NIST profile crossing the Cybersecurity Framework 2.0 against AI in three focus areas (Secure AI System Components, Defend with AI, Thwart AI-Enabled Threats) across six CSF functions (Govern, Identify, Protect, Detect, Respond, Recover). Preliminary Draft (December 2025).
- OpenTelemetry - Vendor-neutral observability standard for traces, metrics, and logs. Foundation for agent observability pipelines.
- Oracle Agent Spec - Enterprise agent interoperability specification with governance and management capabilities.
- OWASP Agentic Applications Top 10 (2026) - OWASP classification of the top 10 security risks in agentic AI: excessive agency, trust boundary failures, identity spoofing, and more.
- OWASP AI Security Verification Standard (AISVS) - OWASP verification-controls standard with 14 chapters covering training data, input validation, model lifecycle, infrastructure, access control, supply chain, model behavior, memory/embeddings, autonomous orchestration (C9), MCP security (C10), adversarial robustness, privacy, monitoring, and human oversight. Releases v1.0 at OWASP Global AppSec EU Vienna on 24 June 2026.
- OWASP Non-Human Identities (NHI) Top 10 (2025) - OWASP awareness list of the top 10 risks for non-human identities (Improper Offboarding, Secret Leakage, Vulnerable Third-Party NHI, Insecure Authentication, Overprivileged NHI, Insecure Cloud Deployment Configurations, Long-Lived Secrets, Environment Isolation, NHI Reuse, Human Use of NHI). Risk-ranked using OWASP Risk Rating Methodology. Published December 2024.
- Signed Decision Receipts (IETF) - Internet-Draft defining a portable, cryptographically signed receipt format for machine-to-machine access control decisions. Ed25519 + JCS canonicalization. Independently verifiable offline.
- Singapore IMDA Model AI Governance Framework for Agentic AI - Singapore IMDA framework defining eight components of an agentic system (Model, Instructions, Memory, Planning & Reasoning, Tools, Protocols, Controls, Logging/Monitoring) and four governance dimensions (bounding risks, human accountability, technical controls, end-user responsibility). v1.5 published 20 May 2026
- SPDX 3.0.1 AI Profile - SPDX specification for AI Bill of Materials (AIBOM) with AIPackage and DatasetPackage classes covering automation level, autonomy type, hyperparameters, model explainability, safety risk assessment, anonymization method, dataset size, and sensitive personal information flags. Stable since December 2024 (3.0.1); 3.1-RC1 pre-release adds EU AI Act-aligned dataset size properties (token count, item count, content duration).
Key academic works on agent safety, multi-agent governance, and trust in AI systems.
- A Survey on Large Language Model based Autonomous Agents - Comprehensive survey of LLM-based agents covering architecture, capabilities, and safety.
- Agent Safety: An Emerging Research Direction - Analysis of safety challenges unique to autonomous AI agents beyond traditional LLM safety.
- Constitutional AI: Harmlessness from AI Feedback - Anthropic's methodology for training AI systems to be helpful, harmless, and honest using AI feedback.
- Multi-Agent Safety: A Systematic Survey - Survey of safety challenges in multi-agent systems including coordination failures and emergent behaviors.
- Practices for Governing Agentic AI Systems - OpenAI's whitepaper on governance practices: oversight, monitoring, and containment.
- Reflexion: Language Agents with Verbal Reinforcement Learning - Self-reflective agent architecture with implications for building self-correcting, safer agents.
- The Landscape of Emerging AI Agent Architectures - Survey of agentic architectures with analysis of governance-relevant design patterns.
- The Rise and Potential of Large Language Model Based Agents: A Survey - Survey covering agent construction, applications, and societal implications including safety.
- Toolformer: Language Models Can Teach Themselves to Use Tools - Foundational work on tool-using LLMs, relevant to understanding tool governance requirements.
- Towards Autonomous AI Agents: A Safety-First Approach - Framework for integrating safety constraints into autonomous agent design from the ground up.
Practitioner guides, threat models, and industry analyses for agent governance.
- Anthropic's Responsible Scaling Policy - Framework for responsible AI deployment with AI Safety Levels (ASL) and safety evaluation commitments.
- Anthropic: Trustworthy Agents in Practice - Anthropic's practical guidance on deploying trustworthy agents: balancing autonomy with human oversight, prompt-injection resistance, and operational safeguards.
- CSA AI Safety Initiative - Cloud Security Alliance publications on AI safety and security best practices.
- Google Secure AI Framework (SAIF) - Conceptual framework for securing AI systems across the development and deployment lifecycle.
- Microsoft Responsible AI Standard - Framework covering fairness, reliability, safety, and transparency in AI development.
- MITRE ATLAS - Adversarial Threat Landscape for AI Systems. Adversary tactics and techniques against AI/ML systems, structured like ATT&CK.
- OWASP Agentic AI Threats and Mitigations - OWASP's threat catalog and mitigation strategies for agentic applications.
- OWASP Top 10 for LLM Applications - Top 10 security risks for LLM applications including prompt injection, insecure output handling, and supply chain vulnerabilities.
Key venues for agent governance research, practice, and community discussion.
- AAMAS - International Conference on Autonomous Agents and Multi-Agent Systems.
- ACM FAccT - ACM Conference on Fairness, Accountability, and Transparency in sociotechnical systems.
- AI Safety Camp - Research program for AI safety and alignment.
- Alignment Forum - Community forum for AI alignment and safety research.
- Confidential Computing Summit - Annual conference on confidential computing, TEE hardware, and hardware-attested workloads. Primary venue for TRACE, cMCP, and Agent Manifest launch (June 2026, San Francisco).
- ICML - International Conference on Machine Learning, with safe and reliable ML tracks.
- NeurIPS - Leading ML conference with AI safety, alignment, and trustworthy ML workshops.
- OWASP GenAI - OWASP community for security guidance on generative AI and agentic applications.
- r/agentic - Reddit community for autonomous AI agents, tools, and governance.
Contributions welcome! Please read the contribution guidelines and submit a PR. Open-source governance tools, research papers, and community resources are especially welcome.
Join the community on Discord.
Maintained by Imran Siddique, CPO at Opaque Systems, creator of the Agent Governance Toolkit, and contributor to OWASP ASI, CoSAI WS4, and the Agentic AI Foundation. The curator also maintains AGT, the agentrust-io tools (TRACE, cMCP, Agent Manifest), and the four component repos (Agent OS, AgentMesh, Agent SRE, Agent Hypervisor).