AI Safety agents
This page lists every AI agent in the MeshKore directory tagged with the AI Safety capability. Agents are sourced from public platforms (GitHub, Hugging Face, npm, PyPI, awesome-list curations, and direct submissions), normalized by the MeshKore worker, and ranked by GitHub stars. Each card links to the agent's profile with details on capabilities, framework, language, freshness, and source attribution.
73 agents in this capability · ranked by popularity
Top 73 AI Safety agents
AI Agent Governance Toolkit — Policy enforcement, zero-trust identity, execution sandboxing, and reliability…
UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM hallucination…
LangFair is a Python library for conducting use-case level LLM bias and fairness assessments
[ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents…
[arXiv preprint] Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses | 500+ Papers | Perception…
Production-ready Python framework for AI agents with built-in guardrails, audit logging, cost tracking, and…
Production-ready agentic AI framework. High-performance, lightweight, simple. Built-in safety, memory, and 4…
Your AI agent just burned $200. AgentGuard stops it at $5. Runtime cost guardrails for AI agents — budget…
Tags: docker, openenv, reinforcement-learning, ai-safety, governance, compliance, multi-agent, rl-environment
AI agent security plugin for OpenClaw: prompt injection detection, PII sanitization, and monitoring dashboard
AI agent memory & session orchestrator for MCP — persistent KV-Cache, Soul Board, immutable Ledger
Approve AI agent actions from your iPhone or Apple Watch
Declarative workflow orchestration for LLM agents — schemas, routers, sub-workflow composition, full audit
Policy enforcement, approval gates, and audit trails for OpenClaw — govern tool inputs before execution, scan…
VORTIQ-X AI Governance plugin for OpenClaw — 53+ governed tools + FORCED LLM ROUTING (hypervisor-pinned…
Agentic governance layer for Claude Code — policy enforcement, hook-based safety gates, audit logging, and…
Constitutional AI Governance Framework — Asimov's cLaws with HMAC-SHA256 integrity verification, memory…
Agentic Control Plane governance for CrewAI agents. Wrap any tool with @governed; ACP decides…
Agentic Control Plane governance for LangChain / LangGraph agents. Wrap any tool with @governed; ACP decides…
Runtime classifier for screening AI agent actions as safe, harmful, or unethical.
Python SDK for Agent Control - protect your AI agents with controls
MCP server for AI agent safety — cost guards, injection scanning, decision tracing, agent identity (KYA), and…
One-line safety middleware for AI agent APIs. Prompt injection scanning, cost budgets, decision audit trails…
Authorization framework for AI agent tool calls. Your AI agent needs a login screen — AgentLock is that login…
Prompt injection & tool call security middleware for agentic LLM systems
A dotfile-driven firewall that protects the OS from destructive LLM agent tool calls
AI Action Firewall — seven-stage Decision Intelligence Core for safe agentic AI
AIR Trust Layer for CrewAI — audit trails, data tokenization, consent gates, and injection detection
AIR Trust Layer for LangChain — audit trails, Gate policy enforcement, consent gates, and injection detection
AIR Trust Layer for OpenAI Python SDK — audit trails, PII detection, injection scanning, and HMAC-SHA256…
Production-grade LLM observability. G-ARVIS scoring for Groundedness, Accuracy, Reliability, Variance…
KYA (Know Your Agent) identity verification for Microsoft AutoGen agents
Hybrid security + TDD validation for Claude Code with automatic test result capture using Google Gemini
EYDII Verify tools and guardrails for CrewAI — verify every agent action before execution
Forge Verify + Execute tools and guardrails for CrewAI — verify agent actions and track executions with…
DeepKeep AI Firewall tools for CrewAI agents — check inputs, create conversations, and call the DeepKeep API.
KYA (Know Your Agent) identity verification for DSPy modules
LangChain integration for Blindfold PII detection and protection
EYDII Verify tools for LlamaIndex — verify every agent action before execution
Forge Verify + Execute tools for LlamaIndex — verify agent actions and track executions with cryptographic…
Out-of-tree Llama Stack provider for Garak red-teaming
Security testing toolkit for LLM-based systems
Cognitive Security Middleware - The 'Electronic Stability Program' (ESP) for Large Language Models…
Runtime monitoring SDK for AI applications — detect prompt injections and adversarial attacks in production.
Lightweight taint tracking for LLM pipelines — label secrets at entry, block them at unsafe sinks
Protect OpenAI and Anthropic API calls from prompt injection, jailbreaks, and data-extraction attacks.
ThoughtProof Protocol — CrewAI integration for multi-model adversarial verification
EYDII Verify tools and middleware for Pydantic AI — verify every agent action before execution
Forge Verify tools and middleware for Pydantic AI — verify every agent action before execution
Production-ready guardrails for Pydantic AI with native integration patterns
Quilr Guardrails Integration for LiteLLM
Security middleware for RAG pipelines — detect adversarial hallucination attacks before they reach your LLM.
Shadow-Sandbox DB Layer -- let AI agents modify your database safely with tenant isolation, Pydantic…
MCP server exposing the SaferAgenticAI safety framework (canonical criteria + Implementation Patterns layer)…
Governance gate for LangChain agents. Powered by Sentinel AI — pauses risky actions for human approval, logs…
LLM sanitization SDK — DOMPurify, but for LLM context windows.
SWARM: System-Wide Assessment of Risk in Multi-agent systems - A Distributional AGI Safety framework
Enterprise-grade LLM security framework with 40+ scanners and programmable guardrails
Security scanning and monitoring for LlamaIndex applications - part of Weave Protocol
LLM Confidence Fragility Analyzer — Measure how fragile your AI's confidence really is
Production guardrails for AI coding agents
AI agent memory governance MCP server — preflight validation before every action. Works with Claude Desktop…
Authensor guardrail adapter for LangChain/LangGraph
Enterprise-grade data poisoning detection & alerting for RAG systems
Runtime security middleware for LLM agents — prompt injection, tool misuse, and memory poisoning defense
MCP Server for Claude Desktop - Agent OS kernel primitives including code safety verification, CMVK…
Mathematical drift detection library for calculating drift/hallucination scores between outputs
Security assessment framework for AI agents — adversarial test runner + server-side audit + scoring
LangChain tools for RecourseOS - evaluate consequences before destructive actions
LlamaIndex tools for RecourseOS - evaluate consequences before destructive actions
Official Python client for Open AI Guardrails policy distribution, audit evidence, and OPA control-plane APIs.
Production-ready LLM security firewall powered by Groq
SCBE agent-bus: Python surface over the SCBE governed event runner. Routes AI/human/AI events through the…
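Several cards above describe the same integration shape: wrap a tool function with a `@governed` decorator so a control plane can approve or block the call before it executes. A minimal sketch of that pattern, with entirely hypothetical names (this is not the real ACP or any listed vendor's API):

```python
import functools

def governed(policy):
    """Hypothetical decorator: run the tool only if `policy` approves the call.

    `policy` receives the tool name and arguments and returns "allow" or a
    denial reason string. Real control planes would consult a remote policy
    engine and emit audit records; this sketch only shows the call shape.
    """
    def decorator(tool):
        @functools.wraps(tool)
        def wrapper(*args, **kwargs):
            decision = policy(tool.__name__, args, kwargs)
            if decision != "allow":
                raise PermissionError(f"{tool.__name__} blocked: {decision}")
            return tool(*args, **kwargs)
        return wrapper
    return decorator

# Illustrative policy: deny any tool call whose arguments contain "DROP TABLE".
def no_destructive_sql(name, args, kwargs):
    text = " ".join(map(str, args)) + " ".join(map(str, kwargs.values()))
    return "deny: destructive SQL" if "DROP TABLE" in text else "allow"

@governed(no_destructive_sql)
def run_query(sql):
    return f"ran: {sql}"
```

In use, `run_query("SELECT 1")` executes normally, while `run_query("DROP TABLE users")` raises `PermissionError` before the tool body runs; the same wrapping works for any framework whose tools are plain callables.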