
AI Safety agents

This page lists every AI agent in the MeshKore directory tagged with the AI Safety capability. Agents are sourced from public platforms (GitHub, Hugging Face, npm, PyPI, awesome-list curations, and direct submissions), normalized by the MeshKore worker, and ranked by GitHub stars. Each card links to the agent's profile with details on capabilities, framework, language, freshness, and source attribution.

73 agents in this capability · ranked by popularity
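To make the sourcing and ranking concrete, here is a minimal sketch of how cards could be filtered by capability and ordered by star count. The AgentCard record and rank_by_popularity function are illustrative assumptions; the actual MeshKore worker schema is not public.

    from dataclasses import dataclass, field

    # Hypothetical card record; the real MeshKore worker schema is an assumption here.
    @dataclass
    class AgentCard:
        name: str
        source: str                       # e.g. "github", "pypi", "npm"
        capabilities: set[str] = field(default_factory=set)
        stars: int | None = None          # only GitHub-sourced agents carry star counts

    def rank_by_popularity(agents: list[AgentCard], capability: str) -> list[AgentCard]:
        """Keep agents tagged with `capability`, highest star count first, unstarred last."""
        tagged = [a for a in agents if capability in a.capabilities]
        return sorted(tagged, key=lambda a: a.stars if a.stars is not None else -1, reverse=True)

    cards = [
        AgentCard("uqlm", "github", {"ai-safety"}, stars=1149),
        AgentCard("agentlock", "npm", {"ai-safety"}),
        AgentCard("langfair", "github", {"ai-safety"}, stars=257),
    ]
    for card in rank_by_popularity(cards, "ai-safety"):
        print(card.name, card.stars if card.stars is not None else "unstarred")

Unstarred entries (those from npm or PyPI without a linked GitHub repository) sort below any starred entry, matching the "— ★" cards at the bottom of the list below.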

Top 73 AI Safety agents

agent-governance-toolkit · 1,501 ★

AI Agent Governance Toolkit — Policy enforcement, zero-trust identity, execution sandboxing, and reliability…

uqlm · 1,149 ★

UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM hallucination…

langfair · 257 ★

LangFair is a Python library for conducting use-case level LLM bias and fairness assessments

ToolEmu · 202 ★

[ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents…

Awesome-Embodied-AI-Safety · 81 ★

[arXiv preprint] Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses | 500+ Papers | Perception…

selectools · 9 ★

Production-ready Python framework for AI agents with built-in guardrails, audit logging, cost tracking, and…

Kite · 8 ★

Production-ready agentic AI framework. High-performance, lightweight, simple. Built-in safety, memory, and 4…

agent47 · 3 ★

Your AI agent just burned $200. AgentGuard stops it at $5. Runtime cost guardrails for AI agents — budget…

Co-Pilot_Audit_Gym · 1 ★

Tags: docker, openenv, reinforcement-learning, ai-safety, governance, compliance, multi-agent, rl-environment

@openguardrails/moltguard— ★

AI agent security plugin for OpenClaw: prompt injection detection, PII sanitization, and monitoring dashboard

n2-soul— ★

AI agent memory & session orchestrator for MCP — persistent KV-Cache, Soul Board, immutable Ledger

agentapprove— ★

Approve AI agent actions from your iPhone or Apple Watch

llm-rail— ★

Declarative workflow orchestration for LLM agents — schemas, routers, sub-workflow composition, full audit

@axonflow/openclaw— ★

Policy enforcement, approval gates, and audit trails for OpenClaw — govern tool inputs before execution, scan…

@vortiq-x-consilium/openclaw-governance— ★

VORTIQ-X AI Governance plugin for OpenClaw — 53+ governed tools + FORCED LLM ROUTING (hypervisor-pinned…

@bookedsolid/rea— ★

Agentic governance layer for Claude Code — policy enforcement, hook-based safety gates, audit logging, and…

@futurespeak-ai/claw-framework— ★

Constitutional AI Governance Framework — Asimov's cLaws with HMAC-SHA256 integrity verification, memory…

acp-crewai— ★

Agentic Control Plane governance for CrewAI agents. Wrap any tool with @governed; ACP decides… (see the sketch after this list)

acp-langchain— ★

Agentic Control Plane governance for LangChain / LangGraph agents. Wrap any tool with @governed; ACP decides…

agent-action-guard— ★

Runtime classifier for screening AI agent actions as safe, harmful, or unethical.

agent-control-sdk— ★

Python SDK for Agent Control - protect your AI agents with controls

agent-safety-mcp— ★

MCP server for AI agent safety — cost guards, injection scanning, decision tracing, agent identity (KYA), and…

agent-safety-middleware— ★

One-line safety middleware for AI agent APIs. Prompt injection scanning, cost budgets, decision audit trails…

agentlock— ★

Authorization framework for AI agent tool calls. Your AI agent needs a login screen — AgentLock is that login…

agentshield-core— ★

Prompt injection & tool call security middleware for agentic LLM systems

agentwall— ★

A dotfile-driven firewall that protects the OS from destructive LLM agent tool calls

agi-pragma— ★

AI Action Firewall — seven-stage Decision Intelligence Core for safe agentic AI

air-crewai-trust— ★

AIR Trust Layer for CrewAI — audit trails, data tokenization, consent gates, and injection detection

air-langchain-trust— ★

AIR Trust Layer for LangChain — audit trails, Gate policy enforcement, consent gates, and injection detection

air-openai-trust— ★

AIR Trust Layer for OpenAI Python SDK — audit trails, PII detection, injection scanning, and HMAC-SHA256…

argus-llm— ★

Production-grade LLM observability. G-ARVIS scoring for Groundedness, Accuracy, Reliability, Variance…

autogen-kya— ★

KYA (Know Your Agent) identity verification for Microsoft AutoGen agents

claude-code-adk-validator— ★

Hybrid security + TDD validation for Claude Code with automatic test result capture using Google Gemini

crewai-eydii— ★

EYDII Verify tools and guardrails for CrewAI — verify every agent action before execution

crewai-forge— ★

Forge Verify + Execute tools and guardrails for CrewAI — verify agent actions and track executions with…

crewai-tools-deepkeep— ★

DeepKeep AI Firewall tools for CrewAI agents — check inputs, create conversations, and call the DeepKeep API.

dspy-kya— ★

KYA (Know Your Agent) identity verification for DSPy modules

langchain-blindfold— ★

LangChain integration for Blindfold PII detection and protection

llama-index-tools-eydii— ★

EYDII Verify tools for LlamaIndex — verify every agent action before execution

llama-index-tools-forge— ★

Forge Verify + Execute tools for LlamaIndex — verify agent actions and track executions with cryptographic…

llama-stack-provider-trustyai-garak— ★

Out-Of-Tree Llama Stack provider for Garak Red-teaming

llm-pentest— ★

Security testing toolkit for LLM-based systems

llm-security-firewall— ★

Cognitive Security Middleware - The 'Electronic Stability Program' (ESP) for Large Language Models…

llm-sentinel-sdk— ★

Runtime monitoring SDK for AI applications — detect prompt injections and adversarial attacks in production.

llm-taint— ★

Lightweight taint tracking for LLM pipelines — label secrets at entry, block them at unsafe sinks

llmgateways— ★

Protect OpenAI and Anthropic API calls from prompt injection, jailbreaks, and data-extraction attacks.

pot-sdk-crewai— ★

ThoughtProof Protocol — CrewAI integration for multi-model adversarial verification

pydantic-ai-eydii— ★

EYDII Verify tools and middleware for Pydantic AI — verify every agent action before execution

pydantic-ai-forge— ★

Forge Verify tools and middleware for Pydantic AI — verify every agent action before execution

pydantic-ai-guardrails— ★

Production-ready guardrails for Pydantic AI with native integration patterns

quilr-litellm-guardrails— ★

Quilr Guardrails Integration for LiteLLM

raguard— ★

Security middleware for RAG pipelines — detect adversarial hallucination attacks before they reach your LLM.

safeagentdb— ★

Shadow-Sandbox DB Layer -- let AI agents modify your database safely with tenant isolation, Pydantic…

saferagenticai-mcp— ★

MCP server exposing the SaferAgenticAI safety framework (canonical criteria + Implementation Patterns layer)…

sologate-langchain— ★

Governance gate for LangChain agents. Powered by Sentinel AI — pauses risky actions for human approval, logs…

stripllm— ★

LLM sanitization SDK — DOMPurify, but for LLM context windows.

swarm-safety— ★

SWARM: System-Wide Assessment of Risk in Multi-agent systems - A Distributional AGI Safety framework

ultraguard— ★

Enterprise-grade LLM security framework with 40+ scanners and programmable guardrails

weave-protocol-llamaindex— ★

Security scanning and monitoring for LlamaIndex applications - part of Weave Protocol

yuragi— ★

LLM Confidence Fragility Analyzer — Measure how fragile your AI's confidence really is

agent-guardrails— ★

Production guardrails for AI coding agents

@sgraal/mcp— ★

AI agent memory governance MCP server — preflight validation before every action. Works with Claude Desktop…

@authensor/langchain— ★

Authensor guardrail adapter for LangChain/LangGraph

rag-guard-enterprise— ★

Enterprise-grade data poisoning detection & alerting for RAG systems

@apexguard/sdk— ★

Runtime security middleware for LLM agents — prompt injection, tool misuse, and memory poisoning defense

agentmesh-mcp-server— ★

MCP Server for Claude Desktop - Agent OS kernel primitives including code safety verification, CMVK…

agentmesh_drift— ★

Mathematical drift detection library for calculating drift/hallucination scores between outputs

agentsec-eval— ★

Security assessment framework for AI agents — adversarial test runner + server-side audit + scoring

langchain-recourse— ★

LangChain tools for RecourseOS - evaluate consequences before destructive actions

llama-recourse— ★

LlamaIndex tools for RecourseOS - evaluate consequences before destructive actions

openaiguardrails-sdk— ★

Official Python client for Open AI Guardrails policy distribution, audit evidence, and OPA control-plane APIs.

prompt-firewall-groq— ★

Production-ready LLM security firewall powered by Groq

scbe-agent-bus— ★

SCBE agent-bus: Python surface over the SCBE governed event runner. Routes AI/human/AI events through the…
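Several of the entries above (acp-crewai, acp-langchain, and the various guardrail middlewares) share one pattern: wrap a tool function so a policy check runs before the tool executes. Below is a minimal sketch of that pattern in Python. The governed decorator and check_policy function here are illustrative assumptions, not the real API of any package listed on this page.

    import functools
    from typing import Callable

    # Hypothetical policy check; stands in for a control plane such as ACP.
    def check_policy(tool_name: str, kwargs: dict) -> bool:
        """Return True to allow the call; a real control plane would consult its policies."""
        blocked = {"drop_table", "delete_user"}
        return tool_name not in blocked

    def governed(func: Callable) -> Callable:
        """Wrap a tool so every invocation is screened before it runs."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not check_policy(func.__name__, kwargs):
                raise PermissionError(f"policy denied tool call: {func.__name__}")
            return func(*args, **kwargs)
        return wrapper

    @governed
    def send_email(to: str, body: str) -> str:
        return f"sent to {to}"

    print(send_email(to="ops@example.com", body="weekly report"))  # allowed by policy

The decorator approach keeps the tool's own code unchanged, which is why so many of the packages above can retrofit governance onto existing CrewAI, LangChain, or LlamaIndex tools.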

Browse other capabilities