Data agents
This page lists every AI agent in the MeshKore directory tagged with the Data capability. Agents are sourced from public platforms (GitHub, Hugging Face, npm, PyPI, awesome-list curations, and direct submissions), normalized by the MeshKore worker, and ranked by GitHub stars. Each card links to the agent's profile with details on capabilities, framework, language, freshness, and source attribution.
3,544 agents in this capability · ranked by popularity
Top 200 Data agents
🔥 The Web Data API for AI - Power AI agents with clean web data
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that…
Financial data platform for analysts, quants and AI agents.
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
Free universal database tool and SQL client
LlamaIndex is the leading document agent and OCR platform
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
Qlib is an AI-oriented Quant investment platform that aims to use AI tech to empower Quant Research, from…
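微舆: a multi-agent public opinion analysis assistant that anyone can use. It breaks through information cocoons, restores the full picture of public opinion, predicts future trends, and supports decision-making. Implemented from scratch, with no dependency on any framework.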
Query Engine for AI Analytics: Build self-reasoning agents across all your live data
AI agents, automations and apps that run your operations. Model agnostic.
FastGPT is a knowledge-based platform built on LLMs, offering a comprehensive suite of out-of-the-box…
Data infrastructure for AI
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. Each…
📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG
Python scraper based on AI
OpenViking is an open-source context database designed specifically for AI Agents (such as openclaw)…
Dolt – Git for Data
A set of ready to use Agent Skills for research, science, engineering, analysis, finance and writing.
Open-source platform for creating safe, isolated production sandboxes for API, integration, and E2E testing.
Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.
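A plugin that improves ChatGPT's data security and efficiency. It also shares many innovative features for free, such as auto-refresh, keep-alive, data security, audit removal, conversation cloning, unabridged replies, page cleanup, large-screen display, tracking interception, continual updates, and detailed inspection. It makes our AI experience extremely secure…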
Memory layer for AI Agents. Replace complex RAG pipelines with a serverless, single-file memory layer. Give…
Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming…
A powerful tool for creating datasets for LLM fine-tuning, RAG, and Eval
An open source, privacy focused alternative to NotebookLM for teams with no data limits. Join our Discord…
A tutorial on LLM application development for beginner developers. Read online: https://datawhalechina.github.io/llm-universe/
Research and development (R&D) is crucial for the enhancement of industrial productivity, especially in the…
BISHENG is an open LLM devops platform for next generation Enterprise AI applications. Powerful and…
A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leaving your…
[MLsys2026]: RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast, accurate, and 100%…
Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform
A Python library for anomaly detection across tabular, time series, graph, text, and image data. 60+…
A lightweight, lightning-fast, in-process vector database
🐚 Python-powered shell. Full-featured, cross-platform and AI-friendly.
AI Observability & Evaluation
Data Agent Ready Warehouse: One for Analytics, Search, AI, Python Sandbox. Rebuilt from scratch. Unified…
Deeplake is AI Data Runtime for Agents. It provides serverless postgres with a multimodal datalake, enabling…
Event streaming platform for agentic AI. Continuously ingest, transform, and serve event streams in real…
⏰ Agentically track worldwide conference deadlines (Website, Python CLI, WeChat Applet)
Your personal intelligence agent. Watches the world from multiple data sources and pings you when something…
Private & local AI personal knowledge management app for high entropy people.
An EVM compatible Substrate chain, powered by StorageHub and secured by EigenLayer
Open Source Deep Research Alternative to Reason and Search on Private Data. Written in Python.
A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured…
Agent skill that generates rich HTML pages or slide decks for diagrams, diff reviews, plan audits, data…
AI + Data, online. https://vespa.ai
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral…
Data transformation framework for AI. Ultra performant, with incremental processing. 🌟 Star if you like it!
Postgres with GPUs for ML/AI apps.
Build ChatGPT over your data, all with natural language
Plano is an AI-native proxy and data plane for agentic apps — with built-in orchestration, safety…
The live data layer for apps and AI agents. Create up-to-the-second views into your business, just using SQL
Turn any webpage into structured data using LLMs
Open-source context retrieval layer for AI agents
An open source deep research clone. AI Agent that reasons over large amounts of web data extracted with Firecrawl
notes for software engineers getting up to speed on new AI developments. Serves as datastore for…
🔍 LLM Application Development in Practice, Part 1: A Full-Stack Guide to RAG. Read online: https://datawhalechina.github.io/all-in-rag/
🔥 Official Firecrawl MCP Server - Adds powerful web scraping and search to Cursor, Claude and any other LLM…
Code search MCP for Claude Code. Make entire codebase the context for any coding agent.
Open-source framework for building AI-powered apps in JavaScript, Go, and Python, built and used in…
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as…
Rediscover your social memories with local, AI-powered analysis. A local chat-history analysis tool that uses an AI Agent to look back on your social memories.
Policy and data administration, distribution, and real-time updates on top of Policy Agents (OPA, Cedar, ...)
ZenML 🙏: One AI Platform from Pipelines to Agents. https://zenml.io.
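A continuously maintained website of company interview questions that helps you land a satisfying offer! ⭐️ Latest 2026 Java interview questions, front-end interview questions, AI LLM interview questions, AI…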
Superduper: End-to-end framework for building custom AI applications and agents.
MineContext is your proactive context-aware AI partner (Context-Engineering + ChatGPT Pulse)
Structured data extraction and instruction calling with ML, LLM and Vision LLM
An AI-powered data science team of agents to help you perform common data science tasks 10X faster.
Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets
A lightweight next-gen data explorer - Postgres, MySQL, SQLite, MongoDB, Redis, MariaDB, Elastic Search, and…
CSGHub is a brand-new open-source platform for managing LLMs, developed by the OpenCSG team. It offers both…
Build databases, automations, apps & agents with AI — no code. Open source platform available on cloud and…
Neo4j graph construction from unstructured data using LLMs
The AI-native database built for LLM applications, providing incredibly fast hybrid search of dense vector…
RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production…
Olares: An Open-Source Personal Cloud to Reclaim Your Data
SQL Translator is a tool for converting natural language queries into SQL code using artificial intelligence…
Knowledge Agents and Management in the Cloud
Learn Agentic AI using Dapr Agentic Cloud Ascent (DACA) Design Pattern and Agent-Native Cloud Technologies…
HelixDB is an open-source graph-vector database built from scratch in Rust.
Local persistent memory store for LLM applications including claude desktop, github copilot, codex…
High-performance open-source in-memory graph database for GraphRAG, AI memory, agentic AI, and real-time…
TimeGPT-1: production ready pre-trained Time Series Foundation Model for forecasting and anomaly detection…
Easiest and laziest way to build multi-agent LLM applications.
A system for agentic LLM-powered data processing and ETL
Interact with your SQL database, Natural Language to SQL using LLMs
Blazing-fast Data-Wrangling toolkit
The most accurate document search and store for building AI apps
Main repository for Datadog Agent
Mirix is a multi-agent personal assistant designed to track on-screen activities and answer user questions…
Superfast AI decision making and intelligent processing of multi-modal data.
OP Vault ChatGPT: Give ChatGPT long-term memory using the OP Stack (OpenAI + Pinecone Vector Database)…
A quick guide (especially) for trending instruction finetuning datasets
RNA vaccines have become a key tool in moving forward through the challenges raised both in the current…
Personal AI Notebooks. Organize files & webpages and generate notes from them. Open source, local & open…
Agent Skills as a Memory Layer
Automatic Generation of Visualizations and Infographics using Large Language Models
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and…
Write tests against structured configuration data using the Open Policy Agent Rego query language
An event-driven framework designed to build and orchestrate multi-agent AI systems. It enables seamless…
Data framework for your LLM applications. Focus on server side solution
AI Search & RAG Without Moving Your Data. Get instant answers from your company's knowledge across 100+ apps…
A portable accelerated SQL query, search, and LLM-inference engine, written in Rust, for data-grounded AI…
Crawl a website starting from a URL, find relevant pages, and extract data – all guided by your natural…
Deepnote is a drop-in replacement for Jupyter with an AI-first design, sleek UI, new blocks, and native data…
Large-scale, Informative, and Diverse Multi-round Chat Data (and Models)
AnyCrawl 🚀: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts structured SERP…
pingcap/autoflow is a Graph RAG based and conversational knowledge base tool built with TiDB Serverless…
Structured data gathering from any website using AI-powered scraper, crawler, and browser automation…
Database system for AI-powered apps
Learn to build your Second Brain AI assistant with LLMs, agents, RAG, fine-tuning, LLMOps and AI systems…
All-in-one platform for search, recommendations, RAG, and analytics offered via API
Zero-dependency, token-efficient database MCP server for Postgres, MySQL, SQL Server, MariaDB, SQLite.
The fastest business intelligence tool for humans and agents.
Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows…
The Self-Coding System for Your App — Alan AI SDK for Web
Dealing with all unstructured data, such as reverse image search, audio search, molecular search, video…
Hands-on Ollama: deploy large models on a CPU. Read online: https://datawhalechina.github.io/handy-ollama/
Label, clean and enrich text datasets with LLMs.
Distributed vector search for AI-native applications
A powerful Model Context Protocol (MCP) server that provides an all-in-one solution for public web access.
Python & JS/TS SDK for running AI-generated code/code interpreting in your AI app
Memory library for building stateful agents
Empowering RAG with a memory-based data interface for all-purpose applications!
The universal tool suite for vector database management. Manage Pinecone, Chroma, Qdrant, Weaviate and more…
Terminal security for developers and AI agents. Intercepts homograph URLs, pipe-to-shell, ANSI injection…
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR…
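A new hands-on AI development project for 2025 from 编程导航: build an AI love-advice master app and YuManus, a ReAct-style autonomous planning agent, with Spring Boot 3 + Java 21 + Spring AI, covering AI…
Corpora for training Chinese and English dialogue systems (Datasets for Training Chatbot System)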
Agent harness to publish your history from Claude Code et al. as Huggingface datasets.
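Aix-DB is built on the LangChain/LangGraph framework and combines it with an MCP Skills multi-agent collaboration architecture to convert natural language into data insights end to end.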
Build applications that make decisions (chatbots, agents, simulations, etc...). Monitor, trace, persist, and…
A self-learning data agent built with systems engineering principles. It grounds answers in 6 layers of…
All-in-one productivity app and AI assistant with Tasks, Notes, Calendar, Diary and Bookmarks.
An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit…
Query your Apple Health data with natural language 💬 🩺
Apache ServiceComb Pack is an eventual data consistency solution for micro-service applications…
Ui.Vision Open-Source RPA Software with Computer Vision, OCR, Anthropic Computer Use/LLM. Selenium IDE …
The PHP Agentic Framework to build production-ready AI driven applications. Connect components (LLMs, vector…
FinRL®-Meta: Dynamic datasets and market environments for FinRL.
AI-native HTAP database with Git-for-Data and built-in vector search, serving as the data and memory backbone…
Run Claude Code, Gemini, Codex — or any coding agent — in a clean, isolated sandbox with sensitive data…
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
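airda (Air Data Agent) is a multi-agent system for data analysis. It understands data development and data analysis requirements, understands the data, and generates SQL and Python code for tasks such as data querying, data visualization, and machine learning.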
High-performance AI pipeline engine with a C++ core and 50+ Python-extensible nodes. Build, debug, and scale…
Ruby gems for general-purpose AI agent systems: automation, research, data processing, customer support…
DATAGEN: AI-driven multi-agent research assistant automating hypothesis generation, data analysis, and report…
Spring AI Alibaba DataAgent
Open-source persistent memory for AI agent pipelines (LangGraph, CrewAI, AutoGen) and Claude. REST API +…
Meet Ava, the WhatsApp Agent
The official implementation of RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
A ChatGPT web client that supports multiple users, multiple languages, and multiple database connections for…
Data Infrastructure providing a declarative, incremental approach for multimodal AI workloads.
Trench — Open-Source Analytics Infrastructure. A single production-ready Docker image built on ClickHouse…
🎨 NeMo Data Designer: Generate high-quality synthetic data from scratch or from seed data.
WikiChat is an improved RAG. It stops the hallucination of large language models by retrieving data from a…
[GenAI Application Development Framework] 🚀 Build GenAI applications quickly and easily 💬 Easy to interact with…
Natural language processing (NLP): the 小姜 bot (a retrieval-based chit-chat chatbot), BERT sentence embeddings and similarity (Sentence Similarity), XLNET sentence embeddings and similarity (text xlnet embedding), text classification (Text…
A toolkit to create optimal Production-ready Retrieval Augmented Generation (RAG) setup for your data
Turn Chinese natural language into structured data (Chinese natural language understanding)
A curated list of 100+ resources for building and deploying generative AI specifically focusing on helping…
Get clean data from tricky documents, powered by vision-language models ⚡
Korvus is a search SDK that unifies the entire RAG pipeline in a single database query. Built on top of…
LLPhant - A comprehensive PHP Generative AI Framework using OpenAI GPT 4. Inspired by Langchain
The data primitive for the agent loop.
A hyper-fast local vector database for use with LLM Agents. Now accepting SAFEs at $135M cap.
Open Brain — The infrastructure layer for your thinking. One database, one AI gateway, one chat channel — any…
Opinionated skills for AI coding agents to create stunning diagrams and visualizations directly in Markdown…
An AI knowledge base/agent built with .Net 9, AntBlazor, Semantic Kernel, and Kernel Memory, supporting local…
Human ChatGPT Comparison Corpus (HC3), Detectors, and more! 🔥
Datadog Agent Version 5
A lightweight, cloud-native data transfer agent and aggregator
AgentQL is a suite of tools for connecting your AI to the web. Featuring a query language and Playwright…
Plex HTTP Anidb Metadata Agent (HAMA)
Humans and AI agents, building knowledge bases together. Self-hosted document annotation, version control…
LangChain & Prompt Engineering tutorials on Large Language Models (LLMs) such as ChatGPT with custom data…
AIDE: AI-Driven Exploration in the Space of Code. The machine Learning engineering agent that automates AI…
🦜⛏️ Did you say you like data?
Databricks Toolkit for Coding Agents provided by Field Engineering
Highly Performant, Modular, Memory Safe and Production-ready Inference, Ingestion and Indexing built in Rust 🦀
A framework for comprehensive diagnosis and optimization of agents using simulated, realistic synthetic…
The Future of Data Engineering — A CLI SQL client for the modern data stack, enabling AI-native context…
A Solution Accelerator for the RAG pattern running in Azure, using Azure AI Search for retrieval and Azure…
A database of SDKs, frameworks, libraries, and tools for creating, monitoring, debugging and deploying…
🔥 AI-powered data enrichment tool that transforms emails into rich datasets with company profiles, funding…
SAG - SQL-Driven RAG Engine · Automatically Build Knowledge Graph During Querying
Open-source AI-driven quantitative trading platform for crypto, stocks, and forex with backtesting, live…
Neo4j GraphRAG for Python
Calling Python functions from the Ruby language
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
A lightweight, cross-platform database client for developers. Supports MySQL, PostgreSQL and SQLite. Hackable…
Notebooks & Example Apps for Search & AI Applications with Elasticsearch
The Apify MCP server enables your AI agents to extract data from social media, search engines, maps…
🚁 Insurance industry corpus, chatbot
🐙 Give your AI a life — open-source agent infrastructure for team collaboration.
MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversation
A @ClickHouse fork that supports high-performance vector search and full-text search.