Image & Vision agents
This page lists every AI agent in the MeshKore directory tagged with the Image & Vision category. Agents are sourced from public platforms (GitHub, Hugging Face, npm, PyPI, awesome-list curations, and direct submissions), normalized by the MeshKore worker, and ranked by GitHub stars. Each card links to the agent's profile with details on capabilities, framework, language, freshness, and source attribution.
1,695 agents in this category · ranked by popularity
Top 200 Image & Vision agents
The ultimate space for work and life — to find, build, and collaborate with agent teammates that grow with…
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
Open-source components, blocks, and AI agents designed to speed up your workflow. Import them seamlessly into…
This repo is meant to serve as a guide for Machine Learning/AI technical interviews.
Awesome curated collection of images and prompts generated by GPT-4o and gpt-image-1. Explore AI generated…
Deep Learning and Reinforcement Learning Library for Scientists and Engineers
Replace port numbers with stable, named local URLs. For humans and agents.
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate…
🌻 One-click access to your own ChatGPT plus many AI web services
AI Product Design Agent - Open Source
The agent-native LLM router for OpenClaw. 41+ models, <1ms routing, USDC payments on Base & Solana via x402.
🐬DeepChat - A smart assistant that connects powerful AI to your personal world
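[🔞🔞🔞 Contains images unsuitable for minors] AI explorations and summaries built around my strengths in programming, drawing, and writing: StableDiffusion is a powerful image generation model that can generate new images by evolving an existing image. ChatGPT…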
An AI-powered custom node for ComfyUI designed to enhance workflow automation and provide intelligent…
Kode Agent — Design for post-human workflows. One unit agent for every human & computer task.
The LLM's practical guide: From the fundamentals to deploying advanced LLM and RAG apps to AWS using LLMOps…
AI generates natively editable PPTX from any document — real PowerPoint shapes, not images — no design skills…
"A White-Box Guide to Building Large Models": a fully handcrafted Tiny-Universe
The UI design language and React library for Conversational UI
Riona Ai Agent 🌸 is built using Node.js and TypeScript 🛠️, designed for seamless job execution 📸. It's…
End-to-end App Store screenshot creation using AI
ChatGPT + DALL-E + WhatsApp = AI Assistant 🚀 🤖
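The best Chinese edition of Google's new book on agent design patterns (agentic design patterns), continuously improved. Includes online reading plus PDF and EPUB e-book downloads.
[Three Years of Interviews, Five Years of Mock Exams] An interview handbook for AIGC algorithm engineers. Covers AIGC, LLMs, AI…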
🦖 Learn about LLMs, LLMOps, and vector DBs for free by designing, training, and deploying a real-time…
Generate, animate and schedule your AI characters 🤖
The visual feedback tool for agents.
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as…
🌊 AChat - An open-source/self-hosted/local-first AI platform, designed for enterprises and teams, perfectly…
InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it…
LSP-AI is an open-source language server that serves as a backend for AI-powered functionality, designed to…
Implementation of 17+ agentic architectures designed for practical use across different stages of AI system…
🌊 A Human-in-the-Loop workflow for creating HD images from text
A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization…
A playground to generate images from any text prompt using Stable Diffusion (past: using DALL-E Mini)
A secure persistent personal agent server in Rust. One binary, sandboxed execution, multi-provider LLMs…
Generate images with NovelAI | A NovelAI-based drawing bot
HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Supercharged experience for multiple models such as ChatGPT, DALL-E and Stable Diffusion.
PyBullet Gymnasium environments for single and multi-agent reinforcement learning of quadcopter control
🍭 Lobe UI - an open-source UI component library for building AIGC web apps
Free and open-source, easy-to-use eCommerce platform based on Laravel. It supports multiple…
Desktop AI Assistant powered by GPT-5, GPT-4, o1, o3, Gemini, Claude, Ollama, DeepSeek, Perplexity, Grok…
🤖📐 An agent designed for mathematical modeling that automatically completes the modeling and generates a complete, ready-to-submit paper.
Open-source video generation workbench driven by AI agents: novel → character/scene/prop design → script → storyboard → video, with characters and scenes kept consistent across shots.
Generate images from text, in Russian.
Agentic Design Patterns
Supports Telegram, Discord, Slack, Lark (Feishu), DingTalk, WeCom, QQ, and WeChat; compatible with various LLMs including OpenAI…
[ICLR 2025] Automated Design of Agentic Systems
🚀 LangGraph for Java. A library for developing AI agentic architectures in the Java ecosystem. Designed to work…
DingTalk Workspace is an officially open-sourced cross-platform CLI tool from DingTalk. It unifies DingTalk’s…
bsuite is a collection of carefully-designed experiments that investigate core capabilities of a…
A simple yet powerful agent framework for personal assistants, designed to enable intelligent interaction…
ROSA 🤖 is an AI Agent designed to interact with ROS1- and ROS2-based robotics systems using natural language…
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and…
This course is designed to guide beginners through the exciting world of Edge AI, covering fundamental…
[EMNLP 2025 Oral] MemoryOS is designed to provide a memory operating system for personalized AI agents.
The TypeScript library for building AI applications.
It's not AI that takes away your job, but the people who master the use of AI tools. The most deadly attack…
Offline inference engine for art, real-time voice conversations, LLM powered chatbots and automated workflows
Application implementation with business use cases for safely utilizing generative AI in business operations
Easily select and manage your preferred AI digital assistants on Android.
Agent-MCP is a framework for creating multi-agent systems that enables coordinated, efficient AI…
A collection of 100+ pre-trained RL agents using Stable Baselines, training and hyperparameter optimization…
Build and Deploy a Full Stack MERN AI Image Generation App MidJourney & DALL E Clone
Build your own Cowork, AI Scientist and other SoTA Agents just by editing config files. Support anthropic…
A Python-based lightweight robot simulator designed for navigation, control, and learning
An AI-powered interactive avatar engine using Live2D, LLM, ASR, TTS, and RVC. Ideal for VTubing, streaming…
Super AI Brain: built on a SpringCloud microservice architecture, already integrated with GPT-3.5, GPT-4.0, Baidu ERNIE Bot, stable diffusion…
WebRover is an autonomous AI agent designed to interpret user input and execute actions by interacting with…
Open-source framework for creating and managing simulations populated with AI-powered agents. It provides an…
ChatGPT CLI is a powerful, multi-provider command-line interface for working with modern LLMs. It supports…
🤖 Components Library for Quickly Building LLM Chat Interfaces.
FinMem: A Performance-Enhanced LLM Trading Agent with Layered Memory and Character Design
A holistic framework to enable the design, development, and evaluation of autonomous AIOps agents.
AI-powered tools to enhance Anki flashcards with explanations, mnemonics, illustrations, and adaptive…
AI-First Album: Chat with your gallery using plain language! LLM Vision + RAG + Album/Gallery.
🎨 Image collector, support for custom acquisition source, compatible with Windows and MacOS!|…
[NeurIPS 2025] 4KAgent: Agentic Any Image to 4K Super-Resolution. An intelligent computer vision agent that…
Train Models Contrastively in Pytorch
🤖 Beautifully designed chatbot components based on shadcn/ui
A Cursor skill that gives AI agents real UI component knowledge — best practices, layout patterns, and…
Azure AI Foundry (demos, documentation, accelerators).
A JS reverse engineering MCP server designed for AI agents, with built-in anti-detection, rebuilt on chrome-devtools-mcp
Multi-agent framework for design, simulation, and auditing.
JTokkit is a Java tokenizer library designed for use with OpenAI models.
Awesome AI Memory | LLM Memory | A curated knowledge base on AI memory for LLMs and agents, covering…
AI Agnostic (Multi-user and Multi-bot) Chat with Fictional Characters. Designed with scale in mind.
ComfyUI-IF_AI_tools is a set of custom nodes for ComfyUI that allows you to generate prompts using a local…
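End-to-end RAG system design, evaluation, and optimization. GeekTime RAG bootcamp: a full breakdown of the 10 major RAG components and 4 hands-on projects for mastering RAG…
An AI painting model optimized from Stable Diffusion. Accepts Chinese and English text input and generates high-quality images in a range of modern art styles.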
🦀An agentic AI assistant that lives in your chats, inspired by nanoclaw and incorporating some of its design…
InnoShop is an AI-powered open source e-commerce system built on Laravel 12, designed for global commerce. It…
A wechat robot based on ChatGPT with no risk, very stable! 🚀
Chain together LLMs for reasoning & orchestrate multiple large models for accomplishing complex tasks
🌌 Give a soul to your digital waifu. Soul of Waifu is an immersive desktop roleplay & AI companion engine…
Official implementation for "Blended Diffusion for Text-driven Editing of Natural Images" [CVPR 2022]
ChatDev IDE is a tool for building your AI agent. Whether it's NPCs in games or powerful agent tools, you…
MCP-Universe is a comprehensive framework designed for RL training, benchmarking, and developing AI agents…
This repository hosts a suite of specialized agents designed to power your brainstorming sessions. Each agent…
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models…
Create a private chatgpt website via vercel
Image to text, fast.
Vibe Design meets Figma. Let AI agents design directly in Figma.
VMAS is a vectorized differentiable simulator designed for efficient Multi-Agent Reinforcement Learning…
Control Figma from the command line. Full read/write access for AI agents — create shapes, text, components…
🔍 Search local images with natural language on Android, powered by OpenAI's CLIP model.
ChatGPT-Pro is an advanced application that combines the power of ChatGPT and DALL-E.
Contrastive Language-Image Forensic Search allows free text searching through videos using OpenAI's machine…
Local AI anywhere, for everyone — LLM inference, chat UI, voice, agents, workflows, RAG, and image…
[Deprecated & integrated into docker-agent] Docker image for a Jenkins agent which can connect to Jenkins using…
Building LLM-Enabled Multi Agent Applications from Scratch
Universal AI agent using Amazon Bedrock, customizable to create/edit files, execute commands, search…
🤖️ A brand-new self-hosted AIGC platform for individuals, teams, and enterprises, built with Golang + Vue3 + NaiveUI
A CLI tool/python module for generating images from text using guided diffusion and CLIP from OpenAI.
Open Source Project Management with Conversational AI Task Execution. Built for teams who want conversational…
Second Brain is a desktop application that acts as a personal knowledge base, using retrieval-augmented…
Design system skills for agentic tools
A collection of DESIGN.md files for getting AI agents to build Japanese-language UIs correctly, extending Google Stitch…
Clipboard Conqueror is a novel copy and paste copilot alternative designed to bring your very own LLM AI…
❤ Works out of the box ❤ An unofficial implementation of ChatGPT for Tencent QQ/WeChat (not Official Accounts). Turn your QQ or WeChat into ChatGPT.
AI Assistant that reduces the size of your application's Docker Image
Official repo of VLABench, a large scale benchmark designed for fairly evaluating VLA, Embodied Agent, and…
Bring back Clippy on Windows 10/11!
Universal CPU profiler designed for humans and AI agents
A versatile tool designed to help prototype intelligent assistants, agents and multi-agentic systems
🧠 The most comprehensive collection of excellent Qwen prompts in the world; contributions of your own prompts are welcome.
ACP is the Agent Control Plane - a distributed agent scheduler optimized for simplicity, clarity, and…
A curated archive of breakthroughs in Agents, Architecture, Training, RAG, and On-Device AI.
Alfred workflow using ChatGPT, DALL·E 2 and other models for chatting, image generation and more.
[CVPR' 2026] JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator…
AgentScope Spark Design - UI Component Library for Alibaba Cloud Apsara Lab
Generate a picture book from a single prompt using OpenAI function calling, replicate, and Deep Lake
Open-source agent skills for generating editorial-style information cards from natural-language input.
On-device AI for Android — LLM chat (GGUF/llama.cpp), vision models (VLM), image generation (Stable…
Open-WebUI-Functions is a collection of custom pipelines, filters, and integrations designed to enhance Open…
Jenkins agent (base image) and inbound agent Docker images
A lightweight, modular Java application framework for web and CLI development, designed for AI…
This repository is a hub for AI Agent projects, including GitHub Sentinel, LanguageMentor, and ChatPPT…
The world's most powerful open-source bio AI assistant - Access academic literature, clinical trials, drug…
[IJCV 2024] InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions
A C++/Python implementation of the StreetLearn environment based on images from Street View, as well as a…
Visual Builder for AI Workflows and Agents
Mirror of Stable-Baselines: a fork of OpenAI Baselines, implementations of reinforcement learning algorithms
The RAG Experiment Accelerator is a versatile tool designed to expedite and facilitate the process of…
An intelligent Wiki overlay tool designed specifically for gamers, featuring GEMINI-powered RAG…
Visual and textual documentation of 21 essential agentic design patterns for building intelligent AI systems
An enterprise-grade AI retriever designed to streamline AI integration into your applications, ensuring…
Design notes, architecture, and trade-offs for MyInvestPilot — an AI-native investment OS built with…
Practical system design, tools, and hands-on resources for building Gen-AI agents & agentic AI systems.
Local AI Assistant on your phone
[ICML 2025 Oral] Official repo of EmbodiedBench, a comprehensive benchmark designed to evaluate MLLMs as…
Self-healing infrastructure for AI agent payments. 90.3% auto-recovery.
META‑AGENTIC α‑AGI 👁️✨ — Mission 🎯 End‑to‑end: Identify 🔍 → Out‑Learn 📚 → Out‑Think 🧠 → Out‑Design 🎨 →…
CLIP⚡NCNN⚡Natural-language image search⚡Search images by text⚡x86⚡Android
[ICLR-2025] POGEMA stands for Partially-Observable Grid Environment for Multiple Agents. This is a grid-based…
AI system design guide for engineers building production AI systems and evals.
Human-taught Computer-use Agent Designed for Real Windows and MacOS Desktops.
A highly efficient, secure, and stable application development platform with excellent performance, easy…
Gen-Searcher: Reinforcing Agentic Search for Image Generation
[NeurIPS 2024] ReEvo: Large Language Models as Hyper-Heuristics with Reflective Evolution
If you are an AI agent and want to discover services designed for you.
Master AI BOT 🤖: Unleash the power of GPT-4 Turbo with our fast and limitless Telegram bot. Say goodbye to…
SharpToken is a C# library for tokenizing natural language text. It's based on the tiktoken Python library…
RivonClaw is an easy-mode runtime and UI layer built on top of OpenClaw, designed to turn long-lived AI…
Your fully proficient, AI-powered and local chatbot assistant🤖
AIFlow is an AI agentic framework designed to scale digital AI agents on BNB Chain.
(NeurIPS 2024) AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning
A comprehensive and critical synthesis of the emerging role of GenAI across the full autonomous driving stack
A template for building WhatsApp agents using LangGraph and Twilio. This project enables you to deploy AI…
Generate images with Stable-Diffusion-webui, based on Python | A Python drawing bot built on SD-webui (supports Chinese, Novelai, and Naifu)
🤖 A Matrix bot for using different capabilities (text-generation, text-to-speech, speech-to-text…
CLI client for podwise.ai — turn any podcast episode into AI-powered insights, designed for use in AI agents…
Unofficial Linux packages for Claude Desktop AI assistant with automated updates.
Belullama is a comprehensive AI application that bundles Ollama, Open WebUI, and Automatic1111 (Stable…
Implementation for the paper "ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing…
Native Swift SDK for building autonomous AI agents with Apple's FoundationModels design philosophy
ALMA (Automated meta-Learning of Memory designs for Agentic systems) is a framework that meta-learns memory…
AutoClaw is a hyper-lightweight AI agent designed to live inside Docker containers. Unlike heavy…
Guide for designing adaptive, scalable, and secure enterprise multi-agent systems
OpenCluely is a free, open-source Cluely alternative, built for technical interviews like DSA, OAs, and CP…
Easyreadme helps you simplify README creation and generate visually stunning ones with the help of AI and…
Fully automated token deployment on ETH, using ChatGPT and DALL-E.
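A high-performance desktop application framework for digital humans that works out of the box, integrates AI chat and live wallpapers, and runs digital humans smoothly even on lower-end devices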
AI-Powered Game Development Team in Your Terminal
A simulated operating system design for AI Agents to interact with the world
ToolMate AI, developed by Eliran Wong, is a cutting-edge AI companion that seamlessly integrates agents…
NuGet package designed to make LLMs, RAG, and Agents first-class citizens in .NET
Beautifully designed components for building AI Agents 🌎
C++ Agent toolkit - Pre-built binaries, visit: https://github.com/mtconnect/cppagent/releases Docker images…
EDUMCP is a protocol that integrates the Model Context Protocol (MCP) with applications in the education…
c4 GenAI Suite
Algolia + Angular = 🔥🔥🔥
Browser script to share and export Anthropic Claude chat logs to Markdown, JSON, or as Image (PNG)
AI-driven web automation agent that uses Playwright for browser interactions and LLM integration for…
AgentAI is a Rust library designed to simplify the creation of AI agents
Connect to hosted payram helper at: https://mcp.payram.com
Polykalshi AI Agent (Rust Edition): Unleash the power of AI with Polykalshi, a blazing-fast, highly efficient…
Model Context Protocol (MCP) server designed to allow AI agents within Cursor to interact with Jupyter…