Image agents
This page lists every AI agent in the MeshKore directory tagged with the Image capability. Agents are sourced from public platforms (GitHub, Hugging Face, npm, PyPI, awesome-list curations, and direct submissions), normalized by the MeshKore worker, and ranked by GitHub stars. Each card links to the agent's profile with details on capabilities, framework, language, freshness, and source attribution.
846 agents in this capability · ranked by popularity
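The ranking described above (filter by capability tag, then order by GitHub stars and keep the top entries) can be sketched roughly as follows. This is only an illustrative sketch, not MeshKore's actual worker code: the `Agent` record, its field names, and the `top_agents` helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical normalized directory record (field names are assumptions)."""
    name: str
    stars: int
    capabilities: set = field(default_factory=set)


def top_agents(agents, capability, limit=200):
    """Return up to `limit` agents tagged with `capability`, most-starred first."""
    # Keep only agents carrying the requested capability tag.
    tagged = [a for a in agents if capability in a.capabilities]
    # sorted() is stable, so agents with equal stars keep their input order.
    return sorted(tagged, key=lambda a: a.stars, reverse=True)[:limit]


if __name__ == "__main__":
    sample = [
        Agent("ocr-toolkit", 1200, {"image", "ocr"}),
        Agent("speech-bot", 5000, {"audio"}),
        Agent("clip-search", 3400, {"image"}),
    ]
    for agent in top_agents(sample, "image"):
        print(agent.name, agent.stars)
```

The `limit` default of 200 mirrors the "Top 200" cut shown on this page; how ties and missing star counts are really handled is not stated in the source.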
Top 200 Image agents
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that…
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule…
AI coding assistant skill (Claude Code, Codex, OpenCode, Cursor, Gemini CLI, GitHub Copilot CLI, OpenClaw…
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming…
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
Force Remove Copilot, Recall and More in Windows 11
OpenAI ChatGPT, GPT-5, GPT-Image-1, Whisper API clients for Go
A Python library for anomaly detection across tabular, time series, graph, text, and image data. 60+…
Awesome curated collection of images and prompts generated by GPT-4o and gpt-image-1. Explore AI generated…
A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured…
Deep Learning and Reinforcement Learning Library for Scientists and Engineers
An app that integrates mainstream large language models and image generation models, built with Flutter, with…
AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. Includes AI personas…
This repository contains hand-curated resources for Prompt Engineering with a focus on Generative…
AI generates natively editable PPTX from any document — real PowerPoint shapes, not images — no design skills…
One beautiful Ruby API for OpenAI, Anthropic, Gemini, Bedrock, Azure, OpenRouter, DeepSeek, Ollama, VertexAI…
Generate, animate and schedule your AI characters 🤖
InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. Now it…
Multi-modal Generative Media Skills for AI Agents (Claude Code, Cursor, Gemini CLI). High-quality image…
🌊 A Human-in-the-Loop workflow for creating HD images from text
A cross-platform video structuring (video analysis) framework. If you find it helpful, please give it a star…
A playground to generate images from any text prompt using Stable Diffusion (past: using DALL-E Mini)
✨ Reverse-engineered Python API for Google Gemini web app
Generate images with NovelAI | A NovelAI-based drawing bot
HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance
Dealing with all unstructured data, such as reverse image search, audio search, molecular search, video…
A Node.js CLI that uses Ollama and LM Studio models (Llava, Gemma, Llama etc.) to intelligently rename files…
Desktop AI Assistant powered by GPT-5, GPT-4, o1, o3, Gemini, Claude, Ollama, DeepSeek, Perplexity, Grok…
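Open-source video generation workbench driven by AI agents: novel → character/scene/prop design → script → storyboard → video, with characters and scenes kept consistent across shots | Open-source AI video workspace powered by AI…
Generate images from text. In Russian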
A ChatGPT web client that supports multiple users, multiple languages, and multiple database connections for…
Trench — Open-Source Analytics Infrastructure. A single production-ready Docker image built on ClickHouse…
Supports Telegram, Discord, Slack, Lark (Feishu), DingTalk, WeCom, QQ, and WeChat; compatible with various LLMs including OpenAI…
World's first open-source, agentic video production system. 12 pipelines, 52 tools, 500+ agent skills. Turn…
ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and…
AI skill for OpenClaw & Claude Code — recommend from 10000+ Nano Banana Pro (Gemini) image prompts. Smart…
This Discord chatbot is incredibly versatile. Powered by the incredibly fast Groq API
Open source implementation and extension of Google Research’s PaperBanana for automated academic figures…
Offline inference engine for art, real-time voice conversations, LLM powered chatbots and automated workflows
Application implementation with business use cases for safely utilizing generative AI in business operations
Simple shell script to use OpenAI's ChatGPT and DALL-E from the terminal. No Python or JS required. Formerly…
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜…
OpenAI-compatible API for Gemini Business with multi-account load balancing and multimodal capabilities…
Build and Deploy a Full Stack MERN AI Image Generation App MidJourney & DALL E Clone
Here we will keep track of the latest AI Game Development Tools, including LLM, World Model, Agent, Code…
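🚀 Reverse-engineered API for Jimeng 3.0 (specialty: top-tier image generation). Zero-configuration deployment and multi-token support. For testing only; for commercial use, please go to the official open platform.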
A @ClickHouse fork that supports high-performance vector search and full-text search.
80+ free AI services for chat, image, video, voice & APIs (may sometimes include access to lead gen ai models…
Transformer models from BERT to GPT-4, environments from Hugging Face to OpenAI. Fine-tuning, training, and…
NyaProxy acts like a smart, central manager for accessing various online services (APIs) – think AI tools…
ChatGPT CLI is a powerful, multi-provider command-line interface for working with modern LLMs. It supports…
⚕️GenAI powered multi-agentic medical diagnostics and healthcare research assistance chatbot. 🏥 Designed for…
OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama…
AI-First Album: Chat with your gallery using plain language! LLM Vision + RAG + Album/Gallery.
🎨 Image collector that supports custom acquisition sources, compatible with Windows and macOS! |…
[NeurIPS 2025] 4KAgent: Agentic Any Image to 4K Super-Resolution. An intelligent computer vision agent that…
Train Models Contrastively in Pytorch
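An all-round AI chat assistant deeply integrated with the Gemini ecosystem. Supports multimodal interaction (text/voice/image/video), real-time web search, code execution, long-document analysis, and advanced reasoning. Includes a rich set of preset scenarios and personalization options to help you explore the possibilities of AI.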
Azure AI Foundry (demos, documentation, accelerators).
ComfyUI-IF_AI_tools is a set of custom nodes for ComfyUI that allows you to generate prompts using a local…
Open‑WebUI Tools is a modular toolkit designed to extend and enrich your Open WebUI instance, turning it into…
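AigoTools can help users quickly create and manage website directories, with built-in site auto-crawling…
An AI painting model optimized from Stable Diffusion. Accepts Chinese and English text input and generates high-quality images in a variety of modern art styles. | An optimized text-to-image model based on Stable…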
This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding…
An open-source AI content search engine designed specifically for content creators. Supports extraction of…
Official implementation for "Blended Diffusion for Text-driven Editing of Natural Images" [CVPR 2022]
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models…
The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown conversion, PDF…
Image to text, fast.
One-stop data intelligence agent, providing insights from all mainstream data formats in a single dialogue…
Control Figma from the command line. Full read/write access for AI agents — create shapes, text, components…
A collection of open-source Agent Skills for content creation — images, audio, and video.
🔍 Search local images with natural language on Android, powered by OpenAI's CLIP model. / On Android…
ChatGPT-Pro is an advanced application that combines the power of ChatGPT and DALL.E.
RESTai is an AIaaS (AI as a Service) open-source platform. Supports many public and local LLMs supported by…
Contrastive Language-Image Forensic Search allows free text searching through videos using OpenAI's machine…
Local AI anywhere, for everyone — LLM inference, chat UI, voice, agents, workflows, RAG, and image…
[Deprecated & integrated in docker-agent] Docker image for a Jenkins agent which can connect to Jenkins using…
Huge AI models catalog. A curated list of AI tools, platforms, and resources across various domains.
Universal AI Agent using Amazon Bedrock, customizable to create/edit files, execute commands, search…
A CLI tool/python module for generating images from text using guided diffusion and CLIP from OpenAI.
Smoothly Manage Multiple LLMs (OpenAI, Anthropic, Azure) and Image Models (Dall-E, SDXL), Speed Up Responses…
Second Brain is a desktop application that acts as a personal knowledge base, using retrieval-augmented…
MLX Studio - Home of JANG_Q - Image Gen/Edit + Chat/Code All in one - + OpenClaw (Anthropic API)
❤ Works out of the box ❤ An unofficial implementation of ChatGPT for QQ/WeChat (personal accounts, not official accounts). Turn your QQ or WeChat into ChatGPT.
AI Assistant that reduces the size of your application's Docker Image
A feature-rich portal to chat with GPT-4, Claude, Gemini, Mistral, & OpenAI Assistant APIs via a lightweight…
Free-Dall-E-Proxy, an open-source repository that serves as a proxy for API-based interactions with OpenAI's…
🧠 The world's most comprehensive collection of excellent Qwen prompts; contributions of your own prompts are welcome. 🧠 The most comprehensive collection of excellent Qwen prompts in the world…
potato: the portable annotation tool
Alfred workflow using ChatGPT, DALL·E 2 and other models for chatting, image generation and more.
Open-source spreadsheets platform for deep research and document processing
[CVPR 2026] JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator…
💜 The best free Telegram bot for ChatGPT, Microsoft Copilot (aka Bing AI / Sidney / EdgeGPT), Microsoft…
Consumer AI app for chat, image generation, video generation, and music creation powered by Ace Data Cloud…
On-device AI for Android — LLM chat (GGUF/llama.cpp), vision models (VLM), image generation (Stable…
Jenkins agent (base image) and inbound agent Docker images
🎩 An Alfred 5 Workflow for using OpenAI Chat API to interact with GPT models 🤖💬 It also allows image…
A C++/Python implementation of the StreetLearn environment based on images from Street View, as well as a…
FIBO is a SOTA, first open-source, JSON-native text-to-image model built for controllable, predictable, and…
Visual Builder for AI Workflows and Agents
Getting the latest versions of Disco Diffusion to work locally, instead of colab. Including how I run this on…
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
📛 Autoxsh is an open-source tool that utilizes OpenAI's API to automate the generation and publishing of…
Turn Webpage to LLM friendly input text. Similar to Firecrawl and Jina Reader API. Makes RAG, AI web…
Self-healing infrastructure for AI agent payments. 90.3% auto-recovery.
CLIP⚡NCNN⚡Natural-language image search (Image Search)⚡Search images by text⚡x86⚡Android
Access the latest AI models like ChatGPT, LLaMA, Deepseek, Diffusion, Hugging face, and beyond through a…
Gen-Searcher: Reinforcing Agentic Search for Image Generation
Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"
Hybrid RAG system combining vector search, knowledge graph (LightRAG), and cross-encoder reranking — with…
An open-source Vibe platform similar to Claude Cowork / Manus / Openclaw, with professional rich image…
Turn AI into a persistent, memory-powered collaborator. Universal MCP Server (supports HTTP, STDIO, and…
Multi-model simultaneous chat and text-to-image generation, all done purely in the front end (API…
Your fully proficient, AI-powered and local chatbot assistant🤖
A fully autonomous AI Agent/Python pipeline that utilizes Large Language Models (LLMs) like Gemini to…
🧠 Example Discord Bot written in JavaScript that uses OpenAI's models such as `GPT 4`, `GPT-3.5-Turbo`…
OmniFusion — a multimodal model to communicate using text and images
GPTerminator provides a convenient way to interact with OpenAI's chat completion and image generation APIs…
A template for building WhatsApp agents using LangGraph and Twilio. This project enables you to deploy AI…
Open-source hybrid assistant using small models (2B to 5B) and Gemini, with image and agentic tool…
A ChatGPT starter based on the official OpenAI APIs.
Generate images with Stable-Diffusion-webui, based on Python | A Python drawing bot built on SD-webui (supports Chinese, NovelAI, and Naifu)
🤖 Discord AI assistant with OpenAI, Gemini, Claude & DeepSeek integration, multilingual support, multimodal…
🤖 A Matrix bot for using different capabilities (text-generation, text-to-speech, speech-to-text…
An AI-powered storytelling video generator that takes user input as a story prompt, generates a story using…
Unofficial Linux packages for Claude Desktop AI assistant with automated updates.
A powerful ComfyUI workflow skill for OpenClaw and other AI agents that support skills.
Collection of agent skills for AI coding assistants
📄🔍 Parse, extract, and analyze documents with ease 📄🔍
An AI powered SaaS platform which enables the user to chat, generate images, videos, music, etc. 🚀
Unofficial Claude API supporting direct HTTP chat creation/deletion/retrieval, messages with multiple file…
Sort a folder of images according to their similarity with provided text in your browser (uses a…
The creative suite for character-driven AI experiences.
A chatbot app that uses OpenAI's GPT and DALL-E to reply to incoming messages from WhatsApp and generate…
OpenCluely is a free, open-source Cluely alternative, built for technical interviews like DSA, OAs, and CP…
Seth's AI Tools: A Unity based front end that uses ComfyUI and LLMs to create stories, images, movies…
Agentic Framework for Java, written in 100% Java using Gemini, OpenAI, LocalAI, Anthropic. Build Autonomous…
Context Encoding for Semantic Segmentation MegaDepth: Learning Single-View Depth Prediction from Internet…
Revornix is an open-source, local-first AI information/markdown workspace. It helps you collect fragmented…
Use DALL·E 2 in Python
ToolMate AI, developed by Eliran Wong, is a cutting-edge AI companion that seamlessly integrates agents…
A comprehensive list of document parsers, covering PDF-to-text conversion and layout extraction. Each tested…
C++ Agent toolkit - Pre-built binaries, visit: https://github.com/mtconnect/cppagent/releases Docker images…
EDUMCP is a protocol that integrates the Model Context Protocol (MCP) with applications in the education…
c4 GenAI Suite
Code release for our NeurIPS 2024 Spotlight paper "GenArtist: Multimodal LLM as an Agent for Unified Image…
Browser script to share and export Anthropic Claude chat logs to Markdown, JSON, or as Image (PNG)
Selene is a desktop app that runs AI agents on your machine. Connect them to your WhatsApp, Telegram, Slack…
Unleash the power of Chatty: the intersection of ChatGPT’s intelligence, DALL·E's creativity, and Whisper's…
A JavaScript library that brings vector search and RAG to your browser!
GPT-Shell is an OpenAI based chat-bot that is similar to OpenAI's ChatGPT. Also allows creating Dalle2 images.
VividNode: Multi-purpose Text & Image Generation Desktop Chatbot (supporting various models including GPT).
Real-time on-device text-to-image and image-to-image Semantic Search with video stream camera capture using…
Ready-to-use AI Multimodal ChatGPT-based WhatsApp chatbot assistant for your business. Now supports GPT-4o…
DarkGPT Chat Explorer is an interactive web application that allows users to engage in conversations with…
Machine Learning and having it Deep and Structured (MLDS) in 2018 spring
🧾✨ AI-Powered Receipt and Invoice Scanner for Laravel, with support for images, documents and text
A versatile multi-modal chat application that enables users to develop custom agents, create images, leverage…
A production-ready Laravel package to integrate with the Google Gemini API. Supports text, image, video…
🤖Free Agent Line Bot with Web Search, Google Image Search, Image Generator, Video Generator...
Open-source document chat platform with semantic search, RAG (Retrieval Augmented Generation), and…
Multimodal AI agent with Llama 3.2: A Streamlit app that processes text, images, PDFs, and PPTs, integrating…
AI-powered digital picture frame. Generate captivating and unique art from spoken conversations.
🖼️ A simple ChatGPT AI tutorial on how to generate images/text/code and its limitations 🤖
DALL·E playground for the Mac
Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and…
LitePali is a minimal, efficient implementation of ColPali for image retrieval and indexing, optimized for…
This is a collection of various Generative AI projects and AI Agents exploring the realms of Images, code…
Agent orchestration & security template featuring MCP tool building, agent2agent workflows, mechanistic…
Free, open-source Discord bot code with 200+ commands
Our idea is to combine the power of computer vision models and LLMs. We use YOLO, CLIP and DINOv2 to extract…
AI Chatbot, Image Generator & Language Translator App | OpenAI ChatGPT | AI Assistant | Dart 3 & Flutter 3.13…
DALLE2 in the command line.
⭐️ The most comprehensive ChatGPT repo with Vue 3: Vue ChatGPT AI! ⭐️ Unlock the power of AI-driven…
[ICML 2024] Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
OpenAi-Sora (SoraFlows) is an open-source, cross-platform web application for AI-powered video creation and…
Cutting-edge Full-stack AI Platform delivered as a SaaS (Software as a Service). Built on a robust technology…
AI image generation and editing via Gemini, OpenAI, Fal and Replicate, right in your WordPress media library…
A manga translator built with Python
A multi-agent system designed for generating music videos with scrolling subtitles based on lyrics. This…
Unity C# API connections to StableDiffusion (Automatic1111, Stability.ai SDXL, Replicate.com), Dall-E…
AI-Native Video Editor — CLI-first, MCP-ready. Generate, edit, and ship videos from your terminal.
Speech to Text using a Docker image with ggerganov/whisper.cpp
Convert screenshots, image links, or sketches into code using LLM (supporting OpenAI, Gemini, Qwen-VL…
Multi-Modal-AI-Orchestrator (Reset version), AI Full-modal Full-agent: Text → Image → Music → Lights → Video, …
This repo accelerates development of RAG applications with rich data sources including SQL Warehouses and…
True on-device AI for Kotlin Multiplatform (Android, iOS, Desktop, JVM, WASM). LLM, Speech-to-Text and Image…
Skywork Agent Skills for AI office suites, including AI PPT, AI Document, AI Excel, AI Image, AI…
DALL·E Playground (Unofficial) is used to play with OpenAI Image generation API - DALL·E
A modern AI chatbot with chat, image generation, and text-to-speech features, designed for a smooth and…
A next-generation AI-powered infinite canvas workspace built for creators and developers. Experience the…
GPT-3 client for Windows and Unix with memory management that supports both text and speech in any…
Complete visual content system for Claude Code — 16 workflows, 2 AI models, aesthetic routing, and brand…
An open source chat bot architecture for voice/vision (and multimodal) assistants, local(CPU/GPU bound) and…
This app uses the OpenAISwift library, ChatGPTSwift library and OpenAI library to communicate with the…
A simple matrix bot that supports image generation and chatting using ChatGPT
A docker image for running AI agents in YOLO mode
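AI murder mystery game (jubensha) with agent-driven script performance. Supports AI script generation, TTS voice narration, AI image generation, and more. Integrates MiniMax. AI-powered murder mystery game where all characters are…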