Audio agents
This page lists every AI agent in the MeshKore directory tagged with the Audio capability. Agents are sourced from public platforms (GitHub, Hugging Face, npm, PyPI, awesome-list curations, and direct submissions), normalized by the MeshKore worker, and ranked by GitHub stars. Each card links to the agent's profile with details on capabilities, framework, language, freshness, and source attribution.
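The normalize-then-rank flow described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the record fields (`name`, `source`, `stars`, `capabilities`), the `Agent` type, and both helper functions are hypothetical and are not the MeshKore worker's actual code or schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    # Hypothetical normalized record; field names are assumptions.
    name: str
    source: str
    stars: int
    capabilities: frozenset


def normalize(raw: dict) -> Agent:
    """Coerce a raw record from any source into one common shape."""
    return Agent(
        name=str(raw.get("name", "")).strip(),
        source=str(raw.get("source", "unknown")).strip().lower(),
        stars=int(raw.get("stars") or 0),  # a missing star count is treated as 0
        capabilities=frozenset(
            c.strip().lower() for c in raw.get("capabilities", [])
        ),
    )


def rank_by_capability(raw_records, capability, limit=200):
    """Keep agents tagged with `capability`, most-starred first."""
    wanted = capability.strip().lower()
    agents = [normalize(r) for r in raw_records]
    tagged = [a for a in agents if wanted in a.capabilities]
    # Ties on stars are broken alphabetically so the order is stable.
    return sorted(tagged, key=lambda a: (-a.stars, a.name))[:limit]


# Made-up sample records, for illustration only.
sample = [
    {"name": "tts-demo", "source": "GitHub", "stars": 120, "capabilities": ["Audio", "Speech"]},
    {"name": "img-tool", "source": "npm", "stars": 900, "capabilities": ["Image"]},
    {"name": "whisper-ui", "source": "PyPI", "stars": None, "capabilities": [" audio "]},
    {"name": "voice-agent", "source": "Hugging Face", "stars": 4500, "capabilities": ["audio"]},
]

for agent in rank_by_capability(sample, "Audio"):
    print(agent.name, agent.stars)
```

The `limit` default of 200 mirrors the "Top 200" cut-off used on this page; tags are compared case-insensitively so "Audio" and "audio" count as the same capability.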
320 agents in this capability · ranked by popularity
Top 200 Audio agents
A generative speech model for daily dialogue.
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
An Open Source Python alternative to NotebookLM's podcast feature: Transforming Multimodal Content into…
BibiGPT v1 · one-Click AI Summary for Audio/Video & Chat with Learning Content: Bilibili | YouTube |…
⚙️ All-in-One menu bar app, hide 💻MacBook Pro's notch, dark mode, AirPods, Shortcuts
One beautiful Ruby API for OpenAI, Anthropic, Gemini, Bedrock, Azure, OpenRouter, DeepSeek, Ollama, VertexAI…
SimpleMem: Efficient Lifelong Memory for LLM Agents — Text & Multimodal
Multi-modal Generative Media Skills for AI Agents (Claude Code, Cursor, Gemini CLI). High-quality image…
Dealing with all unstructured data, such as reverse image search, audio search, molecular search, video…
ElevenLabs UI is a component library and custom registry built on top of shadcn/ui to help you build…
Baresip is a modular SIP User-Agent with audio and video support
EPUB to audiobook converter, optimized for Audiobookshelf, WebUI included
Here we will keep track of the latest AI Game Development Tools, including LLM, World Model, Agent, Code…
An open-source AI Voice Agent that integrates with Asterisk/FreePBX using Audiosocket/RTP technology
The open-source iOS app that's making quality voice transcription more accessible on mobile devices.
80+ free AI services for chat, image, video, voice & APIs (may sometimes include access to lead gen ai models…
ChatGPT CLI is a powerful, multi-provider command-line interface for working with modern LLMs. It supports…
OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama…
🎤 The easiest way to transcribe audio in Swift
Rapida is an open-source, end-to-end voice AI orchestration platform for building real-time conversational…
Live2D Virtual Human for Chatting based on Unity
VideoAgent: All-in-One Agentic Framework for Video Understanding, Editing, and Remaking
A simple example implementation of the VoiceRAG pattern to power interactive voice generative AI experiences…
Official repo for WavCraft, an AI agent for audio creation and editing
Open-source, self-hosted alternative to NotebookLM. Chat with your documents, generate audio summaries, and…
A collection of open-source Agent Skills for content creation — images, audio, and video.
RESTai is an AIaaS (AI as a Service) open-source platform. Supports many public and local LLMs supported by…
An all-in-one AI audio playground using Cloudflare AI Workers to transcribe, analyze, summarize, and…
smol-podcaster is your podcast production agent 🎙️
Java client library for the OpenAI API. Full support for all OpenAI API models including Completions, Chat, Edits…
Vanilla JS web interface for Gemini 2.0 flash-exp Multimodal API with text, audio, camera, screen inputs and…
potato: the portable annotation tool
How to use OpenAI's Whisper to transcribe and diarize audio files
Program that lets you ask questions about your documents, audio, and video files.
An API to transcribe audio with OpenAI's Whisper Large v3!
Simple self-hosted web application, which can be used to convert audio to subtitles by OpenAI's Whisper model
Cross-platform, real-time, offline speech recognition plugin for Unreal Engine. Based on Whisper OpenAI…
AIxplora is an open-source tool that lets you query all kinds of files, with no limit on length or format.
Rust Agent Development Kit (ADK-Rust): Build AI agents in Rust with modular components for models, tools…
The main repo for Stage Whisper — a free, secure, and easy-to-use transcription app for journalists, powered…
Open-source AI meeting copilot - real-time transcription, echo cancellation, and AI assistance. Captures…
Music Analysis, Chord Recognition, Beat Tracking, Guitar Diagrams, Piano Visualizer, Lyrics Transcription…
llama.cpp (GGUF LLMs) and llava.cpp (GGUF VLMs) for ROS 2
A collection of NLP projects and tools.
The AI Podcast Studio: generate podcasts scripts and their audio version with a team of AI workers in a…
Any source (PDF, video, web, audio, text) to interactive learning package with quizzes, flashcards and spaced…
Open-source, fully private and local alternative to NotebookLM. Chat with your documents, generate audio…
Custom TTS component for Home Assistant. Utilizes the OpenAI speech engine or any compatible endpoint to…
Empowering your ChatGPT with vision and audio inputs.
A sample web app using OpenAI Whisper to transcribe audio built on Next.js. It records audio continuously for…
Revornix is an open-source, local-first AI information/markdown workspace. It helps you collect fragmented…
A real-time Agent framework for audio and video.
Flutter App That Can Transcribe Audio Offline/On Device with Whisper C++ Bindings via Rust
OpenAI's Whisper Audio to text transcription right into your web browser! An open source AI subtitling suite.
Unleash the power of Chatty: the intersection of ChatGPT’s intelligence, DALL·E's creativity, and Whisper's…
openai-whisper-talk is a sample voice conversation application powered by OpenAI technologies such as…
This repository contains a Python script that allows users to download the audio from a YouTube video…
OpenAI GPT based informational audiobook/podcast mp3 generator
Ready-to-use AI Multimodal ChatGPT-based WhatsApp chatbot assistant for your business. Now supports GPT-4o…
A production-ready Laravel package to integrate with the Google Gemini API. Supports text, image, video…
An open solution for AI-powered photorealistic digital humans.
WhisperClip simplifies your life by automatically transcribing audio recordings and saving the text directly…
Secure AI conversations with documents, video, audio, and more. Personal workspaces for focused context…
Recent advancements propelled by large language models (LLMs), encompassing an array of domains including…
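An AI audiobook production tool based on Indextts and Qwen3TTS. Uses an LLM to automatically break down scripts and recognize emotions, and integrates multi-role TTS…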
A bash script using OpenAI Whisper API for continuous audio transcription with automatic silence detection
Natural language → ComfyUI workflow JSON. 34 built-in templates, 360+ node definitions, auto model download…
PodAgent: A Comprehensive Framework for Podcast Generation
⭐️ The most comprehensive ChatGPT repo with Vue 3: Vue ChatGPT AI! ⭐️ Unlock the power of AI-driven…
Real-time voice agent powered by Agora and OpenAI
Open-Audio TTS: A robust web app leveraging OpenAI's powerful Text-to-Speech (TTS) models to generate…
Open source subtitling platform 💻 for transcribing and translating video and audio in Indic languages.
A Clojure library for building real-time voice-enabled AI Agents. Simulflow handles the orchestration of…
This repository consists of work done to analyse sentiment of a customer in a conversation with a call center…
An open source chat bot architecture for voice/vision (and multimodal) assistants, local(CPU/GPU bound) and…
The AVR Infrastructure project is designed to launch the Agent Voice Response application, which will start…
Realtime Interview Copilot is a web application that assists users in crafting responses during interviews…
The GenAI API wrapper for Delphi seamlessly integrates OpenAI’s latest models (gpt-5 series), delivering…
Implementation of OpenAI's Text-To-Speech in Unity. Synthesize any text and play it via any AudioSource.
Fast audio/video transcription using OpenAI's Whisper and Modal; an hour-long audio/video file can be transcribed in…
A lightweight Python API wrapper and CLI for Google’s Gemini language models.
Deploy open-source LLMs on AWS in minutes — with OpenAI-compatible APIs and a powerful CLI/SDK toolkit.
Live AI-powered screen translation via LLMs & GPU OCR. 26 languages, manga support, PDF/CBZ conversion, audio…
A comprehensive steganography framework for embedding and extracting agentic commands in audio and video…
An MCP server built on ableton-js enables AI assistants to control Ableton Live in real time, including…
Production-ready audio and video transcription app that can run on your laptop or in the cloud.
Omnigram is a Flutter-based file reader and audiobook. It accommodates EPUB and PDF and offers audiobook…
Web app enabling users to either record or upload audio files. Then utilizing OpenAI API (Whisper, GPT4)…
Comprehensive Claude Code framework: 6 specialized agents, 7 workflow commands, audio notifications - stack…
Awesome AI Chat (ChatGPT4...), Code (GitHub Copilot...), Read (ChatPDF...), Paint (Midjourney...), Write…
A SpeechToText application that uses OpenAI's Whisper via faster-whisper to transcribe audio and send that…
Audio to summary with OpenAI Whisper & GPT 3.5/4 using Streamlit
A distributed multi-modal agent orchestration framework implementing advanced natural language processing…
Modern Desktop Application offering a suite of tools for audio/video text recognition and a variety of other…
Automatically generate subtitles from an input audio or video file using OpenAI Whisper
AI tool that generates an audio short story based on the context of an uploaded image by prompting a GenAI…
Claude Code marketplace for audio plugin development skills.
Real-time AI ChatBot and voice-enabled AI VoiceBot using Deepgram (STT ↔ TTS) and Groq LLM for natural…
OpenAI realtime audio with WebRTC
A cutting-edge AI SaaS platform that enables users to create, discover, and enjoy podcasts with advanced…
A comprehensive Model Context Protocol (MCP) server that enables AI agents to create fully mixed and mastered…
Streamlit Audio Transcription with OpenAI's Whisper AI: An interactive Streamlit app demonstrating real-time…
A multi engine TTS & LLM edge computing playground with audio book features and more!
Open source Python program for automating gain staging. Part 1 of a series for automating audio processing…
songGPT is an experimental open-source project that explores the potential of Language Models, specifically…
OpenClaw / Claude Code / Codex Agent skill for summarizing videos/audio via BibiGPT CLI (bibi)
👻 kwami.io | A 3D Interactive AI Companion Library for creating engaging AI companions with visual (blob)…
An OpenAI Whisper-based full-stack project to transcribe audio and video files using React & Django.
The only Java AI framework with a complete Dev → Test → Prod prompt lifecycle, featuring multi-agent…
A BentoML-powered API to transcribe audio and make sense of it
Here we will track the latest AI Multimodal Models, including Multimodal Foundation Models, LLM, Agent…
Self-contained offline environment providing local AI chat, offline Wikipedia/content archives, IRC…
A production-ready voice agent implementation using LiveKit and Python, featuring advanced conversational AI…
OpenAI TTS Compatible Ukrainian TTS StyleTTS2 Pipeline
Uses the OpenAI API to clean a PDF, then converts it to a professional-grade audiobook with text-to-speech.
Generate subtitles for all the videos in a folder with OpenAI's Whisper, privately on your computer.
Open-source, customizable frontend for Venice AI. Chat, image gen, audio, video, embeddings + visual…
Soundstorm is a cutting-edge AI-powered audio manipulation application designed to provide a rich yet…
Simple Python audio transcriber using OpenAI's Whisper speech recognition model
A powerful, unofficial OpenAI-compatible API service offering free access to GPT-4o, GPT-4-turbo, and audio…
MOM AI transcribes audio into a meeting summary and generates minutes of the meeting. Built using Langchain, OpenAI…
macOS menu bar app providing a local HTTP server compatible with the OpenAI Whisper API for fast and private…
An AI-powered agent designed to watch live streams, understand the content (audio, chat, video), and…
Record audio from a meeting, then transcribe, conclude and send the conclusion and a piece of advice to Slack
Waifu_AI_Vtuber is an AI virtual YouTuber chatbot powered by OpenAI GPT-3.5, interacting in real-time with…
Whisper Speech-to-Text is a JavaScript library for recording and transcribing user audio into text via…
Cross-platform Electron app for simultaneously streaming & recording microphone and speaker audio
JarvisV3 is a Streamlit-powered AI assistant inspired by Iron Man’s Jarvis. It offers both text and realtime…
YouTube video summarization using Whisper audio transcription and GPT-based summaries.
🌌 Explore 255+ essential skills for AI coding assistants like Claude Code and GitHub Copilot to enhance your…
A completely free and unlimited unofficial Python SDK for the OpenAI API, providing seamless integration and…
A minimalistic web app to generate transcriptions for audio, built using Python
A neural network based file sorter. Trains an autoencoder to sort images or audio based on the similarity of…
YATSEE - Yet Another Tool for Speech Extraction & Enrichment
Sophia AI Assistant is a Python-based desktop AI that performs a variety of tasks, including answering…
A node editor for experimenting with multimodal pipelines of images, audio, text, and LLMs/VLMs (Node-based editor to compose and experiment with…
Learn how multimodal AI merges text, image, and audio for smarter models
A framework for AI WhatsApp calls using Whisper, Coqui TTS, GPT-3.5 Turbo, Virtual Audio Cable, and the…
Coffee Chat Voice Assistant is a voice-driven ordering system powered by Azure OpenAI GPT-4o Realtime API…
A comprehensive OpenRouter API client library for ESP32 (ESP-IDF), enabling seamless integration with…
Shadow AI: stealth AI assistant for restricted/locked-down environments, enabling LAN cross-device…
The WhatsApp bot you were looking for ✅. It offers a wide range of features like Audio & Video editing, Image & Logo…
Swaps/Mutes active audio input device in OBS upon a specified channel point redemption in Twitch chat.
No viewers? No problem! Use AI Viewers
A framework in Italian and English that lets you chat with your own documents via RAG, also…
Summarize audio/video files
A Python command-line utility to auto-generate subtitle files (using the faster_whisper module, which is a…
MediBeng Whisper Tiny improves doctor-patient transcription by training the Whisper Tiny model to translate…
A local Windows speech-to-text tool based on SenseVoice, with support for text polishing via OpenAI-format APIs. Low latency and high accuracy.
A sophisticated Node.js application that analyzes YouTube videos for legal compliance. It transcribes the…
Package with sinapsis templates to support OpenAI functionality
A highly contextualized retrieval system integrating Large Language Models (LLMs), embeddings, and a dynamic…
AI Agent plugin for Google NotebookLM (Claude, OpenClaw, etc) — generate slide decks, audio overviews…
Give your agents real time desktop perception. Stream screen, microphone, and system audio for live context…
Explore AI Capabilities for Your .NET Projects with OpenAI's API: Unlock the power of AI in your applications
A set of Jupyter notebooks
Speakscribe is a web application that allows users to transcribe audio using OpenAI and also interact with a…
Unlock AI power with AudioInsightsGenerator! From audio to summaries, emotion analysis, idea generation…
🎙️ Fast CLI tool to transcribe audio/video files to SRT format using OpenAI Whisper API
Flutter app implementing OpenAI tools (ChatGPT & Whisper)
A beautiful, native macOS desktop application for transcribing audio and video files using whisper.cpp
A Toolbox Platform for Creating Your Own Tools. Bake Them with Code or AI.
Shell scripts for automated transcription on macOS: Integrates whisper.cpp with QuickTime Player and…
The Hugging Face API wrapper for Delphi leverages cutting-edge models to deliver powerful features, including…
Eolian is a Discord music bot which provides a very powerful API for queuing songs from a variety of sources…
Multi-Modal RAG for .NET — query databases, documents, images and audio in natural language. Production-ready…
Text to Speech Studio to convert text into natural-sounding speech using advanced AI models from leading…
This is a Telegram bot that can download audio from YouTube videos and summarize the content using OpenAI's…
A FastAPI application that relays client WebSocket connections to OpenAI's Realtime API, enabling seamless…
Your personal AI 「KEEP」, supports docx, pdf, audio, video...
Leveraging OpenAI's Whisper ASR and GPT-4 models to automate the process of generating meeting minutes from…
An Alexa skill providing a conversational interface to any public figure (as mimicked by GPT3). The legacy…
Live translation tool utilizing OpenAI's Whisper model for real-time audio transcription/translation with…
A simple JavaScript chatbot
Scribe is a Python script that transcribes audio and video files using OpenAI Whisper and exports the…
🤖 A WhatsApp bot to transcribe and summarize audio messages.
Text-to-speech plugin for Claude Code — multi-provider support (ElevenLabs, OpenAI, Google, Amazon Polly…
Audio transcription UI for OpenAI Whisper, GPT4o Transcribe and AssemblyAI APIs
Genius-SaaS: An AI-powered SaaS application built with Next.js and React for personalized recommendations…
Google Gemini live voice to text realtime stream in the browser
End-to-end, full-stack, real-time Discord clone, all with servers, channels, video calls, audio calls…
An app that uses Hugging Face AI models together with OpenAI & LangChain, to generate text from an image…
Real-time conversation assistant with dual audio transcription and GPT-powered responses, perfect for…
Code for ICASSP 2024 Paper: RECAP: Retrieval-Augmented Audio Captioning
Portable Claude Code Setup
ChatAnyFile is a powerful full-stack application that allows you to interact with your PDF documents, images…
When an audio message is received, the bot downloads the audio file, converts it to a numpy array, loads the…
eShopLite - Semantic Search is a reference .NET application implementing an eCommerce site with Search…
Voice-driven AI professional agent. Real-time conversations powered by Gemini Live API, native audio…
SpeakingAI is a demo of privately deployable 'GPT-4o like AI + RAG', a fully functional web AI server with…
An end-to-end AI agent project that transcribes audio files, embeds user queries, and searches in Qdrant and…
Convert images into captivating audio stories using image-to-text, language models, and text-to-speech…
#3 Winner of Best Use of Zoom API at Stanford TreeHacks 2025! An AI-powered meeting assistant that captures…
AI Voice Agents: Exploring the Next Generation of Human-Machine Interaction! 🎙️🤖🎧
This project is a multi-modal AI voice assistant that uses LM Studio, OpenAI API or Claude Code, audio…
A powerful and versatile AI-powered PBX for Asterisk, WhatsApp, Telegram, with text and audio support, built…
FastAPI + Whisper + Ollama: Audio transcription and LLM processing API. Convert speech to text with OpenAI…
Convert articles to audio using OpenAI's Text to Speech API via a Python script or web app
Set of abstraction libraries to easily build Text and Audio based bots