Audio agents

1,077 Audio AI agents indexed on MeshKore — the most complete public catalog, ranked by popularity and updated daily.

Top 100 Audio agents

A generative speech model for daily dialogue.

Multilingual large voice generation model with full-stack inference, training, and deployment capabilities.

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

leon★ 17,266

🧠 Leon is your open-source personal assistant.

ten-framework★ 10,613

Open-source framework for conversational voice AI agents

moonshine★ 8,268

Very low latency speech to text, intent recognition, and text to speech, for building voice agents and interfaces

Vision-Agents★ 7,849

Open Vision Agents by Stream. Build voice and vision agents quickly with any model or video provider. Uses Stream's edge network for ultra-low latency.

wukong-robot★ 7,119

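🤖 wukong-robot is a simple, flexible, and elegant Chinese voice dialogue robot/smart speaker project. It supports ChatGPT multi-turn conversation and may be the first open-source smart speaker project to support brain-computer interaction.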

OnlySwitch★ 5,678

⚙️ All-in-One menu bar app, hide 💻MacBook Pro's notch, dark mode, AirPods, Shortcuts

Red-DiscordBot★ 5,543

A multi-function Discord bot

cactus★ 5,239

Low-latency AI engine for mobile devices & wearables

cheetah★ 4,263

Mac app for crushing tech interviews with AI

awesome-bots★ 4,141

The most awesome list about bots ⭐️🤖

auto-subs★ 3,450

Instantly generate AI-powered subtitles on your device. Works standalone or connects to DaVinci Resolve.

SimpleMem★ 3,435

SimpleMem: Efficient Lifelong Memory for LLM Agents — Text & Multimodal

openwhispr★ 3,393

Voice-to-text dictation app with local (Nvidia Parakeet/Whisper) and cloud models (BYOK). Privacy-first and available cross-platform.

faster-whisper-GUI★ 2,957

faster_whisper GUI with PySide6

amurex★ 2,827

World's first AI meeting copilot → The Invisible Companion for Work + Life

speechgpt★ 2,755

💬 SpeechGPT is a web application that enables you to converse with ChatGPT.

polyglot★ 2,590

🤖️ Cross-platform AI language practice app

rasa_core★ 2,342

Rasa Core is now part of the Rasa repo: An open source machine learning framework to automate text- and voice-based conversations

VisionClaw★ 2,324

Real-time AI assistant for Meta Ray-Ban smart glasses -- voice + vision + agentic actions via Gemini Live and OpenClaw

awesome-whisper★ 2,309

🔊 Awesome list for Whisper — an open-source AI-powered speech recognition system developed by OpenAI

comfyui_LLM_party★ 2,258

LLM Agent Framework in ComfyUI. Includes MCP server, Omost, GPT-SoVITS, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes, plus access to Feishu and Discord. Adapts to all LLMs with OpenAI- or aisuite-style interfaces, such as o1, Ollama, Gemini, Grok, Qwen, GLM, DeepSeek, Kimi, and Doubao. Adapted to local LLMs, VLMs, and GGUF models such as Llama-3.3 and Janus-Pro, with GraphRAG linkage.

ui★ 2,237

ElevenLabs UI is a component library and custom registry built on top of shadcn/ui to help you build multimodal agents faster.

baresip★ 2,110

Baresip is a modular SIP User-Agent with audio and video support

epub_to_audiobook★ 1,984

EPUB to audiobook converter, optimized for Audiobookshelf, WebUI included

pluely★ 1,976

The Open Source Alternative to Cluely - A lightning-fast, privacy-first AI assistant that works seamlessly during meetings, interviews, and conversations without anyone knowing. Built with Tauri for native performance, just 10MB. Completely undetectable in video calls, screen shares, and recordings.

Dot★ 1,911

Text-To-Speech, RAG, and LLMs. All local!

react-simple-chatbot★ 1,756

💬 Easy way to create conversation chats

ElatoAI★ 1,756

Realtime Voice AI with 100+ Models on Arduino ESP32 with Secure Websockets and Edge Functions for AI Toys, Companions, and Devices

bailing★ 1,701

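Bailing is a GPT-4o-like voice dialogue robot built with ASR + LLM + TTS. It integrates strong large models such as DeepSeek R1, connects to openClaw, and works as a true personal voice assistant with latency as low as 800 ms. It runs on low-spec machines such as a Mac and supports interruption.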

RCLI★ 1,514

Talk to your Mac, query your docs, no cloud required. On-device voice AI + RAG

yt-whisper★ 1,439

Using OpenAI's Whisper to automatically generate YouTube subtitles

Dragonfire★ 1,409

the open-source virtual assistant for Ubuntu based Linux distributions

langchain4j-aideepin★ 1,288

AI-based productivity tools (chat, drawing, RAG knowledge base, workflow, MCP marketplace, ASR, TTS, long-term memory, etc.)

telegram-chatgpt-concierge-bot★ 1,131

Interact with OpenAI's ChatGPT via Telegram and Voice.

lotti★ 1,114

Open-source private logbook with a local agentic layer. Long-living AI agents read what you record and propose what to do next. Hardware permitting, the models run locally too. Matrix + Vodozemac for end-to-end encrypted sync between your own devices.

AI-Waifu-Vtuber★ 1,078

AI Vtuber for Streaming on Youtube/Twitch

AVA-AI-Voice-Agent-for-Asterisk★ 1,045

An open-source AI Voice Agent that integrates with Asterisk/FreePBX using Audiosocket/RTP technology

Whisperboard★ 1,032

The open-source iOS app that's making quality voice transcription more accessible on mobile devices.

realtime-phone-agents-course★ 977

Build realtime AI voice agents using FastRTC for low-latency streaming, Superlinked for vector search, Twilio for live phone calls, and Runpod for scalable GPU deployment.

lobe-vidol★ 954

🧸 Lobe Vidol - Making Virtual Idols Accessible for EveryOne

voquill★ 947

Open source voice dictation technology

blurr★ 922

This app can now use Android, just like a human.

agent-starter-react★ 873

A complete voice AI frontend app for LiveKit Agents with Next.js

local-talking-llm★ 852

A talking LLM that runs on your own computer without needing the internet.

esp-ai★ 831

The simplest and lowest-cost AI integration solution. If you like this project, please give it a Star~

aws-lex-web-ui★ 822

Sample Amazon Lex chat bot web interface

openclaw-nerve★ 821

Real-time web cockpit for OpenClaw: voice conversations, agent automated kanban board, workspace/file control, sub-agent sessions, inline charts, and usage visibility.

gitpodcast★ 807

Convert any git repository into an engaging podcast

june★ 784

Local voice chatbot for engaging conversations, powered by Ollama, Hugging Face Transformers, and Coqui TTS Toolkit

whisper.rn★ 781

React Native binding of whisper.cpp.

SwiftWhisper★ 779

🎤 The easiest way to transcribe audio in Swift

viral-clips-crew★ 756

Your CrewAI Powered Video Editing Assistant

whisper.unity★ 729

Running speech to text model (whisper.cpp) in Unity3d on your local machine.

LocalAIVoiceChat★ 721

Local AI talk with a custom voice based on Zephyr 7B model. Uses RealtimeSTT with faster_whisper for transcription and RealtimeTTS with Coqui XTTS for synthesis.

VideoAgent★ 714

"VideoAgent: All-in-One Agentic Framework for Video Understanding, Editing, and Remaking"

BabelDuck★ 681

Beginner-friendly AI conversation practice application

voice-assistant-scripts★ 681

Example scripts for AI agents created with the Alan AI Platform.

bolna★ 654

Conversational voice AI agents

speech-to-text★ 615

Real-time transcription using faster-whisper

stealth★ 597

An open source Ruby framework for text and voice chatbots. 🤖

VLog★ 588

[CVPR 2025] Video Narration as Vocabulary & Video as Long Document

echokit_server★ 565

Open Source Voice Agent Platform

Starmoon★ 546

A conversational, AI device + software framework for companionship, entertainment, education, healthcare, IoT applications, and DIY robotics. Built with Python, NextJS, Arduino, ESP32, LLMs (GPT-4o), Deepgram STT and Azure TTS 🤖

LLM-Agents-Ecosystem-Handbook★ 524

One-stop handbook for building, deploying, and understanding LLM agents with 60+ skeletons, tutorials, ecosystem guides, and evaluation tools.

JARVIS★ 523

Your own personal voice assistant: Voice to Text to LLM to Speech, displayed in a web interface

joinly★ 519

Make your meetings accessible to AI Agents

ollama-voice-mac★ 517

Mac compatible Ollama Voice

Facemoji★ 454

😆 A voice chatbot that can imitate your expression. OpenCV+Dlib+Live2D+Moments Recorder+Turing Robot+Iflytek IAT+Iflytek TTS

okcash★ 434

OK | Every voice, every meme, every transaction makes $OK stronger and more vibrant. Powered by all of us—and now, AI agents. OK is not just OK — it’s $OK. $OK?

react-voice-agent★ 431

smol-podcaster★ 414

smol-podcaster is your podcast production agent 🎙️

project-raven★ 403

Open-source AI meeting copilot - real-time transcription, echo cancellation, and AI assistance. Captures system audio + mic, cancels echo via WebRTC AEC3, transcribes with Deepgram, and gives you Claude/OpenAI help during meetings. Runs locally on macOS and Windows.

visionOS-examples★ 400

visionOS examples ⸺ Spatial Computing Accelerators for Apple Vision Pro

Stream-Omni★ 386

Stream-Omni is a GPT-4o-like language-vision-speech chatbot that simultaneously supports interaction across various modality combinations.

Whisper-transcription_and_diarization-speaker-identification-★ 377

How to use OpenAI's Whisper to transcribe and diarize audio files

edgen★ 372

⚡ Edgen: Local, private GenAI server alternative to OpenAI. No GPU required. Run AI models locally: LLMs (Llama2, Mistral, Mixtral...), Speech-to-text (whisper) and many others.

adk-rust★ 350

Rust Agent Development Kit (ADK-Rust): Build AI agents in Rust with modular components for models, tools, memory, realtime voice, and more. ADK-Rust is a flexible framework for developing AI agents with simplicity and power. Model-agnostic, deployment-agnostic, optimized for frontier AI models. Includes support for real-time voice agents.

say★ 348

say - command line tool for voice and video calling

maxheadbox★ 343

Tiny truly local voice-activated LLM Agent that runs on a Raspberry Pi

macos-local-voice-agents★ 323

Pipecat voice AI agents running locally on macOS

tiledesk-dashboard★ 318

Tiledesk is the open source AI agent builder, written in Node.js and Angular. This repository is dedicated to the WebApp dashboard to manage Tiledesk: open-source alternative to Voiceflow, enabling easy creation of advanced LLM-powered Agents with seamless human-in-the-loop (HITL).

jarvis★ 318

Jarvis is a voice-activated, conversational AI assistant powered by a local LLM (Qwen via Ollama). It listens for a wake word, processes spoken commands using a local language model with LangChain, and responds out loud via TTS. It supports tool-calling for dynamic functions like checking the current time.

twewy-discord-chatbot★ 315

Discord AI Chatbot using DialoGPT, trained on the game transcript of The World Ends With You

hack-interview★ 310

AI-powered tool for real-time interview question transcription and response generation.

RuntimeSpeechRecognizer★ 306

Cross-platform, real-time, offline speech recognition plugin for Unreal Engine. Based on OpenAI's Whisper technology (whisper.cpp).

gpt-voice-conversation-chatbot★ 302

Allows you to have an engaging and safely emotive spoken / CLI conversation with the AI ChatGPT / GPT-4 while giving you the option to let it remember things discussed.

whisper-node★ 302

Node.js bindings for OpenAI's Whisper. (C++ CPU version by ggerganov)

AI-Talks★ 296

AI Talks - ChatGPT Assistant via Streamlit

tiledesk★ 296

Install Tiledesk on your server using Helm for Kubernetes orchestration and Docker Compose for running multi-container Docker applications. Tiledesk provides an open-source solution comparable to Voiceflow, empowering you to create sophisticated LLM-enabled chatbots that seamlessly transition interactions to human agents when needed.

ai-devices★ 295

AI Device Template Featuring Whisper, TTS, Groq, Llama3, OpenAI and more

firefox-voice★ 292

Firefox Voice is an experiment in a voice-controlled web user agent

TranscriberBot★ 291

TranscriberBot for Telegram

tetos★ 279

A unified interface for multiple Text-to-Speech (TTS) providers.

safestclaw★ 276

Safestclaw is the alternative to openclaw. You can chat with it naturally via text and voice, and you can choose not to use a language model. By default it picks up on intent and semantics. No prompt injection, while you get over ninety percent of what openclaw does, plus TTS and voice-to-text.

voiceai★ 274

Set of 📝 with 🔗 to help those building Voice AI agents 🎙️🤖

aixplora★ 274

AIxplora is an open-source tool that lets you query all kinds of files, not limited to any length or format.

ai_webui★ 270

AI-WEBUI: A universal web interface for AI creation, an easy-to-use AI tool for processing images, audio, and video

Browse other category pages

Code 23,874 · AI Infra 22,308 · Data 5,097 · Business 3,050 · Image 1,832 · Content 1,286 · Personal 837 · Crypto 295 · Translation 290 · Demo 5 · Infrastructure 1