Speech agents
This page lists every AI agent in the MeshKore directory tagged with the Speech capability. Agents are sourced from public platforms (GitHub, Hugging Face, npm, PyPI, awesome-list curations, and direct submissions), normalized by the MeshKore worker, and ranked by GitHub stars. Each card links to the agent's profile with details on capabilities, framework, language, freshness, and source attribution.
522 agents in this capability · ranked by popularity
Top 200 Speech agents
A generative speech model for daily dialogue.
Faster Whisper transcription with CTranslate2
🧠 Leon is your open-source personal assistant.
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
This repository contains hand-curated resources for Prompt Engineering with a focus on Generative…
Machine Learning and Agentic AI Resources, Practice and Research
Build local voice agents with open-source models
A nearly-live implementation of OpenAI's Whisper.
Instantly generate AI-powered subtitles on your device. Works standalone or connects to DaVinci Resolve.
Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.
💬 SpeechGPT is a web application that enables you to converse with ChatGPT.
Voice-to-text dictation app with local (Nvidia Parakeet/Whisper) and cloud models (BYOK). Privacy-first and…
🔊 Awesome list for Whisper — an open-source AI-powered speech recognition system developed by OpenAI
The Self-Coding System for Your App — Alan AI SDK for iOS
Gathers machine learning and Tensorflow deep learning models for NLP problems, 1.13 < Tensorflow < 2.0
Free, high-quality text-to-speech API endpoint to replace OpenAI, Azure, or ElevenLabs
The Open Source Alternative to Cluely - A lightning-fast, privacy-first AI assistant that works seamlessly…
The Self-Coding System for Your App — Alan AI SDK for Flutter
💬 Easy way to create conversation chats
The Self-Coding System for Your App — Alan AI SDK for Ionic
Realtime Voice AI on Arduino ESP32 with OpenAI Realtime, Gemini, Grok, Eleven Labs with >15 minutes…
The open-source virtual assistant for Ubuntu-based Linux distributions
The Self-Coding System for Your App — Alan AI SDK for Cordova
AI Vtuber for Streaming on Youtube/Twitch
💬📝 A small dictation app using OpenAI's Whisper speech recognition model.
The open-source iOS app that's making quality voice transcription more accessible on mobile devices.
Example apps for Foundation Models Framework in iOS 26 and macOS 26
Transformer models from BERT to GPT-4, environments from Hugging Face to OpenAI. Fine-tuning, training, and…
The open source wisprflow alternative
A talking LLM that runs on your own computer without needing the internet.
Build real time speech2text web apps using OpenAI's Whisper https://openai.com/blog/whisper/
Live speech translation powered by on-device AI and cloud providers — OpenAI, Google Gemini, Palabra.ai…
React hook for OpenAI Whisper with speech recorder, real-time transcription, and silence removal built-in
🎤 The easiest way to transcribe audio in Swift
React Native binding of whisper.cpp.
TTSFM mirrors OpenAI's TTS service, providing a compatible interface for text-to-speech conversion with…
Running speech to text model (whisper.cpp) in Unity3d on your local machine.
Beginner-friendly AI conversation practice application
Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android
ChatGPT at home! A better alternative to commercial smart home assistants, built on the Raspberry Pi using…
Real-time transcription using faster-whisper
The Self-Coding System for Your App — Alan AI SDK for React Native
Local, OpenAI-compatible text-to-speech (TTS) API using Chatterbox, enabling users to generate voice cloned…
Mac compatible Ollama Voice
A Conversational Assistant equipped with synthetic voices including J.A.R.V.I.S's. Powered by OpenAI and IBM…
The Self-Coding System for Your App — Alan AI SDK for Power Apps
Stream-Omni is a GPT-4o-like language-vision-speech chatbot that simultaneously supports interaction across…
potato: the portable annotation tool
⚡ Edgen: Local, private GenAI server alternative to OpenAI. No GPU required. Run AI models locally: LLMs…
An API to transcribe audio with OpenAI's Whisper Large v3!
Open Source Voice Agent Platform
Webscout is the all-in-one search and AI toolkit you need. Discover insights with Yep.com, DuckDuckGo, and…
A powerful Rust library and CLI tool to unify and orchestrate multiple LLM, Agent and voice backends (OpenAI…
Simple self-hosted web application, which can be used to convert audio to subtitles by OpenAI's Whisper model
🎩 An Alfred 5 Workflow for using OpenAI Chat API to interact with GPT models 🤖💬 It also allows image…
This AI Smart Speaker uses speech recognition, TTS (text-to-speech), and STT (speech-to-text) to enable voice…
Cross-platform, real-time, offline speech recognition plugin for Unreal Engine. Based on Whisper OpenAI…
Allows you to have an engaging and safely emotive spoken / CLI conversation with the AI ChatGPT / GPT-4 while…
A unified interface for multiple Text-to-Speech (TTS) providers.
World's First Multilingual Inexpensive Therapeutic Sophisticated Ultra-responsive Holographic Agent. In…
AI-WEBUI: A universal web interface for AI creation, a handy tool for AI image, audio, and video processing
DB-GPT WebUI, LLM to vision.
The main repo for Stage Whisper — a free, secure, and easy-to-use transcription app for journalists, powered…
Open-source AI meeting copilot - real-time transcription, echo cancellation, and AI assistance. Captures…
💬 Easy way to create conversation chats
A web UI project for learning about large language models. This project includes features such as chat…
Documentation and Wiki for SEPIA. Please post your questions and bug-reports here in the issues section…
Self-hosted OpenClaw gateway + agent runtime in .NET (NativeAOT-friendly)
OpenClaw voice assistant app for Android - Wake word activation & system assistant integration
🎙️ AI generated subtitles and segmented chapters for podcasts
🤖 A Matrix bot for using different capabilities (text-generation, text-to-speech, speech-to-text…
Amazon Sumerian Hosts (Hosts) is an experimental open source project that aims to make it easy to create…
NodeJS Bindings for Whisper - the CPU version of OpenAI's Whisper, as initially crafted in C++ by ggerganov.
Full stack voice chatbot
An Android ChatBot powered by Watson Services - Assistant, Speech-to-Text and Text-to-Speech on IBM Cloud.
A voice-enabled chatbot application built using 🦜️🔗 LangChain, text-to-speech, and speech-to-text models…
A powerful Whisper AI keyboard for reliable speech transcription
Voice native AI agent for the builders of tomorrow
Samantha OS1 is a conversational AI assistant powered by the Realtime API from OpenAI
Custom TTS component for Home Assistant. Utilizes the OpenAI speech engine or any compatible endpoint to…
Voice-activated AI assistant with speech recognition and NLP. Automate tasks effortlessly with this…
🗣️ ZAI/GLM TTS to OpenAI Speech API, a free speech synthesis API with voice cloning support, based on Zhipu TTS
Input text from speech in any Linux window, the lean, fast and accurate way, using whisper.cpp OFFLINE. Speak…
CLI tool for running text through OpenAI Text to speech
Sayna is a unified Voice Layer for AI Agents with seamless integration into existing agentic frameworks
Official one-stop shop for AI Agents and developers building with Telnyx.
OpenAI's Whisper Audio to text transcription right into your web browser! An open source AI subtitling suite.
Like ChatGPT's voice conversations with an AI, but entirely offline/private/trade-secret-friendly, using…
A full stack app for interruptible, low-latency and near-human quality AI phone calls built from stitching…
WhiteLightning distills massive, state-of-the-art language models into lightweight, hyper-efficient text…
WhisperClip simplifies your life by automatically transcribing audio recordings and saving the text directly…
It is a personal assistant chatbot, capable of performing many of the same tasks as Google Assistant, plus more extra…
Input a YouTube video link or upload a video file and get a video with subtitles.
The video search layer for AI agents. Search video by meaning — across speech, visuals, and on-screen text.
A simple API to integrate chatbots written in JavaScript with WhatsApp Web 💬📲 (Store…
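An AI audiobook production tool based on Indextts and Qwen3TTS. Uses an LLM to automatically break down scripts and recognize emotions, and integrates multi-character TTS…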
PodAgent: A Comprehensive Framework for Podcast Generation
Agent orchestration & security template featuring MCP tool building, agent2agent workflows, mechanistic…
Short code for dictation using OpenAI Whisper for transcription.
An Android keyboard that performs speech-to-text (STT/ASR) with OpenAI Whisper and inputs the recognized text…
Push to talk voice recognition using Whisper
Full-stack AI chat platform built on Cloudflare using Workers, Durable Objects, KV, and AI Gateway. Features…
LLM based agents with proactive interactions, long-term memory, external tool integration, and local…
A curated list of awesome resources for OpenAI's Whisper
InsightSolver: Colab notebooks for exploring and solving operational issues using deep learning, machine…
A Discord chatbot that supports popular LLMs for text generation and ultra-realistic voices for voice chat.
Speech to Text using a Docker image with ggerganov/whisper.cpp
The ChatGPT/DeepSeek Voice Assistant uses a Raspberry Pi (or desktop) to enable spoken conversation with…
Transcription and TTS Rest API (OpenAI Whisper, Speechbrain)
A true Artificial Intelligence Assistant with ALICE as backend and offline speech recognition with vosk engine…
🦞 Open-source browser-based voice chat for AI assistants. Self-hosted, private, free. Whisper STT +…
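Open-source AI: build voice dialogue robots, smart speakers, and more on open-source software and hardware. With human-machine dialogue and natural interaction, 来宝 has unlimited possibilities. Note: 来宝 runs on Python 3!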
Voice control for ChatGPT. Talk to ChatGPT and hear ChatGPT's responses in a natural voice.
A modern AI chatbot with chat, image generation, and text-to-speech features, designed for a smooth and…
A Clojure library for building real-time voice-enabled AI Agents. Simulflow handles the orchestration of…
🗣️🔊 Your Text-to-Speech Services, All-in-One.
HACS custom integration for using Whisper speech-to-text (OpenAI, GroqCloud or Mistral) API in the Assist…
GPT-3 client for Windows and Unix with memory management that supports both text and speech in any…
openai/whisper + extra features
An open source chat bot architecture for voice/vision (and multimodal) assistants, local(CPU/GPU bound) and…
QVAC - Local AI SDK and libraries for building private, cross-platform, peer-to-peer AI applications. Run…
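ChatGPT for Android: a personalized AI that only needs an API key set locally to work, with chat history stored locally. To try the voice version, download the commercial edition or integrate the Azure Speech SDK yourself (paid, with free credit currently offered).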
Realtime Interview Copilot is a web application that assists users in crafting responses during interviews…
NOVA is a customizable voice assistant made with Node.js.
SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems
AskTube - An AI-powered YouTube video summarizer and QA assistant powered by Retrieval Augmented Generation…
Iron Man-inspired personal virtual assistant
Implementation of OpenAI's Text-To-Speech in Unity. Synthesize any text and play it via any AudioSource.
A sample speech transcription app implementing OpenAI Speech to Text API based on Whisper, an automatic…
Awesome Multimodal Assistant is a curated list of multimodal chatbots/conversational assistants that utilize…
Chrome extension for voice-to-text conversations with ChatGPT using OpenAI Whisper API
AmigaOS 3.1/4.1 and MorphOS application for chatting with ChatGPT or generating images
Whisper is an automatic speech recognition (ASR) system Gradio Web UI Implementation
Real-time speech recognition & AI-powered note-taking app for macOS with offline/online modes, multilingual…
Twitch livestream bot that can control colors for overlays from Stream Elements, play sound effects, handle…
An Android ChatBot powered by IBM Watson Services (Assistant V1, Text-to-Speech, and Speech-to-Text with…
Production-ready audio and video transcription app that can run on your laptop or in the cloud.
Svelte component for using the OpenAI Realtime API
A Python function package that computes the character error rate (CER) and word error rate (WER) of output transcripts from Korean STT sentence recognizers
AI Voice Assistant: Talk to an AI agent that helps you with event scheduling, contact management, accessing…
A minimalistic automatic speech recognition streamlit based webapp powered by OpenAI's Whisper "State of the…
Integrate with the latest language models, image generation, speech, and deep learning frameworks like…
A SpeechToText application that uses OpenAI's whisper via faster-whisper to transcribe audio and send that…
Audio to summary with OpenAI Whisper & GPT 3.5/4 using Streamlit
The Web AI Toolkit is a powerful, privacy-first JavaScript library that brings advanced AI capabilities…
Chatbot in Russian with speech recognition using PocketSphinx and speech synthesis using RHVoice. The…
Use OpenAI TTS (Text to Speech) API with Gradio
AgentOS2-Live by OrionStar — an end-to-end real-time voice interaction solution based on the Realtime API. No…
It's like ChatGPT for videos.
The AI Powered Speech Analytics for Amazon Connect solution provides the combination of speech to text…
This chatbot lets you use your microphone to communicate with GPT-4. It uses the OpenAI text to speech to…
This project demonstrates a multi-agent system using Google's Agent Development Kit (ADK), Agent2Agent (A2A)…
Real-time AI ChatBot and voice-enabled AI VoiceBot using Deepgram (STT ↔ TTS) and Groq LLM for natural…
Just an .exe that can be used for those unable to build whisper.cpp in Windows.
A "digital life" with long-term memory and a Live2D avatar
Demo for Deepgram Voice Agent API
Streamlit Audio Transcription with OpenAI's Whisper AI: An interactive Streamlit app demonstrating real-time…
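KUON (久远): a large language model voice assistant under development, currently focused on ease of use and a simple start, with support for selective conversation memory and Model Context Protocol (MCP) services. KUON: A large language model-based…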
Unity packages for real-time conversational AI with speech-to-speech capabilities. Integrates OpenAI and…
Chatbot with a 3D avatar that can answer interview questions on your behalf. It can speak and understand…
A self-hosted AI companion web app with anime-style Live2D and VRM characters. Talk with your companion via…
JARVIS AI Assistant 🤖 A virtual assistant project inspired by Tony Stark's JARVIS, powered by speech…
👻 kwami.io | A 3D Interactive AI Companion Library for creating engaging AI companions with visual (blob)…
A Python based Voice Assistant like Siri
Generate an engaging podcast based on your document using Azure OpenAI and Azure Speech.
Revamp your morning routine and supercharge productivity with Dispatch. The ultimate Apple Shortcut powered…
A Whisper + ChatGPT MagicMirror Module.
Voice-powered AI assistant platform — connect any LLM, any TTS, with a live web canvas, music generation, and…
AI Agent capable of automating various tasks using MCP
🎬 AI-powered localhost subtitle generator for hearing-impaired users. Automatic speech recognition using…
A minimalistic automatic speech recognition streamlit based webapp powered by OpenAI's Whisper
Jugalbandi (JB) Manager is a full AI-powered conversational chatbot platform. It's platform agnostic and can…
A bentoML-powered API to transcribe audio and make sense of it
Spotify Web AI DJ - client side agentic smarts using Gemma 2, two billion parameter LLM, to play what a user…
Python platform for working with LLMs
A minimal speech-to-structured output app built with Azure OpenAI Realtime API.
This project aims to combine the latest LLMs, Multi-Step Asynchronous Function Calling, Natural Language…
Fine-tune Whisper model for Taiwanese speech recognition
OpenAI TTS Compatible Ukrainian TTS StyleTTS2 Pipeline
Uses the OpenAI API to clean a PDF, then converts it into a professional-grade audiobook with text to speech.
V.I.S.O.R., my in-development AI-powered voice assistant with integrated memory!
This GitHub repository shows how to integrate OpenAI GPT-3 language model and ChatGPT API into a Unity…
Simple Python audio transcriber using OpenAI's Whisper speech recognition model
macOS menu bar app providing a local HTTP server compatible with the OpenAI Whisper API for fast and private…
A fully local, open-source voice-to-text tool that acts as a system-wide AI dictation layer, converting…
Harness OpenAI's power to effortlessly create YouTube Shorts with this project. Includes tools for generating…
Let's turn ChatGPT into VoiceGPT (Vue JS, Vite, OpenAI, AWS Polly) ChatGPT Clone (kind of lol)
Waifu_AI_Vtuber is an AI virtual YouTuber chatbot powered by OpenAI GPT-3.5, interacting in real-time with…
Text To Speech Demo in ReactJS Application using Azure Avatar AI Service.
A framework for creating voice-based agents. Integrates LLMs with speech recognition and text-to-speech
Sky LiveKit Agent Perplexica is a local, free solution integrating LiveKit with advanced internet search. It…
A machine learning powered, voice-based virtual assistant for Raspberry Pi. Supports several features like…
Whisper Speech-to-Text is a JavaScript library for recording and transcribing user audio into text via…
🌌 Explore 255+ essential skills for AI coding assistants like Claude Code and GitHub Copilot to enhance your…
Efficient AI English Learning: Read & Speak via Web | An efficient web app for learning to read English aloud and converse with AI
A minimalistic web app to generate transcription for audio, built using Python
Implementation of OpenAI's Realtime API in Unity. Easily integrate low-latency, multi-modal conversations via…
OpenAI Whisper in Home Assistant via the OpenAI API for use in the Assist pipeline
YATSEE - Yet Another Tool for Speech Extraction & Enrichment