Alternatives
Compare similar skills by workflow fit, trust score, quality, GitHub adoption, maintenance, and install readiness.
Current skill
A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.
Optimized Whisper models for streaming and on-device use
$ npx skills add TheStageAI/TheWhisper
On-device, real-time multimodal AI. Have natural voice and vision conversations with an AI that runs entirely on your machine. Powered by Gemma 4 E2B and Kokoro.
$ npx skills add fikrikarim/parlor
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.
$ npx skills add abus-aikorea/voice-pro
AI Vtuber for Streaming on YouTube/Twitch
$ npx skills add ardha27/AI-Waifu-Vtuber
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
$ npx skills add m-bain/whisperX
Faster Whisper transcription with CTranslate2
$ npx skills add SYSTRAN/faster-whisper
A PyTorch-based Speech Toolkit
$ npx skills add speechbrain/speechbrain
Speech recognition module for Python, supporting several engines and APIs, online and offline.
$ npx skills add Uberi/speech_recognition
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
$ npx skills add coqui-ai/open-speech-corpora
Custom nodes that extend the capabilities of ComfyUI
$ npx skills add AlekPet/ComfyUI_Custom_Nodes_AlekPet
Voice Recognition to Text Tool / An offline, locally run tool that converts audio and video to subtitles, with output in JSON, SRT subtitle, and plain-text formats
$ npx skills add jianchang512/stt
Synchronized Translation for Videos. Video dubbing
$ npx skills add R3gm/SoniTranslate
Port of OpenAI's Whisper model in C/C++
$ npx skills add ggml-org/whisper.cpp
Fine-tune LLMs on your Mac with Apple Silicon. SFT, DPO, GRPO, Vision, TTS, STT, Embedding, and OCR fine-tuning, natively on MLX. Unsloth-compatible API.
$ npx skills add ARahim3/mlx-tune
Open source, local, and self-hosted highly optimized language inference server supporting ASR/STT, TTS, and LLM across WebRTC, REST, and WS
$ npx skills add toverainc/willow-inference-server
A voice chat app
$ npx skills add modal-labs/quillman
How to choose
Use an alternative when it has a clearer install path, higher trust score, fresher maintenance, or better platform fit for your current agent stack. Keep Mlx Audio if it already passes your workflow test and repository review.
Next step
Open the compare page, test the install commands in a sandbox, and check each repository before using a skill in production.
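One way to sandbox an install command is to run it in a throwaway container. The sketch below only prints the command so it can be reviewed first. It assumes Docker and the public `node:22` image, neither of which this page names, and `sandbox_cmd` is a hypothetical helper, not part of the `skills` CLI.

```shell
#!/bin/sh
# Hypothetical helper (not part of the skills CLI): build a command that runs
# an install inside a disposable container, so nothing touches the host.
# Assumption: Docker is installed and the node:22 image is acceptable.
sandbox_cmd() {
  # $1 is an owner/repo slug taken from the list above.
  printf 'docker run --rm -it -w /work node:22 npx skills add %s\n' "$1"
}

# Print the command for review; copy and run it by hand once it looks right.
sandbox_cmd "SYSTRAN/faster-whisper"
```

The `--rm` flag discards the container when the command exits, so a skill that misbehaves leaves nothing behind. Swap in any other slug from the list to test a different alternative.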