A Generative Flow for Text-to-Speech via Monotonic Alignment Search
$ npx skills add jaywalnut310/glow-tts
Alternatives
Compare similar skills by workflow fit, trust score, quality, GitHub adoption, maintenance, and install readiness.
Current skill
DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
A Generative Flow for Text-to-Speech via Monotonic Alignment Search
$ npx skills add jaywalnut310/glow-tts
A Pytorch Implementation of "Neural Speech Synthesis with Transformer Network"
$ npx skills add soobinseo/Transformer-TTS
unofficial vits2-TTS implementation in pytorch
$ npx skills add p0p4k/vits2_pytorch
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
$ npx skills add huggingface/diffusers
Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch
$ npx skills add lucidrains/e2-tts-pytorch
Descriptive Deep Learning
$ npx skills add deepgram/kur
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
$ npx skills add daniilrobnikov/vits2
[AutoArk] GPA (General Purpose Audio) can do ASR, TTS and voice conversion with one tiny model!
$ npx skills add AutoArk/GPA
Implementation of Voicebox, new SOTA Text-to-speech network from MetaAI, in Pytorch
$ npx skills add lucidrains/voicebox-pytorch
AI-powered multi-voice audiobook generator: LLM script annotation, voice cloning, voice design, LoRA training, per-line style control, and export to MP3, chaptered M4B, or Audacity multi-track. Built on Qwen3-TTS.
$ npx skills add Finrandojin/alexandria-audiobook
Turn PDFs and EPUBs into audiobooks; subtitles or videos into dubbed videos (including translation), and more. For free. Pandrator uses local models, including voice-cloning (instant, RVC-enhanced, XTTS fine-tuning) and LLM processing. It aspires to be a user-friendly app with a GUI, an installer and all-in-one packages.
$ npx skills add lukaszliniewicz/Pandrator
Open source, local, and self-hosted highly optimized language inference server supporting ASR/STT, TTS, and LLM across WebRTC, REST, and WS
$ npx skills add toverainc/willow-inference-server
Real-time voice assistant: WebRTC streaming, faster-whisper ASR, local LLM, Vui Nano (300M) TTS. OpenAI Realtime API compatible. Voice cloning, barge-in, ~9× realtime on a 4090. Apache 2.0.
$ npx skills add fluxions-ai/vui
FireRed-Image-Edit is a powerful image editing foundation model achieving open-source state-of-the-art performance with precise instruction following, high-fidelity generation, superior identity consistency, and seamless multi-element fusion.
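$ npx skills add FireRedTeam/FireRed-Image-Edit
A PyTorch implementation of a streaming and non-streaming automatic speech recognition framework, compatible with both online and offline recognition. It currently supports the Conformer, Squeezeformer, and DeepSpeech2 models, along with multiple data augmentation methods.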
$ npx skills add yeyupiaoling/MASR
Local, OpenAI-compatible text-to-speech (TTS) API using Chatterbox, enabling users to generate voice cloned speech anywhere the OpenAI API is used (e.g. Open WebUI, AnythingLLM, etc.)
$ npx skills add travisvn/chatterbox-tts-api
How to choose
Use an alternative when it has a clearer install path, higher trust score, fresher maintenance, or better platform fit for your current agent stack. Keep Diffwave if it already passes your workflow test and repository review.
Next step
Open the compare page, test the install commands in a sandbox, and check each repository before using a skill in production.