Process rich media

Best multimodal media skills for AI agents

Browse skills for image, video, audio, transcription, metadata extraction, and multimodal content workflows.

For builders choosing skills to transcribe audio and extract video metadata. Ranked from the OpenAgentSkill index using quality, trust, freshness, adoption, and install readiness.

30 ranked · 575K stars · 100 top trust

Workflows: Transcribe audio · Extract video metadata · Summarize images

#1

Notebooks

22 fit · Trust 100 · Excellent 100

A collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like RF-DETR, YOLO11, SAM 3, and Qwen3-VL.

Excellent quality, 9.5K stars, and a 22 use-case fit score.

9.5K stars · May 21, 2026 push · Production candidate · Jupyter Notebook · Machine Learning
$ npx skills add roboflow/notebooks
#2

X AnyLabeling

22 fit · Trust 100 · Excellent 100

Effortless data labeling with AI support from Segment Anything and other awesome models.

Excellent quality, 9.4K stars, and a 22 use-case fit score.

9.4K stars · Jun 6, 2026 push · Production candidate · Python · Machine Learning
$ npx skills add CVHub520/X-AnyLabeling
#3

PaddleOCR

22 fit · Trust 100 · Excellent 100

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

Excellent quality, 82K stars, and a 22 use-case fit score.

82K stars · Jun 12, 2026 push · Production candidate · Python · RAG
$ npx skills add PaddlePaddle/PaddleOCR
#4

Graphify

21 fit · Trust 100 · Excellent 100

AI coding assistant skill (Claude Code, Codex, OpenCode, Cursor, Gemini CLI, and more). Turn any folder of code, SQL schemas, R scripts, shell scripts, docs, papers, images, or videos into a queryable knowledge graph. App code + database schema + infrastructure in one graph.

Excellent quality, 66K stars, and a 21 use-case fit score.

66K stars · Jun 11, 2026 push · Production candidate · Python · Claude Code
$ npx skills add safishamsi/graphify
#5

Open Design

21 fit · Trust 100 · Excellent 100

🎨 Local-first, open-source Claude Design alternative. 🖥️ Native desktop app. ⚡ 259+ Skills · ✨ 142+ Design Systems 🖼️ Web · desktop · mobile prototypes · slides · images · videos · HyperFrames 📦 Sandboxed preview · HTML/PDF/PPTX/MP4 export 🤖 Claude Code / OpenClaw / Codex / Cursor / OpenCode / Qwen / Copilot / Hermes / Kimi & 17+ CLIs.

Excellent quality, 64K stars, and a 21 use-case fit score.

64K stars · Jun 13, 2026 push · Production candidate · TypeScript · AI Agents
$ npx skills add nexu-io/open-design
#6

Comic Translate

21 fit · Trust 100 · Excellent 100

AI comic and manga translator app/browser extension for automatically translating comics, manga, manhwa, BDs, fumetti, and more in multiple languages and formats (Images, PDF, EPUB, CBR, CBZ etc).

Excellent quality, 2.8K stars, and a 21 use-case fit score.

2.8K stars · Jun 6, 2026 push · Production candidate · Python · OCR
$ npx skills add ogkalu2/comic-translate
#7

ShareX

21 fit · Trust 100 · Excellent 100

ShareX is a free and open-source application that enables users to capture or record any area of their screen with a single keystroke. It also supports uploading images, text, and various file types to a wide range of destinations.

Excellent quality, 38K stars, and a 21 use-case fit score.

38K stars · Jun 11, 2026 push · Production candidate · C# · OCR
$ npx skills add ShareX/ShareX
#8

SCrawler

21 fit · Trust 100 · Excellent 100

🏳️‍🌈 Media downloader from any sites, including Twitter, Reddit, Instagram, BlueSky, TikTok, Threads, Facebook, OnlyFans, YouTube, Pinterest, PornHub, XHamster, XVIDEOS, ThisVid etc.

Excellent quality, 2.1K stars, and a 21 use-case fit score.

2.1K stars · Jun 1, 2026 push · Production candidate · Visual Basic .NET · Crawler
$ npx skills add AAndyProgram/SCrawler
#9

OCRmyPDF

21 fit · Trust 100 · Excellent 100

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.

Excellent quality, 34K stars, and a 21 use-case fit score.

34K stars · Jun 12, 2026 push · Production candidate · Python · PDF
$ npx skills add ocrmypdf/OCRmyPDF
#10

PPT Master

21 fit · Trust 100 · Excellent 100

AI generates a real, editable PowerPoint from any document — native shapes & animations, speaker notes voiced as audio narration, and the option to follow your own .pptx template, not slide images · by Hugo He

Excellent quality, 27K stars, and a 21 use-case fit score.

27K stars · Jun 10, 2026 push · Production candidate · Python · AI Agents
$ npx skills add hugohe3/ppt-master
#11

ImageToolbox

20 fit · Trust 100 · Excellent 100

🖼️ Image Toolbox is a powerful app for advanced image manipulation. It offers dozens of features, from basic tools like crop and draw to filters, OCR, and a wide range of image processing options

Excellent quality, 13K stars, and a 20 use-case fit score.

13K stars · Jun 11, 2026 push · Production candidate · Kotlin · PDF
$ npx skills add T8RIN/ImageToolbox
#12

Meshroom

20 fit · Trust 100 · Excellent 100

Node-based Visual Programming Toolbox

Excellent quality, 13K stars, and a 20 use-case fit score.

13K stars · Jun 12, 2026 push · Production candidate · Python · Workflow
$ npx skills add alicevision/Meshroom
#13

Instaloader

20 fit · Trust 100 · Excellent 100

Download pictures (or videos) along with their captions and other metadata from Instagram.

Excellent quality, 13K stars, and a 20 use-case fit score.

13K stars · Apr 15, 2026 push · Production candidate · Python · OSINT
$ npx skills add instaloader/instaloader
#14

LanceDB

20 fit · Trust 100 · Excellent 100

Developer-friendly OSS embedded retrieval library for multimodal AI. Search More; Manage Less.

Excellent quality, 11K stars, and a 20 use-case fit score.

11K stars · Jun 12, 2026 push · Production candidate · HTML · Semantic Search
$ npx skills add lancedb/lancedb
#15

Manga Image Translator

20 fit · Trust 100 · Excellent 100

Translate manga/image: one-click translation of the text in all kinds of images. https://cotrans.touhou.ai/ (no longer working)

Excellent quality, 10K stars, and a 20 use-case fit score.

10K stars · May 24, 2026 push · Production candidate · Python · OCR
$ npx skills add zyddnys/manga-image-translator
#16

PyOD

19 fit · Trust 100 · Excellent 100

A Python library for anomaly detection across tabular, time series, graph, text, and image data. 60+ detectors, benchmark-backed ADEngine orchestration, and an agentic workflow for AI agents.

Excellent quality, 9.9K stars, and a 19 use-case fit score.

9.9K stars · Jun 5, 2026 push · Production candidate · Python · Machine Learning
$ npx skills add yzhao062/pyod
#17

Gorse

19 fit · Trust 100 · Excellent 100

AI-powered open-source recommender system engine that supports classical/LLM rankers and multimodal content via embeddings.

Excellent quality, 9.7K stars, and a 19 use-case fit score.

9.7K stars · Jun 12, 2026 push · Production candidate · Go · Machine Learning
$ npx skills add gorse-io/gorse
#18

Inference

19 fit · Trust 100 · Excellent 100

Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready inference API.

Excellent quality, 9.3K stars, and a 19 use-case fit score.

9.3K stars · Jun 12, 2026 push · Production candidate · Python · Machine Learning
$ npx skills add xorbitsai/inference
#19

Deeplake

19 fit · Trust 100 · Excellent 100

Deeplake is an AI data runtime for agents. It provides serverless Postgres with a multimodal data lake, enabling scalable retrieval and training.

Excellent quality, 9.2K stars, and a 19 use-case fit score.

9.2K stars · May 21, 2026 push · Production candidate · C++ · RAG
$ npx skills add activeloopai/deeplake
#20

Vision Agents

19 fit · Trust 100 · Excellent 100

Open Vision Agents by Stream. Build voice and vision agents quickly with any model or video provider. Uses Stream's edge network for ultra-low latency.

Excellent quality, 7.9K stars, and a 19 use-case fit score.

7.9K stars · Jun 11, 2026 push · Production candidate · Python · AI Agents
$ npx skills add GetStream/Vision-Agents
#21

Dolphin

19 fit · Trust 100 · Excellent 100

The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

Excellent quality, 9.0K stars, and a 19 use-case fit score.

9.0K stars · Mar 25, 2026 push · Production candidate · Python · PDF
$ npx skills add bytedance/Dolphin
#22

Video Subtitle Extractor

19 fit · Trust 100 · Excellent 100

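A GUI tool for extracting hard-coded subtitles (hardsub) from videos and generating srt files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework that includes subtitle region detection and subtitle content extraction.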

Excellent quality, 9.0K stars, and a 19 use-case fit score.

9.0K stars · Apr 9, 2026 push · Production candidate · Python · OCR
$ npx skills add YaoFANGUK/video-subtitle-extractor
#23

Lance

19 fit · Trust 100 · Excellent 100

Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.

Excellent quality, 6.6K stars, and a 19 use-case fit score.

6.6K stars · Jun 12, 2026 push · Production candidate · Rust · Data Analysis
$ npx skills add lance-format/lance
#24

ESearch

19 fit · Trust 100 · Excellent 100

Screenshot, offline OCR, search and translate, reverse image search, pin images to the screen, screen recording, omnidirectional scrolling screenshots, and screen translation. Supports Windows, Linux, and macOS.

Excellent quality, 6.4K stars, and a 19 use-case fit score.

6.4K stars · Jun 8, 2026 push · Production candidate · TypeScript · OCR
$ npx skills add xushengfeng/eSearch
#25

Smile

19 fit · Trust 100 · Excellent 100

Statistical Machine Intelligence & Learning Engine

Excellent quality, 6.4K stars, and a 19 use-case fit score.

6.4K stars · Jun 12, 2026 push · Production candidate · Java · Machine Learning
$ npx skills add haifengl/smile
#26

GLM OCR

19 fit · Trust 100 · Excellent 100

GLM-OCR: Accurate × Fast × Comprehensive

Excellent quality, 6.9K stars, and a 19 use-case fit score.

6.9K stars · Apr 21, 2026 push · Production candidate · Python · OCR
$ npx skills add zai-org/GLM-OCR
#27

Parsr

19 fit · Trust 100 · Excellent 100

Transforms PDF, Documents and Images into Enriched Structured Data

Excellent quality, 6.2K stars, and a 19 use-case fit score.

6.2K stars · Mar 20, 2026 push · Production candidate · JavaScript · PDF
$ npx skills add axa-group/Parsr
#28

Grobid

19 fit · Trust 100 · Excellent 100

Machine learning software for extracting information from scholarly documents

Excellent quality, 4.9K stars, and a 19 use-case fit score.

4.9K stars · Jun 13, 2026 push · Production candidate · Java · Machine Learning
$ npx skills add grobidOrg/grobid
#29

BallonsTranslator

19 fit · Trust 100 · Excellent 100

Deep-learning-assisted comic translation tool with one-click machine translation and simple image/text editing | Yet another computer-aided comic/manga translation tool powered by deep learning

Excellent quality, 4.8K stars, and a 19 use-case fit score.

4.8K stars · Jun 12, 2026 push · Production candidate · Python · OCR
$ npx skills add dmMaze/BallonsTranslator
#30

Tesseract

19 fit · Trust 100 · Excellent 100

Tesseract Open Source OCR Engine (main repository)

Excellent quality, 75K stars, and a 19 use-case fit score.

75K stars · Jun 4, 2026 push · Production candidate · C++ · OCR
$ npx skills add tesseract-ocr/tesseract

Selection method

How this list is ranked

OpenAgentSkill scores each candidate against the workflow keywords, then balances fit with GitHub stars, quality signals, trust profile, maintenance freshness, and whether there is a clear install path.
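
The balancing described above could be sketched roughly as a weighted sum. This is only an illustration, not OpenAgentSkill's actual formula: the page names the signals but not how they are combined, so the weights, the `max_fit`, `max_stars`, and `stale_days` caps, and the `Candidate` type are all hypothetical.

```python
import math
from dataclasses import dataclass
from datetime import date

# Hypothetical weights: the page lists these signals but not their weighting.
WEIGHTS = {
    "fit": 0.35,
    "quality": 0.20,
    "trust": 0.15,
    "adoption": 0.15,
    "freshness": 0.10,
    "install": 0.05,
}


@dataclass
class Candidate:
    name: str
    fit: float              # use-case fit score as shown on the page, e.g. 22
    quality: float          # quality score, 0-100
    trust: float            # trust score, 0-100
    stars: int              # GitHub stars
    last_push: date         # date of the most recent push
    has_install_path: bool  # whether a clear install command exists


def score(c: Candidate, today: date, max_fit: float = 25.0,
          max_stars: int = 100_000, stale_days: int = 365) -> float:
    """Combine the signals into one number between 0 and 1 (assumed scale)."""
    fit = min(c.fit / max_fit, 1.0)
    # Log scale so very large repositories do not swamp the other signals.
    adoption = min(math.log10(c.stars + 1) / math.log10(max_stars + 1), 1.0)
    # Freshness decays linearly to zero over `stale_days`.
    age_days = max((today - c.last_push).days, 0)
    freshness = max(0.0, 1.0 - age_days / stale_days)
    install = 1.0 if c.has_install_path else 0.0
    return (
        WEIGHTS["fit"] * fit
        + WEIGHTS["quality"] * c.quality / 100
        + WEIGHTS["trust"] * c.trust / 100
        + WEIGHTS["adoption"] * adoption
        + WEIGHTS["freshness"] * freshness
        + WEIGHTS["install"] * install
    )


def rank(candidates: list[Candidate], today: date) -> list[Candidate]:
    """Return candidates ordered from highest to lowest combined score."""
    return sorted(candidates, key=lambda c: score(c, today), reverse=True)
```

Under these assumed weights, workflow fit dominates, which would be consistent with the list above being ordered mainly by fit score rather than by star count.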

How does OpenAgentSkill rank multimodal media?

The ranking combines workflow fit, quality score, trust profile, GitHub adoption, maintenance freshness, and whether a clear install path exists.

Should I install the top skill immediately?

No. Treat the list as a shortlist, open the skill detail page, inspect the repository and license, then test the install command in a sandbox workflow.

Can my agent consume this ranking through an API?

Yes. Use /api/skills/search with the related task or /api/agent/rankings?slug=best-multimodal-media-skills to fetch ranked skill data.
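
A minimal sketch of calling the rankings endpoint from Python, using only the standard library. The path and `slug` parameter come from the answer above; the host in `BASE_URL` is a hypothetical placeholder, and the response is assumed to be JSON whose schema is not documented here, so it is returned unparsed beyond `json.load`.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

# Hypothetical host: the page gives only the path, not the base URL.
BASE_URL = "https://openagentskill.example"


def rankings_url(slug: str, base_url: str = BASE_URL) -> str:
    """Build the rankings URL for a given list slug."""
    return urljoin(base_url, "/api/agent/rankings") + "?" + urlencode({"slug": slug})


def fetch_rankings(slug: str, base_url: str = BASE_URL, timeout: float = 10.0):
    """Fetch ranked skill data; assumes the endpoint returns a JSON body."""
    with urlopen(rankings_url(slug, base_url), timeout=timeout) as resp:
        return json.load(resp)
```

An agent would call `fetch_rankings("best-multimodal-media-skills", base_url=...)` with the real host, then inspect the returned structure before relying on any particular field.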