Decision filters

Choose skills by scenario, quality, and trust signals.

4 skills matching "multi-modal"

Best blend of quality, stars, freshness, and agent usage

1

VideoRAG

VERIFIED · EXCELLENT · 89

[KDD'2026] "VideoRAG: Chat with Your Videos"

$ npx skills add HKUDS/VideoRAG
3.0K stars · 63 quality · Claude Code
High-confidence pick with strong adoption and healthy maintenance signals.
python · rag
by HKUDS
2

Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams)

$ npx skills add raphael-seo/Versatile-OCR-Program
683 stars · 54 quality · Claude Code + OpenAI Agents
Solid option that is likely worth shortlisting for production workflows.
python · ocr
by raphael-seo
3

Awesome GUI Agent

VERIFIED · STRONG · 79

💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.

$ npx skills add showlab/Awesome-GUI-Agent
1.2K stars · 52 quality · Claude Code
Solid option that is likely worth shortlisting for production workflows.
llm
by showlab
4

WindowsAgentArena

STRONG · 81

Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents.

$ npx skills add microsoft/WindowsAgentArena
861 stars · 51 quality · Claude Code
Solid option that is likely worth shortlisting for production workflows.
python · ai-agents
by microsoft