Process rich media

Multimodal media skills for AI agents

Browse skills for image, video, audio, transcription, metadata extraction, and multimodal content workflows.

Try this task

I need my agent to process images, video, or audio and extract useful information.

Agent should be able to

  • Read media metadata
  • Convert formats
  • Summarize visual or audio content

Workflow map

What to build with these skills

01 Transcribe audio

02 Extract video metadata

03 Summarize images

04 Prepare media for search

Best first installs

Start with high-signal skills

18 matched skills

PaddleOCR

VERIFIED

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

78K stars · 77 quality · Last push May 19, 2026
$ npx skills add PaddlePaddle/PaddleOCR

Graphify

VERIFIED

AI coding assistant skill (Claude Code, Codex, OpenCode, Cursor, Gemini CLI, and more). Turn any folder of code, SQL schemas, R scripts, shell scripts, docs, papers, images, or videos into a queryable knowledge graph. App code + database schema + infrastructure in one graph.

52K stars · 76 quality · Last push May 22, 2026
$ npx skills add safishamsi/graphify

Open Design

VERIFIED

Open Design is a powerful, local-first design tool that integrates multiple coding-agent CLIs for generating various design outputs.

50K stars · 76 quality · Last push May 23, 2026
$ npx skills add nexu-io/open-design

Skill shortlist

More options for this use case

SCrawler

web-automation

🏳️‍🌈 Media downloader for many sites, including Twitter, Reddit, Instagram, BlueSky, TikTok, Threads, Facebook, OnlyFans, YouTube, Pinterest, PornHub, XHamster, XVIDEOS, ThisVid, etc.

2.0K stars · 65 quality

Deeplake

data

Deeplake is an AI data runtime for agents. It provides serverless Postgres with a multimodal data lake, enabling scalable retrieval and training.

9.1K stars · 69 quality

Vision Agents

agent-frameworks

Open Vision Agents by Stream. Build voice and vision agents quickly with any model or video provider. Uses Stream's edge network for ultra-low latency.

7.8K stars · 69 quality

Awesome AITools

utility

A comprehensive collection of AI-related utilities with community contributions.

6.0K stars · 69 quality

ArcReel

agent-frameworks

Open-source AI video generation workspace powered by AI Agents, Nano Banana 2 & Veo 3.1 / Grok / Seedance / OpenAI. Pipeline: novel → character/scene/prop design → script → storyboard → video, with characters and scenes kept consistent across shots.

2.3K stars · 65 quality

UI-TARS Desktop

automation

Run multimodal agents that operate desktop interfaces

35K stars · 75 quality

RPA

web-automation

Ui.Vision Open-Source RPA Software with Computer Vision, OCR, Anthropic Computer Use/LLM. Selenium IDE import/export.

1.9K stars · 65 quality

Khoj

data

Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.

35K stars · 72 quality

Lux

web-automation

👾 Fast and simple video download library and CLI tool written in Go

31K stars · 72 quality

Haystack

data

Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.

25K stars · 74 quality

CV

data

✅ (Completed) Very comprehensive deep learning notes: [Tudui, PyTorch] [Mu Li, Dive into Deep Learning] [Andrew Ng, Deep Learning] [Dafei, LLM Agents]

21K stars · 73 quality

Ppt Master

agent-frameworks

AI generates natively editable PPTX from any document — real PowerPoint shapes with native animations, not images · by Hugo He

20K stars · 73 quality

Awesome N8n Templates

agent-frameworks

280+ free n8n automation templates — ready-to-use workflows for Gmail, Telegram, Slack, Discord, WhatsApp, Google Drive, Notion, OpenAI, and more. AI agents, RAG chatbots, email automation, social media, DevOps, and document processing. The largest open-source n8n template collection.

22K stars · 71 quality

Baoyu Skills

productivity

A repository of skills designed to enhance daily work efficiency with Claude Code.

19K stars · 73 quality

Kubesphere

utility

KubeSphere is a comprehensive container platform designed for managing Kubernetes across multi-cloud, datacenter, and edge environments.

17K stars · 73 quality