Process rich media

Multimodal media skills for AI agents

Browse skills for image, video, audio, transcription, metadata extraction, and multimodal content workflows.


Try this task

I need my agent to process images, video, or audio and extract useful information.

Matched · Strong trust · Install ready · Auto allowed · 500+ stars

Agent should be able to

- Read media metadata
- Convert formats
- Summarize visual or audio content

Resolve

Let the agent pick

Returns the best skill, alternatives, install handoff, risk summary, and safety gate.
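An agent consuming a resolve result usually tries the best skill first, then walks the alternatives in order, skipping any candidate whose safety gate it is not permitted to pass. The sketch below models that in Python. The class names, field names, gate values ("sandbox", "human_approval"), and the pick_install_target helper are hypothetical and are not the Agent API's documented response schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical shape of a resolve result. Field names and gate values are
# assumptions for illustration, not the Agent API's documented schema.
@dataclass
class SkillCandidate:
    name: str
    install_command: str  # install handoff, e.g. "npx skills add owner/repo"
    risk_summary: str     # short risk label shown to the user
    safety_gate: str      # hypothetical gate id, e.g. "sandbox" or "human_approval"


@dataclass
class ResolveResult:
    best: SkillCandidate
    alternatives: list[SkillCandidate] = field(default_factory=list)


def pick_install_target(result: ResolveResult, blocked_gates: set[str]) -> SkillCandidate | None:
    """Return the first candidate (best, then alternatives in order) whose
    safety gate is not blocked for this agent, or None if all are blocked."""
    for candidate in [result.best, *result.alternatives]:
        if candidate.safety_gate not in blocked_gates:
            return candidate
    return None


if __name__ == "__main__":
    result = ResolveResult(
        best=SkillCandidate("Transformers", "npx skills add huggingface/transformers",
                            "Safe to try", "sandbox"),
        alternatives=[SkillCandidate("Daft", "npx skills add Eventual-Inc/Daft",
                                     "Safe to try", "sandbox")],
    )
    target = pick_install_target(result, blocked_gates={"human_approval"})
    print(target.install_command if target else "No candidate passes the gate")
```

Keeping the fallback order explicit means an agent never silently installs a lower-ranked skill that needs more approval than the one it skipped.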

Text plan

LLM-readable output

Plain text version for Codex, Claude Code, Cursor, and custom agent runtimes.

Browse

Human shortlist

Open the filtered registry view for this workflow and compare candidates manually.

Workflow map

What to build with these skills

Transcribe audio

Extract video metadata

Summarize images

Prepare media for search

Best first installs

Start with high-signal skills

18 matched skills

Transformers

VERIFIED

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

162K stars · 94 trust · 95 audit

Install: Agent install candidate
Risk: Safe to try
Agent fit: Claude Code + CLI
Updated: Jun 16, 2026

$ npx skills add huggingface/transformers

Daft

VERIFIED

High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale

5.6K stars · 92 trust · 95 audit

Install: Agent install candidate
Risk: Safe to try
Agent fit: Claude Code + CLI
Updated: Jun 16, 2026

$ npx skills add Eventual-Inc/Daft

Diffusers

VERIFIED

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

34K stars · 94 trust · 95 audit

Install: Agent install candidate
Risk: Safe to try
Agent fit: Claude Code + CLI
Updated: Jun 16, 2026

$ npx skills add huggingface/diffusers

Skill shortlist

More options for this use case


Vllm Omni

media-automation · Design

A framework for efficient model inference with omni-modality models

Low metadata risk · Allow agent install in a sandbox or low-risk workspace, then promote after one successful narrow task.

5.2K stars · 91 trust · 94 audit · 86 safety

Cs Video Courses

ml-automation · Design

List of Computer Science courses with video lectures.

Review before production · Review the audit page, then allow agent install in a sandboxed workflow.

82K stars · 88 trust · 92 audit · 76 safety

Ultralytics

ml-automation · Design

Ultralytics YOLO 🚀

Low metadata risk · Require human approval before installing into a real workspace.

58K stars · 88 trust · 92 audit · 64 safety

SimpleTuner

ml-automation · Design

A general fine-tuning kit geared toward image/video/audio diffusion models.

Low metadata risk · Allow agent install in a sandbox or low-risk workspace, then promote after one successful narrow task.

2.9K stars · 89 trust · 94 audit · 86 safety

Supervision

ml-automation · Design

We write your reusable computer vision tools. 💜

Low metadata risk · Allow agent install in a sandbox or low-risk workspace, then promote after one successful narrow task.

44K stars · 92 trust · 95 audit · 87 safety

Mediapipe

ml-automation · Design

Cross-platform, customizable ML solutions for live and streaming media.

Low metadata risk · Allow agent install in a sandbox or low-risk workspace, then promote after one successful narrow task.

36K stars · 92 trust · 95 audit · 87 safety

Ocrs

ml-automation · Design

Rust library and CLI tool for OCR (extracting text from images)

Low metadata risk · Require human approval before installing into a real workspace.

1.8K stars · 85 trust · 92 audit · 64 safety

Pixelle Video

media-automation · Design

🚀 AI Fully Automated Short Video Engine

Low metadata risk · Allow agent install in a sandbox or low-risk workspace, then promote after one successful narrow task.

23K stars · 92 trust · 95 audit · 87 safety

Lance

media-automation · Design

A 3B-active-parameter native unified multimodal model for image and video understanding, generation, and editing.

Low metadata risk · Allow agent install in a sandbox or low-risk workspace, then promote after one successful narrow task.

1.2K stars · 91 trust · 94 audit · 86 safety

Screenpipe

ml-automation · Design

YC (S26) | AI that knows what you've seen, said, or heard. Records everything you do, say, and hear, 24/7; local, private, and secure.

Review before production · Review the audit page, then allow agent install in a sandboxed workflow.

19K stars · 90 trust · 93 audit · 77 safety

LivePortrait

media-automation · Design

Bring portraits to life!

Review before production · Review the audit page, then allow agent install in a sandboxed workflow.

19K stars · 87 trust · 90 audit · 74 safety

Waifu2x Extension GUI

media-automation · Design

Video, image, and GIF upscaling (super-resolution) and video frame interpolation, built on Waifu2x, Real-ESRGAN, Real-CUGAN, RTX Video Super Resolution VSR, SRMD, RealSR, Anime4K, RIFE, IFRNet, CAIN, DAIN, and ACNet.

Review before production · Review the audit page, then allow agent install in a sandboxed workflow.

17K stars · 90 trust · 93 audit · 77 safety

Video Use

media · Coding

A tool that lets AI agents like Claude Code edit videos by cutting filler words, color grading, adding subtitles, and more, all via natural language commands.

Review before production · Test manually in an isolated workspace and compare against safer alternatives.

16K stars · 86 trust · 91 audit · 47 safety

Pytorch CycleGAN And Pix2pix

media-automation · Design

Image-to-Image Translation in PyTorch

Review before production · Require human approval before installing into a real workspace.

25K stars · 83 trust · 83 audit · 67 safety

Pytorch Grad Cam

ml-automation · Design

Advanced AI explainability for computer vision. Supports CNNs, Vision Transformers, classification, object detection, segmentation, image similarity, and more.

Low metadata risk · Allow agent install in a sandbox or low-risk workspace, then promote after one successful narrow task.

13K stars · 93 trust · 95 audit · 87 safety
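The install recommendations in the shortlist track each skill's safety score: every skill at 86 or 87 gets the sandbox-then-promote advice, 74 to 77 gets audit review, 64 and 67 get human approval, and 47 gets manual testing. The sketch below encodes that pattern. The cut points (80, 70, 60) are inferred from the scores listed here and are an assumption, not OpenAgentSkill's published gate rules; the install_gate name is hypothetical.

```python
# Cut points inferred from the safety scores listed on this page; they are an
# assumption, not the registry's published gate rules.
GATES: list[tuple[int, str]] = [
    (80, "Allow agent install in a sandbox or low-risk workspace, "
         "then promote after one successful narrow task."),
    (70, "Review the audit page, then allow agent install in a sandboxed workflow."),
    (60, "Require human approval before installing into a real workspace."),
    (0, "Test manually in an isolated workspace and compare against safer alternatives."),
]


def install_gate(safety: int) -> str:
    """Map a 0-100 safety score to an install recommendation."""
    if not 0 <= safety <= 100:
        raise ValueError(f"safety score out of range: {safety}")
    for floor, action in GATES:  # highest floor first, so the first match wins
        if safety >= floor:
            return action
    raise AssertionError("unreachable: the last floor is 0")


if __name__ == "__main__":
    # Safety scores copied from the shortlist on this page.
    for name, score in {"Supervision": 87, "Screenpipe": 77, "Ocrs": 64, "Video Use": 47}.items():
        print(f"{name} ({score}): {install_gate(score)}")
```

Note that the risk label ("Low metadata risk" vs "Review before production") does not follow the score alone on this page, so it is left out of the sketch.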

FAQ

How to choose skills for this workflow

These answers are written for both human builders and agents consuming the Registry API.

What are the best AI agent skills for multimodal media?

Start by comparing Transformers, Daft, and Diffusers. OpenAgentSkill ranks them by workflow fit, GitHub adoption, trust score, safety gate, and install readiness.
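One way to combine those signals is a weighted score, with the safety gate applied as a hard filter rather than as one more weight, and GitHub stars log-scaled so a very popular repository cannot outweigh a poor workflow fit. This is a hypothetical sketch: the Signals fields, the weights, the scaling, and the score and rank functions are assumptions, not OpenAgentSkill's ranking formula.

```python
import math
from dataclasses import dataclass


@dataclass
class Signals:
    name: str
    fit: float           # workflow fit in [0, 1]; how it is measured is assumed
    stars: int           # GitHub stars (adoption)
    trust: int           # trust score, 0-100
    gate_passed: bool    # safety gate allows agent install
    install_ready: bool  # an install handoff exists


# Hypothetical weights; they sum to 1.0 so scores stay in [0, 1].
WEIGHTS = {"fit": 0.4, "adoption": 0.2, "trust": 0.3, "install": 0.1}


def score(s: Signals) -> float:
    if not s.gate_passed:
        return 0.0  # safety gate is a hard filter, not a weighted term
    adoption = min(math.log10(s.stars + 1) / 6, 1.0)  # saturates near 1M stars
    return (WEIGHTS["fit"] * s.fit
            + WEIGHTS["adoption"] * adoption
            + WEIGHTS["trust"] * s.trust / 100
            + WEIGHTS["install"] * (1.0 if s.install_ready else 0.0))


def rank(candidates: list[Signals]) -> list[str]:
    """Return candidate names, best first."""
    return [c.name for c in sorted(candidates, key=score, reverse=True)]
```

With these weights, a skill with strong fit but modest adoption can outrank a far more popular one, and a skill that fails the safety gate sorts to the bottom regardless of its other signals.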

Can an AI agent use this page directly?

Yes. Use the linked Registry API prompt to query /api/skills/search with the task "I need my agent to process images, video, or audio and extract useful information." Then retrieve the install handoff links for the top results.
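An agent can also run that query itself. The sketch below builds the search request and pulls the top install handoff links out of the JSON reply. The base URL is left to the caller, and the q query parameter, the results/name/install_url response fields, and all function names are assumptions; check the Registry API documentation for the real request and response contract.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

TASK = "I need my agent to process images, video, or audio and extract useful information."


def build_search_url(base_url: str, task: str) -> str:
    # "q" is an assumed query-parameter name, not a documented one.
    return urljoin(base_url, "/api/skills/search") + "?" + urlencode({"q": task})


def top_install_links(payload: dict, limit: int = 3) -> list[tuple[str, str]]:
    # Assumed response shape: {"results": [{"name": ..., "install_url": ...}, ...]}
    return [(r["name"], r["install_url"]) for r in payload.get("results", [])[:limit]]


def search(base_url: str, task: str = TASK, limit: int = 3) -> list[tuple[str, str]]:
    """Query the registry and return (name, install link) pairs, best first."""
    with urlopen(build_search_url(base_url, task), timeout=10) as resp:
        return top_install_links(json.load(resp), limit)
```

Splitting URL building and response parsing from the network call keeps both parts testable offline and makes it easy to adjust once the real field names are confirmed.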

Should I install every recommended skill?

No. Start with the highest-fit skill, test it in a sandbox workflow, and add companion skills only when the task needs extra coverage.