PaddleOCR
VERIFIEDTurn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
$ npx skills add PaddlePaddle/PaddleOCRProcess rich media
Browse skills for image, video, audio, transcription, metadata extraction, and multimodal content workflows.
Try this task
I need my agent to process images, video, or audio and extract useful information.
Agent should be able to
Workflow map
Transcribe audio
Extract video metadata
Summarize images
Prepare media for search
Best first installs
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
$ npx skills add PaddlePaddle/PaddleOCRAI coding assistant skill (Claude Code, Codex, OpenCode, Cursor, Gemini CLI, and more). Turn any folder of code, SQL schemas, R scripts, shell scripts, docs, papers, images, or videos into a queryable knowledge graph. App code + database schema + infrastructure in one graph.
$ npx skills add safishamsi/graphifyOpen Design is a powerful, local-first design tool that integrates multiple coding-agent CLIs for generating various design outputs.
$ npx skills add nexu-io/open-designSkill shortlist
🏳️🌈 Media downloader from any sites, including Twitter, Reddit, Instagram, BlueSky, TikTok, Threads, Facebook, OnlyFans, YouTube, Pinterest, PornHub, XHamster, XVIDEOS, ThisVid etc.
Deeplake is AI Data Runtime for Agents. It provides serverless postgres with a multimodal datalake, enabling scalable retrieval and training.
Open Vision Agents by Stream. Build voice and vision agents quickly with any model or video provider. Uses Stream's edge network for ultra-low latency.
A comprehensive collection of AI-related utilities with community contributions.
AI Agent 驱动的开源视频生成工作台 — 小说→角色/场景/道具设计→剧本→分镜图→视频,跨镜头角色与场景一致 | Open-source AI video workspace powered by AI Agents, Nano Banana 2 & Veo 3.1 / Grok / Seedance / OpenAI
Run multimodal agents that operate desktop interfaces
Ui.Vision Open-Source RPA Software with Computer Vision, OCR, Anthropic Computer Use/LLM. Selenium IDE import/export.
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.
👾 Fast and simple video download library and CLI tool written in Go
Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.
✅(已完结)超级全面的 深度学习 笔记【土堆 Pytorch】【李沐 动手学深度学习】【吴恩达 深度学习】【大飞 大模型Agent】
AI generates natively editable PPTX from any document — real PowerPoint shapes with native animations, not images · by Hugo He
280+ free n8n automation templates — ready-to-use workflows for Gmail, Telegram, Slack, Discord, WhatsApp, Google Drive, Notion, OpenAI, and more. AI agents, RAG chatbots, email automation, social media, DevOps, and document processing. The largest open-source n8n template collection.
A repository of skills designed to enhance daily work efficiency with Claude Code.
KubeSphere is a comprehensive container platform designed for managing Kubernetes across multi-cloud, datacenter, and edge environments.