12 skills matching "modal"
Best blend of quality, stars, freshness, and agent usage
Run multimodal agents that operate desktop interfaces
$ npx skills add bytedance/UI-TARS-desktop

A set of ready-to-use Agent Skills for research, science, engineering, analysis, finance, and writing.
$ npx skills add K-Dense-AI/scientific-agent-skills

Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.
$ npx skills add deepset-ai/haystack

Deeplake is an AI Data Runtime for Agents. It provides serverless Postgres with a multimodal datalake, enabling scalable retrieval and training.
$ npx skills add activeloopai/deeplake

Open-source framework for building AI-powered apps in JavaScript, Go, and Python, built and used in production by Google.
$ npx skills add genkit-ai/genkit

Blades is a Go-based multimodal AI Agent framework.
$ npx skills add go-kratos/blades

Open‑WebUI Tools is a modular toolkit designed to extend and enrich your Open WebUI instance, turning it into a powerful AI workstation. With a suite of over 15 specialized tools, function pipelines, and filters, this project supports academic research, agentic autonomy, multimodal creativity, workflows, and more.
$ npx skills add Haervwe/open-webui-tools

AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
$ npx skills add TencentQQGYLab/AppAgent

Second Brain is an agentic framework that acts as an operating system, using local file intelligence, workflow automation, and LLMs to complete tasks and communicate over multiple modalities and messaging platforms.
$ npx skills add henrydaum/second-brain

💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
$ npx skills add showlab/Awesome-GUI-Agent

🎉📱 Create dynamic modals, cards, and panes for your applications in a few steps. Works with any framework and is free.
$ npx skills add tech-systems/panes

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
$ npx skills add mbzuai-oryx/groundingLMM