Ingest, retrieve, and cite

RAG knowledge-base stack

A stack for document-heavy agents that ingest files, create searchable knowledge, retrieve relevant context, and answer with grounded sources.

Built for teams building support, research, internal documentation, or compliance assistants.

Outcomes

  • Ingest documents
  • Chunk and index content
  • Retrieve context
  • Cite sources in answers

Workflow map

How the stack fits together

01 · Ingest

Collect documents, pages, or notes and preserve source metadata.
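As a rough illustration of preserving source metadata at ingest time, the sketch below wraps raw text in a small record. The `SourceDocument` schema, the `ingest` helper, and the sample path are all hypothetical and not part of any skill listed on this page.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SourceDocument:
    """Raw text plus the metadata needed to cite it later (hypothetical schema)."""
    doc_id: str
    text: str
    source: str       # where the text came from: a path, URL, or note ID
    title: str
    ingested_at: str  # ISO 8601 timestamp in UTC


def ingest(text: str, source: str, title: str) -> SourceDocument:
    """Wrap raw text in a record that keeps its provenance."""
    # A content hash gives a stable ID, so re-ingesting identical text is detectable.
    doc_id = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    now = datetime.now(timezone.utc).isoformat()
    return SourceDocument(doc_id, text, source, title, now)


# Hypothetical sample document.
doc = ingest("Refunds are issued within 14 days.", "policies/refunds.md", "Refund policy")
print(doc.doc_id, doc.source)
```

Carrying `source` and `title` from the first step means later stages can cite a document without a second lookup.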

02 · Index

Chunk content and store embeddings in a retrievable format.
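A minimal sketch of the chunk-then-embed step, assuming overlapping word windows and an in-memory list as the index. `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, and the window sizes are arbitrary; a production setup would use a trained model and a vector store.

```python
import hashlib
import math


def chunk_words(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(text: str, dims: int = 64) -> list[float]:
    """Stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dims
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


# Hypothetical sample text.
text = (
    "Refunds are issued within 14 days of purchase. Items must be unused "
    "and returned in the original packaging with a receipt."
)
index = [
    {"chunk_id": i, "text": chunk, "vector": toy_embed(chunk)}
    for i, chunk in enumerate(chunk_words(text))
]
print(len(index), "chunks indexed")
```

The overlap keeps a sentence that straddles a window boundary retrievable from either neighbouring chunk, at the cost of some duplicated storage.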

03 · Retrieve

Fetch only the relevant context for each user question.
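One way to read "only the relevant context" is a top-k search with a minimum similarity cutoff. The sketch below assumes cosine similarity over precomputed vectors; the three-dimensional vectors, the `retrieve` helper, and the `min_score` value are illustrative assumptions, not tuned settings.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, index, k=2, min_score=0.3):
    """Return up to k (score, chunk) pairs whose similarity clears min_score."""
    scored = [(cosine(query_vec, entry["vector"]), entry) for entry in index]
    relevant = [pair for pair in scored if pair[0] >= min_score]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return relevant[:k]


# Hypothetical index entries with hand-written vectors.
index = [
    {"text": "Refunds are issued within 14 days.",
     "source": "policies/refunds.md", "vector": [0.9, 0.1, 0.0]},
    {"text": "Shipping takes 3 to 5 business days.",
     "source": "policies/shipping.md", "vector": [0.1, 0.9, 0.0]},
    {"text": "Office hours are 9 to 5.",
     "source": "handbook/hours.md", "vector": [0.0, 0.1, 0.9]},
]
hits = retrieve([1.0, 0.0, 0.0], index)
for score, chunk in hits:
    print(round(score, 2), chunk["source"])
```

Because of the cutoff, a question with no good match yields an empty result instead of padding the prompt with weak chunks.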

04 · Answer

Generate grounded responses with citations and confidence checks.
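The answer step can be sketched as two guards around the model call: abstain when retrieval is weak, and verify that every citation in the answer points at a supplied source. No model is called here; `build_prompt`, `citations_valid`, the `[n]` citation format, and the `min_top_score` threshold are assumptions made for illustration.

```python
import re


def build_prompt(question, hits, min_top_score=0.5):
    """Return a prompt with numbered sources, or None when retrieval is too weak."""
    if not hits or max(score for score, _ in hits) < min_top_score:
        return None  # caller should abstain rather than answer ungrounded
    sources = "\n".join(
        f"[{i}] ({chunk['source']}) {chunk['text']}"
        for i, (_, chunk) in enumerate(hits, start=1)
    )
    return (
        "Answer using only the sources below and cite them as [n].\n"
        f"Sources:\n{sources}\n"
        f"Question: {question}"
    )


def citations_valid(answer, n_sources):
    """Check that the answer cites at least one source and only supplied ones."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return bool(cited) and all(1 <= n <= n_sources for n in cited)


# Hypothetical retrieval result: one (score, chunk) pair.
hits = [(0.99, {"source": "policies/refunds.md",
                "text": "Refunds are issued within 14 days."})]
prompt = build_prompt("How long do refunds take?", hits)
print(prompt)
print(citations_valid("Refunds take up to 14 days [1].", len(hits)))
```

An answer that fails `citations_valid` can be retried or replaced with a fallback message, so uncited claims do not reach the user.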

Recommended stack

Start with these skills

Ranked by workflow relevance, quality score, GitHub adoption, and maintenance freshness.

#1 Milvus · Excellent · 100

Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search

44K stars · Apache-2.0 · data
$ npx skills add milvus-io/milvus
#2 WeKnora · Excellent · 100

Open-source LLM knowledge platform: turn raw documents into a queryable RAG, an autonomous reasoning agent, and a self-maintaining Wiki.

15K stars · License unknown · data
$ npx skills add Tencent/WeKnora
#3 Txtai · Excellent · 100

💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows

13K stars · Apache-2.0 · agent-frameworks
$ npx skills add neuml/txtai
#4 PaddleOCR · Excellent · 100

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

78K stars · Apache-2.0 · data
$ npx skills add PaddlePaddle/PaddleOCR
#5 PageIndex · Excellent · 100

📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG

32K stars · MIT · agent-frameworks
$ npx skills add VectifyAI/PageIndex
#6 Langchain Chatchat · Excellent · 100

Langchain-Chatchat (formerly Langchain-ChatGLM): a local knowledge-based RAG and Agent application built with Langchain and language models such as ChatGLM, Qwen, and Llama.

38K stars · Apache-2.0 · data
$ npx skills add chatchat-space/Langchain-Chatchat
#7 Opendataloader Pdf · Excellent · 100

PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.

22K stars · Apache-2.0 · data
$ npx skills add opendataloader-project/opendataloader-pdf
#8 DocsGPT · Excellent · 100

Private AI platform for agents, assistants and enterprise search. Built-in Agent Builder, Deep research, Document analysis, Multi-model support, and API connectivity for agents.

18K stars · MIT · data
$ npx skills add arc53/DocsGPT

Ideal for

  • Internal docs assistants
  • Research archives
  • Support knowledge bases
  • Policy lookup

Avoid when

  • The corpus changes every few seconds
  • You cannot expose source documents to the agent runtime