
RAG Architecture Overview

What Is RAG

RAG (Retrieval-Augmented Generation) retrieves relevant documents from an external knowledge base and injects them into the LLM's context. This addresses three LLM weaknesses: stale knowledge, hallucination, and no access to private data.

User question
                ↓
┌─────────────────────────────────┐
│  1. Query Embedding             │  embed the question as a vector
└─────────────────────────────────┘
                ↓
┌─────────────────────────────────┐
│  2. Vector search (Top-K)       │  fetch similar docs from the vector store
│     Chroma / Faiss / Milvus     │
└─────────────────────────────────┘
                ↓
┌─────────────────────────────────┐
│  3. Rerank (optional)           │  re-order hits to improve precision
└─────────────────────────────────┘
                ↓
┌─────────────────────────────────┐
│  4. Prompt assembly             │  question + retrieved docs → LLM
│  "Answer based on these docs:"  │
└─────────────────────────────────┘
                ↓
┌─────────────────────────────────┐
│  5. LLM generates the answer    │
└─────────────────────────────────┘
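Step 3 is the only optional stage. In production it is usually a cross-encoder model (e.g. BAAI/bge-reranker-base); as a minimal illustration of the idea, the sketch below stands in a hypothetical token-overlap scorer for the model — only the re-score-then-truncate flow is the point:

```python
def rerank(query: str, docs: list[str], top_n: int = 3) -> list[str]:
    """Re-score retrieved candidates against the query, keep the best top_n.

    A real system would score (query, doc) pairs with a cross-encoder;
    here a toy token-overlap count stands in for that model.
    """
    q_tokens = set(query.lower().split())
    scored = [(len(q_tokens & set(d.lower().split())), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best score first
    return [d for _, d in scored[:top_n]]

candidates = [
    "net profit grew 6% in 2024",
    "the weather was sunny",
    "2024 annual report: net profit details",
]
print(rerank("2024 net profit", candidates, top_n=2))
```

The retriever's Top-K output goes in, a shorter and better-ordered list comes out; only that list is placed into the prompt in step 4.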

Quick Start

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

# 1. Load the document
loader = PyPDFLoader("招商银行2024年报.pdf")  # China Merchants Bank 2024 annual report
documents = loader.load()

# 2. Split into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    separators=["\n\n", "\n", "。", "!", "?", ";", ",", " ", ""]
)
chunks = text_splitter.split_documents(documents)

# 3. Embedding model
embeddings = HuggingFaceEmbeddings(
    model_name="BAAI/bge-small-zh-v1.5",  # Chinese embedding model
    model_kwargs={"device": "cpu"}
)

# 4. Build the vector store
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)

# 5. Build the RAG chain
llm = ChatOpenAI(model="qwen-plus")  # plus api_key / base_url for your provider
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3})
)

# 6. Query
result = qa_chain.invoke({"query": "招商银行2024年净利润是多少?"})  # "What was CMB's net profit in 2024?"
print(result["result"])
```

Chunking Strategies

```python
from langchain.text_splitter import (
    RecursiveCharacterTextSplitter,
    CharacterTextSplitter,
    MarkdownHeaderTextSplitter
)

# General-purpose splitting (recommended)
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,        # 500 characters per chunk
    chunk_overlap=50,      # 50-character overlap keeps context continuous
    length_function=len,
    separators=["\n\n", "\n", "。", "!", "?", ";", ",", " ", ""]
)

# Structure-aware Markdown splitting
md_splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[
        ("#", "Header 1"),
        ("##", "Header 2"),
        ("###", "Header 3"),
    ]
)

# Financial-report splitting (preserves section structure)
def split_financial_report(text: str):
    """Split by chapter, keeping each chapter's title."""
    sections = []
    current_section = {"title": "", "content": ""}

    for line in text.split("\n"):
        # Chinese chapter headings look like "第一章 ..." ("Chapter 1 ...")
        if line.startswith("第") and "章" in line:
            if current_section["content"]:
                sections.append(current_section)
            current_section = {"title": line.strip(), "content": ""}
        else:
            current_section["content"] += line + "\n"

    if current_section["content"]:
        sections.append(current_section)

    return sections
```
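The chunk_size / chunk_overlap interplay can be shown with a minimal fixed-size chunker (a plain-Python sketch; RecursiveCharacterTextSplitter additionally prefers splitting at the listed separators instead of hard character cuts):

```python
def sliding_chunks(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Fixed-size chunking: each window starts
    chunk_size - overlap characters after the previous one."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

print(sliding_chunks("abcdefghij", chunk_size=4, overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

Neighbouring chunks share `overlap` characters, so a sentence cut at one chunk boundary still appears whole in the next chunk.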

Choosing an Embedding Model

| Model | Dimensions | Chinese quality / speed | Use case |
|---|---|---|---|
| bge-small-zh-v1.5 | 512 | ⭐⭐⭐⭐⭐ | General Chinese |
| bge-large-zh-v1.5 | 1024 | ⭐⭐⭐⭐⭐ | High-precision scenarios |
| text-embedding-ada-002 | 1536 | ⭐⭐⭐⭐ | OpenAI API |
| m3e-base | 768 | ⭐⭐⭐⭐⭐ | General Chinese |
```python
# Local embeddings
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="BAAI/bge-large-zh-v1.5",
    model_kwargs={"device": "cuda"},  # GPU acceleration
    encode_kwargs={"normalize_embeddings": True}
)

# OpenAI embeddings
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-ada-002",
    openai_api_key="sk-xxx"
)
```
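Why `normalize_embeddings=True`: on unit-length vectors, the plain dot product equals cosine similarity, so the vector store can use the cheaper inner-product distance. A small numeric check (toy 2-D vectors, not real embeddings):

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a, b = [3.0, 4.0], [4.0, 3.0]
# Cosine similarity on the raw vectors...
cosine = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
# ...equals the dot product of the normalized vectors.
na, nb = normalize(a), normalize(b)
print(round(cosine, 6), round(dot(na, nb), 6))  # → 0.96 0.96
```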

Retrieval Strategies

```python
# Similarity search (default)
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5}
)

# MMR (Maximal Marginal Relevance, adds diversity)
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5, "fetch_k": 20, "lambda_mult": 0.5}
)

# Similarity-score threshold filtering
retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.7, "k": 5}
)
```
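What `search_type="mmr"` computes: from `fetch_k` candidates it greedily picks `k` documents maximizing `lambda_mult * sim(query, d) - (1 - lambda_mult) * max_sim(d, already_selected)`, trading relevance against redundancy. A plain-Python sketch with made-up similarity numbers:

```python
def mmr(query_sim, doc_sims, k, lambda_mult=0.5):
    """Maximal Marginal Relevance over precomputed similarities.

    query_sim: {doc_id: sim(query, doc)}
    doc_sims:  {(doc_a, doc_b): sim(doc_a, doc_b)}, keys sorted
    """
    selected = []
    candidates = set(query_sim)
    while candidates and len(selected) < k:
        def score(d):
            # Penalty: similarity to the closest already-selected doc
            redundancy = max(
                (doc_sims[tuple(sorted((d, s)))] for s in selected),
                default=0.0,
            )
            return lambda_mult * query_sim[d] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# A and B are near-duplicates; MMR picks A, then skips B in favour of C.
doc_sims = {("A", "B"): 0.95, ("A", "C"): 0.10, ("B", "C"): 0.10}
print(mmr({"A": 0.9, "B": 0.85, "C": 0.5}, doc_sims, k=2))  # → ['A', 'C']
```

With `lambda_mult=1.0` this degenerates to plain similarity ranking; lower values buy more diversity.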

Advanced RAG Techniques

1. Hybrid Retrieval

```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

# Vector retrieval
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# BM25 keyword retrieval
bm25_retriever = BM25Retriever.from_documents(chunks)
bm25_retriever.k = 5

# Hybrid retrieval (equal 0.5 weights)
ensemble_retriever = EnsembleRetriever(
    retrievers=[vector_retriever, bm25_retriever],
    weights=[0.5, 0.5]
)
```
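EnsembleRetriever merges the two ranked lists with weighted Reciprocal Rank Fusion: each document scores `sum_i weight_i / (c + rank_i)` across the lists it appears in. A self-contained sketch with toy document IDs:

```python
def weighted_rrf(rankings, weights, c=60):
    """Weighted Reciprocal Rank Fusion over several ranked lists.

    A doc ranked highly in multiple lists accumulates the most score;
    c dampens the influence of top ranks (60 is the common default).
    """
    scores = {}
    for ranking, w in zip(rankings, weights):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + w / (c + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]   # by embedding similarity
bm25_hits = ["doc_b", "doc_d", "doc_a"]     # by keyword match
print(weighted_rrf([vector_hits, bm25_hits], weights=[0.5, 0.5]))
# → ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

doc_b wins because it ranks well in both lists, which is exactly the behaviour hybrid retrieval is after: vector recall for paraphrases, BM25 for exact terms, fusion to reward agreement.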

2. Multi-Query Retrieval

```python
from langchain.retrievers.multi_query import MultiQueryRetriever

retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(),
    llm=llm
)
# Automatically generates several related queries to widen recall
```
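Under the hood, the LLM writes several paraphrases of the question, each is retrieved independently, and the results are unioned with de-duplication. The merge step can be sketched as follows (paraphrases and hits are hard-coded here instead of LLM-generated):

```python
def multi_query_retrieve(queries, retrieve, k=5):
    """Run every query variant, union the hits,
    de-duplicating while preserving first-seen order."""
    seen, merged = set(), []
    for q in queries:
        for doc in retrieve(q):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged[:k]

# Toy retriever: fixed hits per query variant (hypothetical data)
hits = {
    "net profit 2024": ["doc_1", "doc_2"],
    "2024 earnings": ["doc_2", "doc_3"],
    "annual profit figures": ["doc_4"],
}
print(multi_query_retrieve(hits.keys(), lambda q: hits[q]))
# → ['doc_1', 'doc_2', 'doc_3', 'doc_4']
```

The paraphrases catch documents whose wording differs from the user's original phrasing, at the cost of one extra LLM call per question.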

3. Parent Document Retrieval

```python
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore

# Search over small chunks, return large ones (keeps more context)
store = InMemoryStore()
retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=RecursiveCharacterTextSplitter(chunk_size=200),
    parent_splitter=RecursiveCharacterTextSplitter(chunk_size=1000)
)
```
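The "small-to-big" idea reduces to a child-to-parent index: embeddings match on the precise 200-char child chunks, but the prompt receives the 1000-char parents. A minimal sketch with hypothetical IDs:

```python
def small_to_big(child_hits, child_to_parent, parent_store):
    """Match on small child chunks, return their (de-duplicated)
    parent chunks for extra context."""
    parent_ids = []
    for child_id in child_hits:
        pid = child_to_parent[child_id]
        if pid not in parent_ids:   # two children may share a parent
            parent_ids.append(pid)
    return [parent_store[pid] for pid in parent_ids]

parent_store = {"p1": "section 1 (full 1000 chars)", "p2": "section 2 (full 1000 chars)"}
child_to_parent = {"c1": "p1", "c2": "p1", "c3": "p2"}
print(small_to_big(["c2", "c3", "c1"], child_to_parent, parent_store))
# → ['section 1 (full 1000 chars)', 'section 2 (full 1000 chars)']
```

Small chunks give sharp retrieval; large parents give the LLM enough surrounding context to answer well.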

RAG Optimization Tips

  1. Chunk size: 500-800 characters for financial documents, enough to preserve complete semantics
  2. Overlap: 10-20% of the chunk size, so key information isn't cut off at chunk boundaries
  3. Top-K: retrieve 10-20 candidates initially, keep the Top-3 after reranking
  4. Embedding model: prefer bge-large-zh-v1.5 for Chinese text
  5. Hybrid retrieval: vector + BM25 improves recall

Content on this site compiled by 褚成志, for learning and reference only.