The Ultimate Guide to Mastering Retrieval-Augmented Generation (RAG) with LangChain: Part 3, Applying HyDE (Hypothetical Document Embeddings) in RAG

HyDE (Hypothetical Document Embeddings)

Rather than generating alternative queries from the original question, HyDE focuses on generating a hypothetical document for the given query. The intuition is that the embedding vector of such a hypothetical document can be used to identify a neighborhood in the corpus embedding space, from which similar real documents are retrieved by vector similarity. In this way, RAG can retrieve documents that are more relevant to the hypothetical document and answer the user's query more accurately.
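
To make this intuition concrete, here is a minimal, self-contained sketch (not from the original notebook; the toy corpus and the hypothetical passage are made up for illustration) of what "identifying a neighborhood in the corpus embedding space" looks like with OpenAIEmbeddings and plain cosine similarity:

# Sketch of the HyDE intuition: embed a hypothetical answer instead of the raw question,
# then find its nearest real documents in the corpus embedding space.
import numpy as np
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

corpus = [
    "LoRA freezes the pre-trained weights and trains rank decomposition matrices instead.",
    "QLoRA quantizes the base model to 4-bit and trains LoRA adapters on top of it.",
    "Chroma is an open-source embedding database used as a vector store.",
]
corpus_vectors = np.array(embeddings.embed_documents(corpus))

hypothetical_passage = (
    "Low-rank adapters decompose the weight update of a dense layer into two small "
    "matrices, so only a tiny fraction of the parameters needs to be trained."
)
passage_vector = np.array(embeddings.embed_query(hypothetical_passage))

# Cosine similarity between the hypothetical passage and every corpus chunk
scores = corpus_vectors @ passage_vector / (
    np.linalg.norm(corpus_vectors, axis=1) * np.linalg.norm(passage_vector)
)
print(corpus[int(scores.argmax())])  # the real document closest to the hypothetical one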

Let's try answering a question with RAG using HyDE!

First, just as in the previous notebooks, we create our vector store and initialize the retriever using OpenAIEmbeddings and Chroma.

# Load environment variables and import the necessary libraries
%load_ext dotenv
%dotenv secrets/secrets.env

from langchain_community.document_loaders import PyPDFLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Load all PDF files from the data/ directory with DirectoryLoader
loader = DirectoryLoader('data/', glob="*.pdf", loader_cls=PyPDFLoader)
documents = loader.load()

# Split the documents into chunks with RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
text_chunks = text_splitter.split_documents(documents)

# Build the vector store with Chroma and OpenAIEmbeddings
vectorstore = Chroma.from_documents(
    documents=text_chunks,
    embedding=OpenAIEmbeddings(),
    persist_directory="data/vectorstore"
)
vectorstore.persist()

# Turn the vector store into a retriever that returns the top 5 chunks
retriever = vectorstore.as_retriever(search_kwargs={'k': 5})

Next, we have the LLM write a "hypothetical" passage about the question via a chain.

from langchain.prompts import ChatPromptTemplate

# Prompt template for generating the hypothetical document
hyde_prompt = ChatPromptTemplate.from_template(
    """
    Please write a scientific passage of a paper to answer the following question:\n
    Question: {question}\n
    Passage:
    """
)

# Chain that generates the hypothetical document
generate_doc_chain = (
    {'question': RunnablePassthrough()}
    | hyde_prompt
    | ChatOpenAI(model='gpt-4', temperature=0)
    | StrOutputParser()
)

# Generate a hypothetical document for the question
question = "How Low Rank Adapters work in LLMs?"
generate_doc_chain.invoke(question)
"Low Rank Adapters (LRAs) are a recent development in the field of Large Language Models (LLMs) that aim to reduce the computational and memory requirements of these models while maintaining their performance. The fundamental principle behind LRAs is the use of low-rank approximations to reduce the dimensionality of the model's parameters.\n\nIn the context of LLMs, an adapter is a small neural network that is inserted between the layers of a pre-trained model. The purpose of this adapter is to adapt the pre-trained model to a new task without modifying the original parameters of the model. This allows for efficient transfer learning, as the computational cost of training the adapter is significantly less than retraining the entire model.\n\nLow Rank Adapters take this concept a step further by applying a low-rank approximation to the adapter's parameters. This is achieved by decomposing the weight matrix of the adapter into two smaller matrices, effectively reducing the number of parameters that need to be stored and computed. This decomposition is typically achieved using methods such as singular value decomposition (SVD) or principal component analysis (PCA).\n\nThe use of low-rank approximations in LRAs allows for a significant reduction in the computational and memory requirements of LLMs. Despite this reduction, LRAs are able to maintain a high level of performance, as the low-rank approximation captures the most important features of the data. This makes LRAs an effective tool for adapting pre-trained LLMs to new tasks in a computationally efficient manner."

Using the generated passage, we retrieve similar documents with our retriever.

# Chain the hypothetical-document generator with the retriever
retrieval_chain = generate_doc_chain | retriever
# Retrieve the documents most similar to the hypothetical passage
retrieved_docs = retrieval_chain.invoke(question)
retrieved_docs
[Document(page_content='over-parametrized models in fact reside on a low intrinsic dimension. We hypothesize that the\nchange in weights during model adaptation also has a low “intrinsic rank”, leading to our proposed\nLow-RankAdaptation (LoRA) approach. LoRA allows us to train some dense layers in a neural\nnetwork indirectly by optimizing rank decomposition matrices of the dense layers’ change during\nadaptation instead, while keeping the pre-trained weights frozen, as shown in Figure 1. Using GPT-3', metadata={'page': 1, 'source': 'data/LoRA.pdf'}),
 Document(page_content='over-parametrized models in fact reside on a low intrinsic dimension. We hypothesize that the\nchange in weights during model adaptation also has a low “intrinsic rank”, leading to our proposed\nLow-RankAdaptation (LoRA) approach. LoRA allows us to train some dense layers in a neural\nnetwork indirectly by optimizing rank decomposition matrices of the dense layers’ change during\nadaptation instead, while keeping the pre-trained weights frozen, as shown in Figure 1. Using GPT-3', metadata={'page': 1, 'source': 'data/LoRA.pdf'}),
 Document(page_content='over-parametrized models in fact reside on a low intrinsic dimension. We hypothesize that the\nchange in weights during model adaptation also has a low “intrinsic rank”, leading to our proposed\nLow-RankAdaptation (LoRA) approach. LoRA allows us to train some dense layers in a neural\nnetwork indirectly by optimizing rank decomposition matrices of the dense layers’ change during\nadaptation instead, while keeping the pre-trained weights frozen, as shown in Figure 1. Using GPT-3', metadata={'page': 1, 'source': 'data/LoRA.pdf'}),
 Document(page_content='requirements by using a small set of trainable parameters, often termed adapters, while not updating\nthe full model parameters which remain fixed. Gradients during stochastic gradient descent are\npassed through the fixed pretrained model weights to the adapter, which is updated to optimize the\nloss function. LoRA augments a linear projection through an additional factorized projection. Given\na projection XW =YwithX∈Rb×h,W∈Rh×oLoRA computes:\nY=XW +sXL 1L2, (3)', metadata={'page': 2, 'source': 'data/QLoRA.pdf'}),
 Document(page_content='requirements by using a small set of trainable parameters, often termed adapters, while not updating\nthe full model parameters which remain fixed. Gradients during stochastic gradient descent are\npassed through the fixed pretrained model weights to the adapter, which is updated to optimize the\nloss function. LoRA augments a linear projection through an additional factorized projection. Given\na projection XW =YwithX∈Rb×h,W∈Rh×oLoRA computes:\nY=XW +sXL 1L2, (3)', metadata={'page': 2, 'source': 'data/QLoRA.pdf'})]

Finally, the documents retrieved based on the "hypothetical" passage are used as context to answer our original question through final_rag_chain.

# Prompt template for answering the question from the retrieved context
template = """
Answer the following question based on the provided context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

# Build the final RAG chain
final_rag_chain = (
    prompt
    | ChatOpenAI(model='gpt-4', temperature=0)
    | StrOutputParser()
)

# Answer the original question with the retrieved documents as context
final_rag_chain.invoke({"context": retrieved_docs, "question": question})
"Low-Rank Adapters (LoRA) work in large language models (LLMs) by allowing the training of some dense layers in a neural network indirectly. This is done by optimizing rank decomposition matrices of the dense layers' change during adaptation, while keeping the pre-trained weights frozen. LoRA also augments a linear projection through an additional factorized projection. During stochastic gradient descent, gradients are passed through the fixed pre-trained model weights to the adapter, which is updated to optimize the loss function."

Although this technique can help answer questions, there is a risk of producing wrong answers, because the documents are retrieved on the basis of a hypothetical passage that may itself be incorrect or hallucinated.
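
One simple way to hedge against this risk, sketched below rather than taken from the original notebook, is to retrieve with both the raw question and the hypothetical passage and merge the results, so that a hallucinated passage cannot steer retrieval entirely on its own:

# Sketch of a mitigation: retrieve with the question AND the hypothetical passage,
# then de-duplicate, so retrieval is not driven by the hypothetical passage alone.
hypothetical_passage = generate_doc_chain.invoke(question)

candidate_docs = (
    retriever.get_relevant_documents(question)
    + retriever.get_relevant_documents(hypothetical_passage)
)

seen, merged_docs = set(), []
for doc in candidate_docs:
    if doc.page_content not in seen:  # de-duplicate by content
        seen.add(doc.page_content)
        merged_docs.append(doc)

final_rag_chain.invoke({"context": merged_docs, "question": question})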

Additional background:

  • HyDE: Hypothetical Document Embeddings, a technique that generates hypothetical documents to help retrieve more relevant real documents.
  • RAG: Retrieval-Augmented Generation, an approach that combines retrieval with generation to improve question-answering systems.
  • LLM: Large Language Models, such as the GPT family.
  • LoRA: Low-Rank Adaptation, a method that trains the dense layers of a neural network by optimizing low-rank decomposition matrices of the weight updates (see the sketch after this list).
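
For completeness, here is a minimal sketch of the LoRA idea mentioned above: the frozen pre-trained weight is combined with a trainable low-rank update B @ A, scaled by alpha / r. The dimensions and numbers below are illustrative only.

# Sketch of the LoRA forward pass: frozen weight W0 plus a trainable low-rank update
import numpy as np

d, k, r = 1024, 1024, 8             # r << min(d, k)
W0 = np.random.randn(d, k)          # frozen pre-trained weight (not updated)
A = np.random.randn(r, k) * 0.01    # trainable low-rank factor
B = np.zeros((d, r))                # trainable low-rank factor, initialized to zero
alpha = 16                          # scaling hyperparameter

x = np.random.randn(k)
h = W0 @ x + (alpha / r) * (B @ (A @ x))   # only A and B are trained; W0 stays frozen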