Query Transformations: Improving Context Retrieval Quality
Preface
This article follows the setup from the previous post: a proxy server is started locally, and OpenAI's ChatOpenAI is used as the chat client, with its requests pointed at our local server.
Abstract
In traditional RAG retrieval, a single query often suffers from insufficient semantic expression, a narrow angle of phrasing, and incomplete information coverage. These limitations can cause the vector retrieval stage to recall inaccurate context, which in turn degrades the quality of the final generated answer. Query Transformations techniques address this by expanding and restructuring the original query from multiple semantic perspectives, improving its expressiveness and semantic coverage, compensating for missing information, and providing the model with richer, more relevant context that steers the generated answer closer to the user's intent.
Multi Query
The core idea of Multi Query is to have a large language model generate several sub-questions that are related to the original query but approach it from different perspectives. Each sub-question is then used for vector retrieval, matching relevant Documents in the vector store by similarity. Finally, the Documents from all sub-questions are deduplicated and merged, and passed to the text generation model as Context to produce the final answer.
Fig. 1 Multi Query framework diagram
Implementation Details
The Multi Query section walks through the complete implementation; the later Query Transformations strategies only cover the context retrieval part.
Index
Import the text data and load it with a Loader, then split it into chunks with RecursiveCharacterTextSplitter.
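Before running the snippets below, a minimal set of imports is assumed for the whole walkthrough; the module paths are a best guess for a recent langchain / langchain_community release, so adjust them to your installed version:
import os
from operator import itemgetter
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import ModelScopeEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain.load import dumps, loads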
file_path = os.path.abspath('../docs/PatchTST.pdf')
loader = PyPDFLoader(file_path=file_path, extract_images=True)
pages = loader.load_and_split()
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=50,
    chunk_overlap=10
)
docs = text_splitter.split_documents(pages)
Define the Embedding and Text Generation models. As mentioned above, an OpenAI-compatible service is deployed with vllm and accessed through the ChatOpenAI chat client.
# General-purpose English text embedding model
embedding = ModelScopeEmbeddings(model_id='iic/nlp_corom_sentence-embedding_english-base')
vectorstore = Chroma.from_documents(documents=docs, embedding=embedding, collection_name='ModelScope')
retriever = vectorstore.as_retriever()
# Deploy an OpenAI-compatible server with vllm, then use ChatOpenAI
os.environ['VLLM_USE_MODELSCOPE'] = 'True'
chat = ChatOpenAI(
    model='Qwen/Qwen3-0.6B',
    openai_api_key="EMPTY",
    openai_api_base='http://localhost:8000/v1',
    stop=['<|im_end|>'],
    temperature=0
)
Prompt
This is one of the key ideas of Multi Query: the prompt asks the large language model to generate several sub-questions from different perspectives in order to enrich the context.
# Multi Query: generate several related questions from different perspectives based on the original query
template = """You are an AI language model assistant. Your task is to generate three
different versions of the given user question to retrieve relevant documents from a vector
database. By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search.
Provide these alternative questions separated by newlines. Original question: {question}"""
prompt_perspectives = ChatPromptTemplate.from_template(template)
generate_queries = (
    prompt_perspectives
    | chat
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)
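As an optional sanity check (illustrative only, the exact output depends on the model), generate_queries can be invoked on its own to inspect the alternative questions before any retrieval happens:
# Illustrative: peek at the generated sub-questions
sub_questions = generate_queries.invoke({"question": "What is the purpose of time series forecasting?"})
print(sub_questions)  # a list of alternative phrasings, one per line of the LLM output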
Deduplication and Aggregation
After generate_queries produces the sub-questions, each one is used to retrieve its related Documents from the vector store. The Documents are then deduplicated to ensure uniqueness and, together with the original Question, passed to the text generation model as Context to produce the final answer.
def get_unique_union(documents: list[list]):
    # Serialize: convert each Document to a string
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    # Deduplicate
    unique_docs = list(set(flattened_docs))
    # Deserialize: convert the strings back to Documents
    return [loads(doc) for doc in unique_docs]
template = """Answer the following question based on this context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
retrieval_chain = generate_queries | retriever.map() | get_unique_union
chat_chain = {
    'context': retrieval_chain,
    'question': itemgetter('question')
} | prompt | chat | StrOutputParser()
question = "What is the purpose of time series forecasting?"
ans = chat_chain.invoke({'question': question})
print(ans)
RAG Fusion
RAG Fusion also uses an LLM to generate several related sub-questions from different perspectives based on the original query, and then retrieves matching Documents from the vector store. Unlike Multi Query, RAG Fusion applies the Reciprocal Rank Fusion (RRF) algorithm to compute a contribution score for every document and sorts the documents by score in descending order before passing them to the text generation model as Context.
Fig. 2 RAG Fusion framework diagram
Prompt
Use a prompt to have the LLM generate several related sub-questions.
# Fusion: generate several related questions from different perspectives based on the input question
template = """You are a helpful assistant that generates multiple search queries based on a single input query. \n
Generate multiple search queries related to: {question} \n
Output (3 queries):"""
prompt_rag_fusion = ChatPromptTemplate.from_template(template)
# Suppose the model generates three related questions.
# StrOutputParser converts the LLM output into a single string, and (lambda x: x.split('\n'))
# splits it on newlines into a list of queries; after retriever.map(), each query returns its
# own list of Documents, so the downstream step receives a list[list].
generate_queries = (
    prompt_rag_fusion
    | chat
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)
RRF
The core of this algorithm is to sort documents by their contribution score in descending order. The score is accumulated from a Document's rank in each retrieval result list: a Document that appears in several result lists accumulates a higher score even if its similarity in any single retrieval is modest, because recurrence across queries better reflects a consensus.
Note: each query retrieves the top-k Documents from the vector store by similarity.
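Concretely, the RRF score of a document is the sum of 1 / (k + rank) over every ranked list in which it appears (zero-based rank, k = 60 by default). A small illustrative calculation, with hypothetical ranks, shows why recurrence wins:
# Illustrative only: hypothetical ranks for two documents, k = 60
k = 60
score_a = 1 / (0 + k) + 1 / (2 + k)  # doc A: rank 0 in one list, rank 2 in another
score_b = 1 / (0 + k)                # doc B: rank 0 in a single list only
print(score_a > score_b)             # True: doc A recurs across lists, so it ranks higher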
def reciprocal_rank_fusion(results: list[list], k=60):
    """ Reciprocal_rank_fusion that takes multiple lists of ranked documents
        and an optional parameter k used in the RRF formula """
    fused_scores = {}
    # The more ranked lists a document appears in, the more its score accumulates and the higher it ranks.
    for docs in results:
        for rank, doc in enumerate(docs):
            # Serialize: convert the Document to a string
            doc_str = dumps(doc)
            # Initialize the score
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            # Update the document's score, RRF: 1 / (rank + k)
            fused_scores[doc_str] += 1 / (rank + k)
    # Deserialize the strings back to Documents and sort by score in descending order,
    # so documents with higher contributions come first
    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]
    return reranked_results
Text Generation
retrieval_chain_rag_fusion = generate_queries | retriever.map() | reciprocal_rank_fusion
question = "What is the purpose of time series forecasting?"
template = """Answer the following question based on this context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
chat_chain = {
    'context': retrieval_chain_rag_fusion,
    'question': itemgetter('question')
} | prompt | chat | StrOutputParser()
ans = chat_chain.invoke({'question': question})
print(ans)
Decomposition
Decomposition likewise generates several sub-questions from the original query. Each sub-question is answered with the previous Questions and Answers passed to the LLM as additional context, so every answer builds on the history of earlier questions and answers and draws on a richer Context. Each answer refines and improves on the previous one, eventually yielding an answer that closely matches the original query.
Fig. 3 Decomposition framework diagram
Prompt & sub-questions
Consistent with the above, several related sub-questions are generated from the original query.
# Decomposition: generate several related questions from different perspectives based on the input question
template = """You are a helpful assistant that generates multiple sub-questions related to an input question. \n
The goal is to break down the input into a set of sub-problems / sub-questions that can be answered in isolation. \n
Generate multiple search queries related to: {question} \n
Output (3 queries):"""
prompt_rag_decomposition = ChatPromptTemplate.from_template(template)
generate_queries = (
    prompt_rag_decomposition
    | chat
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)
question = "What is the purpose of time series forecasting?"
questions = generate_queries.invoke({'question': question})
print(questions)
Answer Recursively Template
Compared with the earlier prompts, the Decomposition template adds a q_a_pairs variable that stores the previous conversation history to enrich the context.
template = """Here is the question you need to answer:
\n --- \n {question} \n --- \n
Here is any available background question + answer pairs:
\n --- \n {q_a_pairs} \n --- \n
Here is additional context relevant to the question:
\n --- \n {context} \n --- \n
Use the above context and any background question + answer pairs to answer the question: \n {question}
"""
prompt = ChatPromptTemplate.from_template(template)
Text Generation
After each answer, the Question and Answer are appended to the q_a_pairs variable, which is then passed to the LLM as additional context for the next sub-question.
def format_qa_pair(question, answer):
    """Format Q and A pair"""
    formatted_string = ""
    formatted_string += f"Question: {question}\nAnswer: {answer}\n\n"
    # Strip leading/trailing whitespace
    return formatted_string.strip()
q_a_pairs = ""
for q in questions:
    rag_chain = (
        {"context": itemgetter("question") | retriever,
         "question": itemgetter("question"),
         "q_a_pairs": itemgetter("q_a_pairs")}
        | prompt
        | chat
        | StrOutputParser())
    answer = rag_chain.invoke({"question": q, "q_a_pairs": q_a_pairs})
    print(answer)
    q_a_pair = format_qa_pair(q, answer)
    q_a_pairs = q_a_pairs + "\n---\n" + q_a_pair
Answer Individually
Answer Individually also generates several sub-questions from the original query. For each sub-question, similar documents are retrieved from the vector store and used as Context for the LLM to answer that sub-question. The resulting answers are merged into a single string in Q-A format, which is passed to the LLM as Context to produce the final answer.
Fig. 4 Answer Individually framework diagram
Sub-questions
From the original query, several sub-questions with different perspectives are generated. Vector retrieval is then performed for each sub-question and the results serve as its context; the language model generates an answer, and both the question and the answer are recorded. The result is two lists: the sub-questions and their corresponding answers.
def retrieve_and_rag(question, prompt_rag, sub_question_generator_chain):
    """RAG on each sub-question"""
    # Use our decomposition chain to generate the sub-questions
    sub_questions = sub_question_generator_chain.invoke({"question": question})
    # Initialize a list to hold RAG chain results
    rag_results = []
    for sub_question in sub_questions:
        # Retrieve documents for each sub-question
        retrieved_docs = retriever.get_relevant_documents(sub_question)
        # Use retrieved documents and sub-question in RAG chain
        answer = (prompt_rag | chat | StrOutputParser()).invoke({
            "context": retrieved_docs,
            "question": sub_question
        })
        rag_results.append(answer)
    return rag_results, sub_questions
question = "What is the purpose of time series forecasting?"
template = """
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
"""
prompt_rag = ChatPromptTemplate.from_template(template)
# Run retrieval and RAG over each sub-question
answers, questions = retrieve_and_rag(question, prompt_rag, generate_queries)
Merge
Merge the Questions and Answers into a single Context with rich semantic information.
def format_qa_pairs(questions, answers):
    """Format Q and A pairs"""
    formatted_string = ""
    for i, (question, answer) in enumerate(zip(questions, answers), start=1):
        formatted_string += f"Question {i}: {question}\nAnswer {i}: {answer}\n\n"
    # Strip whitespace (spaces, newlines, tabs) from the beginning and end of the string
    return formatted_string.strip()
context = format_qa_pairs(questions, answers)
Text Generation
# Prompt
template = """Here is a set of Q+A pairs:
{context}
Use these to synthesize an answer to the question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
rag_chain = (
    prompt
    | chat
    | StrOutputParser()
)
answer = rag_chain.invoke({"context": context, "question": question})
print(answer)
Step Back
The original query is often too specific, which can cause vector retrieval to match only literally similar content while missing more valuable concept-level information. Step Back bridges the gap between user intent and document semantics by lifting the query up one level of abstraction, aiming to elicit "broader" information that covers more potentially relevant context.
For example:
Original query: What is the purpose of time series forecasting?
Step Back: What is time series forecasting?
Fig. 5 Step Back
Few Shot Examples
The examples variable provides a few demonstrations whose purpose is to teach the model, via few-shot examples, how to turn a specific question into a more abstract step-back question with a broader background.
examples = [
    {
        "input": "How does PatchTST improve time series forecasting?",
        "output": "What is PatchTST?",
    },
    {
        "input": "What is the purpose of time series forecasting?",
        "output": "What is time series forecasting?",
    },
]
# We now transform these to example messages
example_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}"),
    ]
)
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are an expert at world knowledge. Your task is to step back and paraphrase a question to a more generic step-back question, which is easier to answer. Here are a few examples:""",
        ),
        # Few shot examples
        few_shot_prompt,
        # New question
        ("user", "{question}"),
    ]
)
generate_queries_step_back = prompt | chat | StrOutputParser()
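As a quick illustrative check (the exact wording depends on the model), the chain can be invoked directly to see the step-back rewrite before any retrieval:
# Illustrative: generate the step-back question for the running example
question = "What is the purpose of time series forecasting?"
step_back_question = generate_queries_step_back.invoke({"question": question})
print(step_back_question)  # expected to be something like "What is time series forecasting?"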
Text Generation
For the generated Step-Back Query, related Documents are retrieved from the vector store and passed to the LLM as Context to produce the final answer.
response_prompt_template = """You are an expert of world knowledge. I am going to ask you a question. Your response should be comprehensive and not contradicted with the following context if they are relevant. Otherwise, ignore them if they are not relevant.
# {normal_context}
# {step_back_context}
# Original Question: {question}
# Answer:"""
response_prompt = ChatPromptTemplate.from_template(response_prompt_template)
chain = (
    {
        # Retrieve context using the normal question
        "normal_context": itemgetter('question') | retriever,
        # Retrieve context using the step-back question
        "step_back_context": generate_queries_step_back | retriever,
        # Pass on the question
        "question": itemgetter('question'),
    }
    | response_prompt
    | chat
    | StrOutputParser()
)
question = "What is the purpose of time series forecasting?"
answer = chain.invoke({"question": question})
print(answer)
HyDE
HyDE is another abstraction-based approach: it generates a hypothetical answer document and then uses that document for vector retrieval, improving the relevance and semantic coverage of the retrieved context.
Fig. 6 HyDE framework diagram
Hypothetical Document
An LLM is used to generate a scientific-paper-style passage answering the question; vector retrieval is then performed based on this passage, and the results are passed to the LLM as Context.
template = """Please write a scientific paper passage to answer the question
Question: {question}
Passage:"""
prompt_hyde = ChatPromptTemplate.from_template(template)
generate_docs_for_retrieval = (
    prompt_hyde | chat | StrOutputParser()
)
retrieval_chain = generate_docs_for_retrieval | retriever
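Optionally (illustrative only), the intermediate hypothetical passage can be inspected before running the full retrieval chain:
# Illustrative: look at the hypothetical passage generated for retrieval
question = "What is the purpose of time series forecasting?"
hypothetical_passage = generate_docs_for_retrieval.invoke({"question": question})
print(hypothetical_passage)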
Text Generation
# RAG
template = """Answer the following question based on this context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
final_rag_chain = (
    prompt
    | chat
    | StrOutputParser()
)
question = "What is the purpose of time series forecasting?"
answer = final_rag_chain.invoke({
    "context": retrieval_chain.invoke({"question": question}),
    "question": question
})
print(answer)
GitHub
https://github.com/FranzLiszt-1847/LLM
References
[1] https://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_5_to_9.ipynb