Conversational RAG
In many Q&A applications we want to allow the user to have a back-and-forth conversation, meaning the application needs some kind of "memory" of past questions and answers, and some logic for incorporating those into its current thinking.
In this guide we focus on adding logic for incorporating historical messages. See here for more details on chat history management.
We'll cover two approaches:
- Chains, in which we always execute a retrieval step;
- Agents, in which we give an LLM discretion over whether and how to execute a retrieval step (or several steps).
For the external knowledge source, we'll use the same LLM Powered Autonomous Agents blog post by Lilian Weng from the RAG tutorial.
Setup
Dependencies
We'll use an OpenAI chat model and embeddings and a Memory vector store in this walkthrough, but everything shown here works with any ChatModel or LLM, Embeddings, and VectorStore or Retriever.
We'll use the following packages:
npm install --save langchain @langchain/openai cheerio
We need to set the environment variable OPENAI_API_KEY:
export OPENAI_API_KEY=YOUR_KEY
LangSmith
Many of the applications you build with LangChain will contain multiple steps with multiple invocations of LLM calls. As these applications get more and more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent. The best way to do this is with LangSmith.
Note that LangSmith is not needed, but it is helpful. If you do want to use LangSmith, after you sign up at the link above, make sure to set your environment variables to start logging traces:
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=YOUR_KEY
# Reduce tracing latency if you are not in a serverless environment
# export LANGCHAIN_CALLBACKS_BACKGROUND=true
Initial setup
import "cheerio";
import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { pull } from "langchain/hub";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import {
RunnableSequence,
RunnablePassthrough,
} from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";
const loader = new CheerioWebBaseLoader(
"https://lilianweng.github.io/posts/2023-06-23-agent/"
);
const docs = await loader.load();
const textSplitter = new RecursiveCharacterTextSplitter({
chunkSize: 1000,
chunkOverlap: 200,
});
const splits = await textSplitter.splitDocuments(docs);
const vectorStore = await MemoryVectorStore.fromDocuments(
splits,
new OpenAIEmbeddings()
);
// Retrieve and generate using the relevant snippets of the blog.
const retriever = vectorStore.asRetriever();
const prompt = await pull<ChatPromptTemplate>("rlm/rag-prompt");
const llm = new ChatOpenAI({ model: "gpt-3.5-turbo", temperature: 0 });
const ragChain = await createStuffDocumentsChain({
llm,
prompt,
outputParser: new StringOutputParser(),
});
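The text splitter above chunks the post into 1000-character pieces with a 200-character overlap between consecutive chunks. The core mechanic can be sketched in plain TypeScript (illustrative only; the real RecursiveCharacterTextSplitter additionally prefers natural boundaries such as paragraphs, sentences, and words):

```typescript
// Illustrative fixed-size chunking with overlap. Each chunk shares its
// first `overlap` characters with the end of the previous chunk, so
// context that straddles a chunk boundary is not lost at retrieval time.
function splitWithOverlap(
  text: string,
  chunkSize: number,
  overlap: number
): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
    start += chunkSize - overlap; // step forward, keeping `overlap` chars
  }
  return chunks;
}

console.log(splitWithOverlap("abcdefghij", 4, 2));
// [ 'abcd', 'cdef', 'efgh', 'ghij' ]
```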
Let's see what this prompt actually looks like:
console.log(prompt.promptMessages.map((msg) => msg.prompt.template).join("\n"));
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:
await ragChain.invoke({
context: await retriever.invoke("What is Task Decomposition?"),
question: "What is Task Decomposition?",
});
"Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. I"... 208 more characters
Contextualizing the question
First we'll need to define a sub-chain that takes historical messages and the latest user question, and reformulates the question if it references any information in the historical messages.
We'll use a prompt that includes a MessagesPlaceholder variable under the name "chat_history". This allows us to pass a list of messages to the prompt using the "chat_history" input key, and these messages will be inserted after the system message and before the human message containing the latest question.
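The effect of the placeholder can be sketched with plain objects (an illustrative simplification, not the LangChain API):

```typescript
// Sketch of how a MessagesPlaceholder splices a message list into a
// prompt: system message first, then the history, then the latest question.
type Message = { role: "system" | "human" | "ai"; content: string };

function buildPrompt(chatHistory: Message[], question: string): Message[] {
  return [
    { role: "system", content: "Reformulate the question as standalone." },
    ...chatHistory, // fills the "chat_history" slot
    { role: "human", content: question },
  ];
}

const msgs = buildPrompt(
  [
    { role: "human", content: "What does LLM stand for?" },
    { role: "ai", content: "Large language model" },
  ],
  "What is meant by large?"
);
// msgs[0] is the system message; the two history messages sit in the middle.
```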
import {
ChatPromptTemplate,
MessagesPlaceholder,
} from "@langchain/core/prompts";
const contextualizeQSystemPrompt = `Given a chat history and the latest user question
which might reference context in the chat history, formulate a standalone question
which can be understood without the chat history. Do NOT answer the question,
just reformulate it if needed and otherwise return it as is.`;
const contextualizeQPrompt = ChatPromptTemplate.fromMessages([
["system", contextualizeQSystemPrompt],
new MessagesPlaceholder("chat_history"),
["human", "{question}"],
]);
const contextualizeQChain = contextualizeQPrompt
.pipe(llm)
.pipe(new StringOutputParser());
Using this chain, we can ask follow-up questions that reference past messages and have them reformulated into standalone questions:
import { AIMessage, HumanMessage } from "@langchain/core/messages";
await contextualizeQChain.invoke({
chat_history: [
new HumanMessage("What does LLM stand for?"),
new AIMessage("Large language model"),
],
question: "What is meant by large",
});
'What is the definition of "large" in the context of a language model?'
Chain with chat history
And now we can build our full QA chain.
Notice that we add some routing functionality, running the "condense question chain" only when our chat history isn't empty. Here we take advantage of the fact that if a function in an LCEL chain returns another chain, that chain will itself be invoked.
import {
ChatPromptTemplate,
MessagesPlaceholder,
} from "@langchain/core/prompts";
import {
RunnablePassthrough,
RunnableSequence,
} from "@langchain/core/runnables";
import { formatDocumentsAsString } from "langchain/util/document";
const qaSystemPrompt = `You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.
Use three sentences maximum and keep the answer concise.
{context}`;
const qaPrompt = ChatPromptTemplate.fromMessages([
["system", qaSystemPrompt],
new MessagesPlaceholder("chat_history"),
["human", "{question}"],
]);
const contextualizedQuestion = (input: Record<string, unknown>) => {
if ("chat_history" in input) {
return contextualizeQChain;
}
return input.question;
};
const ragChain = RunnableSequence.from([
RunnablePassthrough.assign({
context: (input: Record<string, unknown>) => {
if ("chat_history" in input) {
const chain = contextualizedQuestion(input);
return chain.pipe(retriever).pipe(formatDocumentsAsString);
}
return "";
},
}),
qaPrompt,
llm,
]);
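The routing step above returns either a plain value or another chain, which LCEL then invokes. A minimal sketch of that behavior in plain TypeScript, with chains modeled as simple functions (names are illustrative):

```typescript
// Chains modeled as plain functions from input to string.
type Input = { question: string; chat_history: string[] };
type Chain = (input: Input) => string;

// Stand-in for the condense-question chain.
const condenseQuestion: Chain = (input) =>
  `standalone form of: ${input.question}`;

// Returns a chain when there is history, or the raw question otherwise.
function route(input: Input): Chain | string {
  return input.chat_history.length > 0 ? condenseQuestion : input.question;
}

// Mimics the LCEL behavior: if a step returns a chain, invoke it.
function run(input: Input): string {
  const out = route(input);
  return typeof out === "function" ? out(input) : out;
}

console.log(run({ question: "What is meant by large?", chat_history: ["..."] }));
// "standalone form of: What is meant by large?"
console.log(run({ question: "What is Task Decomposition?", chat_history: [] }));
// "What is Task Decomposition?"
```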
import { BaseMessage } from "@langchain/core/messages";

let chat_history: BaseMessage[] = [];
const question = "What is task decomposition?";
const aiMsg = await ragChain.invoke({ question, chat_history });
console.log(aiMsg);
chat_history = chat_history.concat(aiMsg);
const secondQuestion = "What are common ways of doing it?";
await ragChain.invoke({ question: secondQuestion, chat_history });
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: "Task decomposition is a technique used to break down complex tasks into smaller and more manageable "... 278 more characters,
additional_kwargs: { function_call: undefined, tool_calls: undefined }
},
lc_namespace: [ "langchain_core", "messages" ],
content: "Task decomposition is a technique used to break down complex tasks into smaller and more manageable "... 278 more characters,
name: undefined,
additional_kwargs: { function_call: undefined, tool_calls: undefined }
}
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: "Common ways of task decomposition include using prompting techniques like Chain of Thought (CoT) or "... 332 more characters,
additional_kwargs: { function_call: undefined, tool_calls: undefined }
},
lc_namespace: [ "langchain_core", "messages" ],
content: "Common ways of task decomposition include using prompting techniques like Chain of Thought (CoT) or "... 332 more characters,
name: undefined,
additional_kwargs: { function_call: undefined, tool_calls: undefined }
}
See the first LangSmith trace here and the second trace here.
Next steps
We've gone over how to add application logic for incorporating historical messages, but we're still manually updating the chat history and inserting it into each input. In a real Q&A application we'll want some way of persisting chat history, and some way of automatically inserting it and updating it.
For this we can use:
- BaseChatMessageHistory: stores the chat history.
- RunnableWithMessageHistory: a wrapper for an LCEL chain and a BaseChatMessageHistory that handles injecting the chat history into inputs and updating it after each invocation.
For a detailed walkthrough of how to use these classes together to create a stateful conversational chain, head to the How to add message history (memory) LCEL page.
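As a rough intuition for what such a wrapper does, here is a hypothetical sketch (not the actual RunnableWithMessageHistory implementation): read the stored history for a session, pass it to the chain, then append the new human/AI turn after each call.

```typescript
type Msg = { role: "human" | "ai"; content: string };

// Minimal in-memory stand-in for a chat message history store.
class InMemoryHistory {
  private store = new Map<string, Msg[]>();
  get(sessionId: string): Msg[] {
    return this.store.get(sessionId) ?? [];
  }
  add(sessionId: string, ...msgs: Msg[]): void {
    this.store.set(sessionId, [...this.get(sessionId), ...msgs]);
  }
}

// Wraps a chain so history is injected and updated automatically.
function withMessageHistory(
  chain: (question: string, history: Msg[]) => string,
  history: InMemoryHistory
) {
  return (sessionId: string, question: string): string => {
    const past = history.get(sessionId);
    const answer = chain(question, past);
    history.add(
      sessionId,
      { role: "human", content: question },
      { role: "ai", content: answer }
    );
    return answer;
  };
}

// Toy chain: just reports how many prior messages it saw.
const history = new InMemoryHistory();
const chat = withMessageHistory(
  (q, h) => `answer to "${q}" (saw ${h.length} prior messages)`,
  history
);
chat("session-1", "What is task decomposition?"); // sees 0 prior messages
chat("session-1", "What are common ways of doing it?"); // sees 2 prior messages
```

The caller only ever passes a session id and the latest question; persistence and injection happen inside the wrapper, which is the division of labor the real classes provide.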