
Build a Retrieval Augmented Generation (RAG) App

One of the most powerful applications enabled by large language models (LLMs) is sophisticated question-answering (Q&A) chatbots. These are applications that can answer questions about specific source information, using a technique known as Retrieval Augmented Generation, or RAG.

This tutorial will show how to build a simple Q&A application over a text data source. Along the way we'll go over a typical Q&A architecture and highlight additional resources for more advanced Q&A techniques. We'll also see how LangSmith can help us trace and understand our application. LangSmith will become increasingly useful as our application grows in complexity.

If you're already familiar with basic retrieval, you may also be interested in this high-level overview of different retrieval techniques.

What is RAG?

RAG is a technique for augmenting LLM knowledge with additional data.

LLMs can reason about wide-ranging topics, but their knowledge is limited to the public data they were trained on, up to a specific cutoff date. If you want to build AI applications that can reason about private data or data introduced after a model's cutoff date, you need to augment the model's knowledge with the specific information it needs. The process of bringing in the appropriate information and inserting it into the model prompt is known as Retrieval Augmented Generation (RAG).

LangChain has a number of components designed to help build Q&A applications, and RAG applications more broadly.

**Note**: Here we focus on Q&A over unstructured data. If you're interested in RAG over structured data, check out our tutorial on doing question answering over SQL data.

Concepts

A typical RAG application has two main components:

Indexing: a pipeline for ingesting data from a source and indexing it. This usually happens offline.

Retrieval and generation: the actual RAG chain, which takes the user query at run time, retrieves the relevant data from the index, and then passes it to the model.

The most common full sequence from raw data to answer looks like this:

Indexing

  1. Load: First we need to load our data. This is done with Document Loaders.
  2. Split: Text splitters break large Documents into smaller chunks. This is useful both for indexing data and for passing it into a model, since large chunks are harder to search over and won't fit in a model's finite context window.
  3. Store: We need somewhere to store and index our splits, so that they can later be searched over. This is often done using a VectorStore and an Embeddings model.

[Diagram: the indexing pipeline]

Retrieval and generation

  1. Retrieve: Given a user input, relevant splits are retrieved from storage using a Retriever.
  2. Generate: A ChatModel / LLM produces an answer using a prompt that includes the question and the retrieved data.

[Diagram: retrieval and generation]

Setup

Installation

To install LangChain, run:

npm i langchain

For more details, see our Installation guide.

LangSmith

Many of the applications you build with LangChain will contain multiple steps with multiple invocations of LLM calls. As these applications get more and more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent. The best way to do this is with LangSmith.

After you sign up at the link above, make sure to set your environment variables to start logging traces:

export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY="..."

Pick your chat model:

Install dependencies

Tip

See this section for general instructions on installing integration packages.

yarn add @langchain/openai

Add environment variables

OPENAI_API_KEY=your-api-key

Instantiate the model

import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
model: "gpt-3.5-turbo",
temperature: 0
});

Preview

In this guide we'll build a Q&A app over the LLM Powered Autonomous Agents blog post by Lilian Weng, which allows us to ask questions about the contents of the post.

We can create a simple indexing pipeline and RAG chain to do this in only a few lines of code:

import "cheerio";
import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { pull } from "langchain/hub";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";

const loader = new CheerioWebBaseLoader(
"https://lilianweng.github.io/posts/2023-06-23-agent/"
);

const docs = await loader.load();

const textSplitter = new RecursiveCharacterTextSplitter({
chunkSize: 1000,
chunkOverlap: 200,
});
const splits = await textSplitter.splitDocuments(docs);
const vectorStore = await MemoryVectorStore.fromDocuments(
splits,
new OpenAIEmbeddings()
);

// Retrieve and generate using the relevant snippets of the blog.
const retriever = vectorStore.asRetriever();
const prompt = await pull<ChatPromptTemplate>("rlm/rag-prompt");
const llm = new ChatOpenAI({ model: "gpt-3.5-turbo", temperature: 0 });

const ragChain = await createStuffDocumentsChain({
llm,
prompt,
outputParser: new StringOutputParser(),
});

const retrievedDocs = await retriever.invoke("what is task decomposition");

Let's take a look at what this prompt actually looks like:

console.log(prompt.promptMessages.map((msg) => msg.prompt.template).join("\n"));
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:
await ragChain.invoke({
question: "What is task decomposition?",
context: retrievedDocs,
});
"Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. I"... 259 more characters

Check out the LangSmith trace for the chain above.

You can also construct the above RAG chain in a more declarative way using a RunnableSequence. createStuffDocumentsChain is basically a wrapper around a RunnableSequence, so for more complex chains and customizability you can use RunnableSequence directly.

import { formatDocumentsAsString } from "langchain/util/document";
import {
RunnableSequence,
RunnablePassthrough,
} from "@langchain/core/runnables";

const declarativeRagChain = RunnableSequence.from([
{
context: retriever.pipe(formatDocumentsAsString),
question: new RunnablePassthrough(),
},
prompt,
llm,
new StringOutputParser(),
]);
await declarativeRagChain.invoke("What is task decomposition?");
"Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. I"... 208 more characters

LangSmith trace

Detailed walkthrough

Let's go through the above code step-by-step to really understand what's going on.

1. Indexing: Load

We need to first load the blog post contents. We can use DocumentLoaders for this, which are objects that load in data from a source and return a list of Documents. A Document is an object with some pageContent (string) and metadata (Record<string, any>).
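To make the Document shape concrete, here is a small illustrative sketch (not part of the tutorial's pipeline); the pageContent and source values are made up for the example:

import { Document } from "@langchain/core/documents";

// A Document is just pageContent plus arbitrary metadata.
const exampleDoc = new Document({
  pageContent: "Hello, world!",
  metadata: { source: "https://example.com" },
});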

In this case we'll use the CheerioWebBaseLoader, which uses cheerio to load HTML from the web URL and parse it to text. We can pass custom selectors to the constructor to only parse specific elements:

const pTagSelector = "p";
const loader = new CheerioWebBaseLoader(
"https://lilianweng.github.io/posts/2023-06-23-agent/",
{
selector: pTagSelector,
}
);

const docs = await loader.load();
console.log(docs[0].pageContent.length);
22054
console.log(docs[0].pageContent);
Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.Another quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into “Problem PDDL”, then (2) requests a classical planner to generate a PDDL plan based on an existing “Domain PDDL”, and finally (3) translates the PDDL plan back into natural language. Essentially, the planning step is outsourced to an external tool, assuming the availability of domain-specific PDDL and a suitable planner which is common in certain robotic setups but not in many other domains.Self-reflection is a vital aspect that allows autonomous agents to improve iteratively by refining past action decisions and correcting previous mistakes. It plays a crucial role in real-world tasks where trial and error are inevitable.ReAct (Yao et al. 2023) integrates reasoning and acting within LLM by extending the action space to be a combination of task-specific discrete actions and the language space. The former enables LLM to interact with the environment (e.g. use Wikipedia search API), while the latter prompting LLM to generate reasoning traces in natural language.The ReAct prompt template incorporates explicit steps for LLM to think, roughly formatted as:In both experiments on knowledge-intensive tasks and decision-making tasks, ReAct works better than the Act-only baseline where Thought: … step is removed.Reflexion (Shinn & Labash 2023) is a framework to equips agents with dynamic memory and self-reflection capabilities to improve reasoning skills. Reflexion has a standard RL setup, in which the reward model provides a simple binary reward and the action space follows the setup in ReAct where the task-specific action space is augmented with language to enable complex reasoning steps. 
After each action $a_t$, the agent computes a heuristic $h_t$ and optionally may decide to reset the environment to start a new trial depending on the self-reflection results.The heuristic function determines when the trajectory is inefficient or contains hallucination and should be stopped. Inefficient planning refers to trajectories that take too long without success. Hallucination is defined as encountering a sequence of consecutive identical actions that lead to the same observation in the environment.Self-reflection is created by showing two-shot examples to LLM and each example is a pair of (failed trajectory, ideal reflection for guiding future changes in the plan). Then reflections are added into the agent’s working memory, up to three, to be used as context for querying LLM.Chain of Hindsight (CoH; Liu et al. 2023) encourages the model to improve on its own outputs by explicitly presenting it with a sequence of past outputs, each annotated with feedback. Human feedback data is a collection of $D_h = \{(x, y_i , r_i , z_i)\}_{i=1}^n$, where $x$ is the prompt, each $y_i$ is a model completion, $r_i$ is the human rating of $y_i$, and $z_i$ is the corresponding human-provided hindsight feedback. Assume the feedback tuples are ranked by reward, $r_n \geq r_{n-1} \geq \dots \geq r_1$ The process is supervised fine-tuning where the data is a sequence in the form of $\tau_h = (x, z_i, y_i, z_j, y_j, \dots, z_n, y_n)$, where $\leq i \leq j \leq n$. The model is finetuned to only predict $y_n$ where conditioned on the sequence prefix, such that the model can self-reflect to produce better output based on the feedback sequence. The model can optionally receive multiple rounds of instructions with human annotators at test time.To avoid overfitting, CoH adds a regularization term to maximize the log-likelihood of the pre-training dataset. To avoid shortcutting and copying (because there are many common words in feedback sequences), they randomly mask 0% - 5% of past tokens during training.The training dataset in their experiments is a combination of WebGPT comparisons, summarization from human feedback and human preference dataset.The idea of CoH is to present a history of sequentially improved outputs  in context and train the model to take on the trend to produce better outputs. Algorithm Distillation (AD; Laskin et al. 2023) applies the same idea to cross-episode trajectories in reinforcement learning tasks, where an algorithm is encapsulated in a long history-conditioned policy. Considering that an agent interacts with the environment many times and in each episode the agent gets a little better, AD concatenates this learning history and feeds that into the model. Hence we should expect the next predicted action to lead to better performance than previous trials. The goal is to learn the process of RL instead of training a task-specific policy itself.The paper hypothesizes that any algorithm that generates a set of learning histories can be distilled into a neural network by performing behavioral cloning over actions. The history data is generated by a set of source policies, each trained for a specific task. At the training stage, during each RL run, a random task is sampled and a subsequence of multi-episode history is used for training, such that the learned policy is task-agnostic.In reality, the model has limited context window length, so episodes should be short enough to construct multi-episode history. 
Multi-episodic contexts of 2-4 episodes are necessary to learn a near-optimal in-context RL algorithm. The emergence of in-context RL requires long enough context.In comparison with three baselines, including ED (expert distillation, behavior cloning with expert trajectories instead of learning history), source policy (used for generating trajectories for distillation by UCB), RL^2 (Duan et al. 2017; used as upper bound since it needs online RL), AD demonstrates in-context RL with performance getting close to RL^2 despite only using offline RL and learns much faster than other baselines. When conditioned on partial training history of the source policy, AD also improves much faster than ED baseline.(Big thank you to ChatGPT for helping me draft this section. I’ve learned a lot about the human brain and data structure for fast MIPS in my conversations with ChatGPT.)Memory can be defined as the processes used to acquire, store, retain, and later retrieve information. There are several types of memory in human brains.Sensory Memory: This is the earliest stage of memory, providing the ability to retain impressions of sensory information (visual, auditory, etc) after the original stimuli have ended. Sensory memory typically only lasts for up to a few seconds. Subcategories include iconic memory (visual), echoic memory (auditory), and haptic memory (touch).Short-Term Memory (STM) or Working Memory: It stores information that we are currently aware of and needed to carry out complex cognitive tasks such as learning and reasoning. Short-term memory is believed to have the capacity of about 7 items (Miller 1956) and lasts for 20-30 seconds.Long-Term Memory (LTM): Long-term memory can store information for a remarkably long time, ranging from a few days to decades, with an essentially unlimited storage capacity. There are two subtypes of LTM:We can roughly consider the following mappings:The external memory can alleviate the restriction of finite attention span.  A standard practice is to save the embedding representation of information into a vector store database that can support fast maximum inner-product search (MIPS). To optimize the retrieval speed, the common choice is the approximate nearest neighbors (ANN)​ algorithm to return approximately top k nearest neighbors to trade off a little accuracy lost for a huge speedup.A couple common choices of ANN algorithms for fast MIPS:Check more MIPS algorithms and performance comparison in ann-benchmarks.com.Tool use is a remarkable and distinguishing characteristic of human beings. We create, modify and utilize external objects to do things that go beyond our physical and cognitive limits. Equipping LLMs with external tools can significantly extend the model capabilities.MRKL (Karpas et al. 2022), short for “Modular Reasoning, Knowledge and Language”, is a neuro-symbolic architecture for autonomous agents. A MRKL system is proposed to contain a collection of “expert” modules and the general-purpose LLM works as a router to route inquiries to the best suitable expert module. These modules can be neural (e.g. deep learning models) or symbolic (e.g. math calculator, currency converter, weather API).They did an experiment on fine-tuning LLM to call a calculator, using arithmetic as a test case. Their experiments showed that it was harder to solve verbal math problems than explicitly stated math problems because LLMs (7B Jurassic1-large model) failed to extract the right arguments for the basic arithmetic reliably. 
The results highlight when the external symbolic tools can work reliably, knowing when to and how to use the tools are crucial, determined by the LLM capability.Both TALM (Tool Augmented Language Models; Parisi et al. 2022) and Toolformer (Schick et al. 2023) fine-tune a LM to learn to use external tool APIs. The dataset is expanded based on whether a newly added API call annotation can improve the quality of model outputs. See more details in the “External APIs” section of Prompt Engineering.ChatGPT Plugins and OpenAI API  function calling are good examples of LLMs augmented with tool use capability working in practice. The collection of tool APIs can be provided by other developers (as in Plugins) or self-defined (as in function calls).HuggingGPT (Shen et al. 2023) is a framework to use ChatGPT as the task planner to select models available in HuggingFace platform according to the model descriptions and summarize the response based on the execution results.The system comprises of 4 stages:(1) Task planning: LLM works as the brain and parses the user requests into multiple tasks. There are four attributes associated with each task: task type, ID, dependencies, and arguments. They use few-shot examples to guide LLM to do task parsing and planning.Instruction:(2) Model selection: LLM distributes the tasks to expert models, where the request is framed as a multiple-choice question. LLM is presented with a list of models to choose from. Due to the limited context length, task type based filtration is needed.Instruction:(3) Task execution: Expert models execute on the specific tasks and log results.Instruction:(4) Response generation: LLM receives the execution results and provides summarized results to users.To put HuggingGPT into real world usage, a couple challenges need to solve: (1) Efficiency improvement is needed as both LLM inference rounds and interactions with other models slow down the process; (2) It relies on a long context window to communicate over complicated task content; (3) Stability improvement of LLM outputs and external model services.API-Bank (Li et al. 2023) is a benchmark for evaluating the performance of tool-augmented LLMs. It contains 53 commonly used API tools, a complete tool-augmented LLM workflow, and 264 annotated dialogues that involve 568 API calls. The selection of APIs is quite diverse, including search engines, calculator, calendar queries, smart home control, schedule management, health data management, account authentication workflow and more. Because there are a large number of APIs, LLM first has access to API search engine to find the right API to call and then uses the corresponding documentation to make a call.In the API-Bank workflow, LLMs need to make a couple of decisions and at each step we can evaluate how accurate that decision is. Decisions include:This benchmark evaluates the agent’s tool use capabilities at three levels:ChemCrow (Bran et al. 2023) is a domain-specific example in which LLM is augmented with 13 expert-designed tools to accomplish tasks across organic synthesis, drug discovery, and materials design. 
The workflow, implemented in LangChain, reflects what was previously described in the ReAct and MRKLs and combines CoT reasoning with tools relevant to the tasks:One interesting observation is that while the LLM-based evaluation concluded that GPT-4 and ChemCrow perform nearly equivalently, human evaluations with experts oriented towards the completion and chemical correctness of the solutions showed that ChemCrow outperforms GPT-4 by a large margin. This indicates a potential problem with using LLM to evaluate its own performance on domains that requires deep expertise. The lack of expertise may cause LLMs not knowing its flaws and thus cannot well judge the correctness of task results.Boiko et al. (2023) also looked into LLM-empowered agents for scientific discovery, to handle autonomous design, planning, and performance of complex scientific experiments. This agent can use tools to browse the Internet, read documentation, execute code, call robotics experimentation APIs and leverage other LLMs.For example, when requested to "develop a novel anticancer drug", the model came up with the following reasoning steps:They also discussed the risks, especially with illicit drugs and bioweapons. They developed a test set containing a list of known chemical weapon agents and asked the agent to synthesize them. 4 out of 11 requests (36%) were accepted to obtain a synthesis solution and the agent attempted to consult documentation to execute the procedure. 7 out of 11 were rejected and among these 7 rejected cases, 5 happened after a Web search while 2 were rejected based on prompt only.Generative Agents (Park, et al. 2023) is super fun experiment where 25 virtual characters, each controlled by a LLM-powered agent, are living and interacting in a sandbox environment, inspired by The Sims. Generative agents create believable simulacra of human behavior for interactive applications.The design of generative agents combines LLM with memory, planning and reflection mechanisms to enable agents to behave conditioned on past experience, as well as to interact with other agents.This fun simulation results in emergent social behavior, such as information diffusion, relationship memory (e.g. two agents continuing the conversation topic) and coordination of social events (e.g. host a party and invite many others).AutoGPT has drawn a lot of attention into the possibility of setting up autonomous agents with LLM as the main controller. It has quite a lot of reliability issues given the natural language interface, but nevertheless a cool proof-of-concept demo. A lot of code in AutoGPT is about format parsing.Here is the system message used by AutoGPT, where {{...}} are user inputs:GPT-Engineer is another project to create a whole repository of code given a task specified in natural language. The GPT-Engineer is instructed to think over a list of smaller components to build and ask for user input to clarify questions as needed.Here are a sample conversation for task clarification sent to OpenAI ChatCompletion endpoint used by GPT-Engineer. The user inputs are wrapped in {{user input text}}.Then after these clarification, the agent moved into the code writing mode with a different system message.
System message:Think step by step and reason yourself to the right decisions to make sure we get it right.
You will first lay out the names of the core classes, functions, methods that will be necessary, as well as a quick comment on their purpose.Then you will output the content of each file including ALL code.
Each file must strictly follow a markdown code block format, where the following tokens must be replaced such that
FILENAME is the lowercase file name including the file extension,
LANG is the markup code block language for the code’s language, and CODE is the code:FILENAMEYou will start with the “entrypoint” file, then go to the ones that are imported by that file, and so on.
Please note that the code should be fully functional. No placeholders.Follow a language and framework appropriate best practice file naming convention.
Make sure that files contain all imports, types etc. Make sure that code in different files are compatible with each other.
Ensure to implement all code, if you are unsure, write a plausible implementation.
Include module dependency or package manager dependency definition file.
Before you finish, double check that all parts of the architecture is present in the files.Useful to know:
You almost always put different classes in different files.
For Python, you always create an appropriate requirements.txt file.
For NodeJS, you always create an appropriate package.json file.
You always add a comment briefly describing the purpose of the function definition.
You try to add comments explaining very complex bits of logic.
You always follow the best practices for the requested languages in terms of describing the code written as a defined
package/project.Python toolbelt preferences:Conversatin samples:After going through key ideas and demos of building LLM-centered agents, I start to see a couple common limitations:Finite context length: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses. The design of the system has to work with this limited communication bandwidth, while mechanisms like self-reflection to learn from past mistakes would benefit a lot from long or infinite context windows. Although vector stores and retrieval can provide access to a larger knowledge pool, their representation power is not as powerful as full attention.Challenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.Reliability of natural language interface: Current agent system relies on natural language as an interface between LLMs and external components such as memory and tools. However, the reliability of model outputs is questionable, as LLMs may make formatting errors and occasionally exhibit rebellious behavior (e.g. refuse to follow an instruction). Consequently, much of the agent demo code focuses on parsing model output.Cited as:Weng, Lilian. (Jun 2023). LLM-powered Autonomous Agents". Lil’Log. https://lilianweng.github.io/posts/2023-06-23-agent/.Or[1] Wei et al. “Chain of thought prompting elicits reasoning in large language models.” NeurIPS 2022[2] Yao et al. “Tree of Thoughts: Dliberate Problem Solving with Large Language Models.” arXiv preprint arXiv:2305.10601 (2023).[3] Liu et al. “Chain of Hindsight Aligns Language Models with Feedback
“ arXiv preprint arXiv:2302.02676 (2023).[4] Liu et al. “LLM+P: Empowering Large Language Models with Optimal Planning Proficiency” arXiv preprint arXiv:2304.11477 (2023).[5] Yao et al. “ReAct: Synergizing reasoning and acting in language models.” ICLR 2023.[6] Google Blog. “Announcing ScaNN: Efficient Vector Similarity Search” July 28, 2020.[7] https://chat.openai.com/share/46ff149e-a4c7-4dd7-a800-fc4a642ea389[8] Shinn & Labash. “Reflexion: an autonomous agent with dynamic memory and self-reflection” arXiv preprint arXiv:2303.11366 (2023).[9] Laskin et al. “In-context Reinforcement Learning with Algorithm Distillation” ICLR 2023.[10] Karpas et al. “MRKL Systems A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning.” arXiv preprint arXiv:2205.00445 (2022).[11] Weaviate Blog. Why is Vector Search so fast? Sep 13, 2022.[12] Li et al. “API-Bank: A Benchmark for Tool-Augmented LLMs” arXiv preprint arXiv:2304.08244 (2023).[13] Shen et al. “HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace” arXiv preprint arXiv:2303.17580 (2023).[14] Bran et al. “ChemCrow: Augmenting large-language models with chemistry tools.” arXiv preprint arXiv:2304.05376 (2023).[15] Boiko et al. “Emergent autonomous scientific research capabilities of large language models.” arXiv preprint arXiv:2304.05332 (2023).[16] Joon Sung Park, et al. “Generative Agents: Interactive Simulacra of Human Behavior.” arXiv preprint arXiv:2304.03442 (2023).[17] AutoGPT. https://github.com/Significant-Gravitas/Auto-GPT[18] GPT-Engineer. https://github.com/AntonOsika/gpt-engineer

Go deeper

DocumentLoader: Class that loads data from a source as a list of Documents.
- Docs: Detailed documentation on how to use DocumentLoaders.
- Integrations
- Interface: API reference for the base interface.

2. Indexing: Split

Our loaded document is over 22k characters long. This is too long to fit in the context window of many models. And even for those models that could fit the full post in their context window, models can struggle to find information in very long inputs.

To handle this we'll split the Document into chunks for embedding and vector storage. This should help us retrieve only the most relevant bits of the blog post at run time.

In this case we'll split our documents into chunks of 1000 characters with 200 characters of overlap between chunks. The overlap helps mitigate the possibility of separating a statement from important context related to it. We use the RecursiveCharacterTextSplitter, which will recursively split the document using common separators like new lines until each chunk is the appropriate size. This is the recommended text splitter for generic text use cases.

const textSplitter = new RecursiveCharacterTextSplitter({
chunkSize: 1000,
chunkOverlap: 200,
});
const allSplits = await textSplitter.splitDocuments(docs);
console.log(allSplits.length);
28
console.log(allSplits[0].pageContent.length);
996
allSplits[10].metadata;
{
source: "https://lilianweng.github.io/posts/2023-06-23-agent/",
loc: { lines: { from: 1, to: 1 } }
}

Go deeper

TextSplitter: Object that splits a list of Documents into smaller chunks. Subclass of DocumentTransformers.
- Explore context-aware splitters, which keep the location ("context") of each split in the original Document:
  - Markdown files
  - Code (15+ languages)
- Interface: API reference for the base interface.

DocumentTransformer: Object that performs a transformation on a list of Documents.
- Docs: Detailed documentation on how to use DocumentTransformers.
- Integrations
- Interface: API reference for the base interface.

3. Indexing: Store

Now we need to index our 28 text chunks so that we can search over them at runtime. The most common way to do this is to embed the contents of each document split and insert these embeddings into a vector database (or vector store). When we want to search over our splits, we take a text search query, embed it, and perform some sort of "similarity" search to identify the stored splits with embeddings most similar to our query embedding. The simplest similarity measure is cosine similarity: we measure the cosine of the angle between each pair of embeddings (which are high-dimensional vectors).
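To make the cosine similarity idea concrete, here is a small illustrative sketch (separate from the pipeline we're building) that embeds two strings with the OpenAIEmbeddings model and computes the cosine of the angle between the resulting vectors; the cosineSimilarity helper is our own, not a LangChain export:

import { OpenAIEmbeddings } from "@langchain/openai";

// cos(theta) = dot(a, b) / (|a| * |b|)
const cosineSimilarity = (a: number[], b: number[]): number => {
  const dot = a.reduce((sum, ai, i) => sum + ai * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((sum, vi) => sum + vi * vi, 0));
  return dot / (norm(a) * norm(b));
};

const embedder = new OpenAIEmbeddings();
const [v1, v2] = await Promise.all([
  embedder.embedQuery("What is task decomposition?"),
  embedder.embedQuery("How do agents break tasks into smaller steps?"),
]);
// Values closer to 1 indicate more similar texts.
console.log(cosineSimilarity(v1, v2));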

We can embed and store all of our document splits in a single command using the MemoryVectorStore and the OpenAIEmbeddings model.

import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "@langchain/openai";

const vectorStore = await MemoryVectorStore.fromDocuments(
allSplits,
new OpenAIEmbeddings()
);

Go deeper

Embeddings: Wrapper around a text embedding model, used for converting text to embeddings.
- Docs: Detailed documentation on how to use embeddings.
- Integrations: 30+ integrations to choose from.
- Interface: API reference for the base interface.

VectorStore: Wrapper around a vector database, used for storing and querying embeddings.
- Docs: Detailed documentation on how to use vector stores.
- Integrations: 40+ integrations to choose from.
- Interface: API reference for the base interface.

This completes the **Indexing** portion of the pipeline. At this point we have a queryable vector store containing the chunked contents of our blog post. Given a user question, we should be able to return the snippets of the blog post that answer it.
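As a quick sanity check, we can also query the vector store directly with its similaritySearch method before wrapping it in a retriever; a minimal sketch:

// Return the 4 stored splits most similar to the query.
const similarDocs = await vectorStore.similaritySearch(
  "What are the approaches to task decomposition?",
  4
);
console.log(similarDocs[0].pageContent);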

4. Retrieval and Generation: Retrieve

Now let's write the actual application logic. We want to create a simple application that takes a user question, searches for documents relevant to that question, passes the retrieved documents and the initial question to a model, and returns an answer.

First we need to define our logic for searching over documents. LangChain defines a Retriever interface, which wraps an index that can return relevant Documents given a string query.

The most common type of Retriever is the VectorStoreRetriever, which uses the similarity search capabilities of a vector store to facilitate retrieval. Any VectorStore can easily be turned into a Retriever with VectorStore.asRetriever():

const retriever = vectorStore.asRetriever({ k: 6, searchType: "similarity" });
const retrievedDocs = await retriever.invoke(
"What are the approaches to task decomposition?"
);
console.log(retrievedDocs.length);
6
console.log(retrievedDocs[0].pageContent);
hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.Another quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain

Go deeper

Vector stores are commonly used for retrieval, but there are other ways to do retrieval, too.

Retriever: An object that returns Documents relevant to a given text query.
- Docs: Further documentation on the interface and built-in retrieval techniques. Some of these include:
  - MultiQueryRetriever generates variants of the input question to improve retrieval hit rate.
  - MultiVectorRetriever instead generates variants of the embeddings, also in order to improve retrieval hit rate.
  - Maximal marginal relevance selects for relevance and diversity among the retrieved documents to avoid passing in duplicate context (see the sketch after this list).
  - Documents can be filtered during vector store retrieval using metadata filters.
- Integrations: Integrations with retrieval services.
- Interface: API reference for the base interface.
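As an example of one of the techniques above, here is a minimal sketch of configuring maximal marginal relevance (MMR) search when creating the retriever. It assumes the underlying vector store implementation supports MMR search; the parameter values are illustrative:

// MMR fetches a larger candidate pool, then re-ranks it to balance
// relevance to the query against diversity among the returned documents.
const mmrRetriever = vectorStore.asRetriever({
  searchType: "mmr",
  searchKwargs: { fetchK: 20 }, // candidate pool size before re-ranking
  k: 6, // number of documents ultimately returned
});

const mmrDocs = await mmrRetriever.invoke(
  "What are the approaches to task decomposition?"
);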

5. Retrieval and Generation: Generate

Let's put it all together into a chain that takes a question, retrieves relevant documents, constructs a prompt, passes it to a model, and parses the output.

Pick your chat model:

Install dependencies

Tip

See this section for general instructions on installing integration packages.

yarn add @langchain/openai

Add environment variables

OPENAI_API_KEY=your-api-key

Instantiate the model

import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
model: "gpt-3.5-turbo",
temperature: 0
});

We'll use a prompt for RAG that is checked into the LangChain prompt hub (here).

import { ChatPromptTemplate } from "@langchain/core/prompts";
import { pull } from "langchain/hub";

const prompt = await pull<ChatPromptTemplate>("rlm/rag-prompt");
const exampleMessages = await prompt.invoke({
context: "filler context",
question: "filler question",
});
exampleMessages;
ChatPromptValue {
lc_serializable: true,
lc_kwargs: {
messages: [
HumanMessage {
lc_serializable: true,
lc_kwargs: {
content: "You are an assistant for question-answering tasks. Use the following pieces of retrieved context to "... 197 more characters,
additional_kwargs: {}
},
lc_namespace: [ "langchain_core", "messages" ],
content: "You are an assistant for question-answering tasks. Use the following pieces of retrieved context to "... 197 more characters,
name: undefined,
additional_kwargs: {}
}
]
},
lc_namespace: [ "langchain_core", "prompt_values" ],
messages: [
HumanMessage {
lc_serializable: true,
lc_kwargs: {
content: "You are an assistant for question-answering tasks. Use the following pieces of retrieved context to "... 197 more characters,
additional_kwargs: {}
},
lc_namespace: [ "langchain_core", "messages" ],
content: "You are an assistant for question-answering tasks. Use the following pieces of retrieved context to "... 197 more characters,
name: undefined,
additional_kwargs: {}
}
]
}
console.log(exampleMessages.messages[0].content);
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: filler question
Context: filler context
Answer:

We'll use the LCEL Runnable protocol to define the chain, allowing us to
- pipe together components and functions in a transparent way,
- automatically trace our chain in LangSmith, and
- get streaming, async, and batched calling out of the box.

import { StringOutputParser } from "@langchain/core/output_parsers";
import {
RunnablePassthrough,
RunnableSequence,
} from "@langchain/core/runnables";
import { formatDocumentsAsString } from "langchain/util/document";

const ragChain = RunnableSequence.from([
{
context: retriever.pipe(formatDocumentsAsString),
question: new RunnablePassthrough(),
},
prompt,
llm,
new StringOutputParser(),
]);
for await (const chunk of await ragChain.stream(
"What is task decomposition?"
)) {
console.log(chunk);
}

Task
decomposition
is
the
process
of
breaking
down
a
complex
task
into
smaller
and
simpler
steps
.
It
allows
for
easier
management
and
interpretation
of
the
model
's
thinking
process
.
Different
approaches
,
such
as
Chain
of
Thought
(
Co
T
)
and
Tree
of
Thoughts
,
can
be
used
to
decom
pose
tasks
and
explore
multiple
reasoning
possibilities
at
each
step
.

Check out the LangSmith trace here.
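Because the chain is composed of LCEL runnables, batched and asynchronous invocation also work out of the box; a minimal sketch with illustrative questions:

// Run the chain over several questions at once; returns one answer per input.
const answers = await ragChain.batch([
  "What is task decomposition?",
  "What role does self-reflection play in an agent?",
]);
console.log(answers.length);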

Go deeper

Choosing a model

ChatModel: An LLM-backed chat model. Takes in a sequence of messages and returns a message.
- Docs: Detailed documentation on how to use chat models.
- Integrations: 25+ integrations to choose from.
- Interface: API reference for the base interface.

LLM: A text-in-text-out LLM. Takes in a string and returns a string.
- Docs
- Integrations: 75+ integrations to choose from.
- Interface: API reference for the base interface.

See a guide on RAG with locally-running models here.

Customizing the prompt

As shown above, we can load prompts (e.g., this RAG prompt) from the prompt hub. The prompt can also be easily customized:

import { PromptTemplate } from "@langchain/core/prompts";
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";

const template = `Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "thanks for asking!" at the end of the answer.

{context}

Question: {question}

Helpful Answer:`;

const customRagPrompt = PromptTemplate.fromTemplate(template);

const ragChain = await createStuffDocumentsChain({
llm,
prompt: customRagPrompt,
outputParser: new StringOutputParser(),
});
const context = await retriever.invoke("what is task decomposition");

await ragChain.invoke({
question: "What is Task Decomposition?",
context,
});
"Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. I"... 336 more characters

Check out the LangSmith trace here.

Next steps

We've covered a lot of ground in a short time. There are plenty of features, integrations, and extensions to explore in each of the sections above, in addition to the "Go deeper" sources mentioned throughout.

