Build a Local RAG Application
The popularity of projects like PrivateGPT, llama.cpp, GPT4All, and llamafile underscores the importance of running LLMs locally.
LangChain integrates with many open-source LLMs that can be run locally.
For example, this guide shows how to run OllamaEmbeddings or LLaMA2 locally (e.g., on your laptop) using local embeddings and a local LLM.
Document Loading
First, install the packages needed for local embeddings and vector storage.
Setup
Dependencies
We'll use the following packages:
npm install --save langchain @langchain/community @langchain/ollama cheerio
LangSmith
Many of the applications you build with LangChain will contain multiple steps with multiple invocations of LLM calls. As these applications get more and more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent. The best way to do this is with LangSmith.
Note that LangSmith is not required, but it is helpful. If you do want to use LangSmith, after you sign up at the link above, make sure to set your environment variables to start logging traces:
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=YOUR_KEY
# Reduce tracing latency if you are not in a serverless environment
# export LANGCHAIN_CALLBACKS_BACKGROUND=true
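If you prefer configuring tracing from inside your script rather than the shell, the same variables can be set on process.env before any LangChain code runs. A minimal sketch (the placeholder key is yours to substitute):
// Inline alternative to the shell exports above; set these before importing LangChain modules.
process.env.LANGCHAIN_TRACING_V2 = "true";
process.env.LANGCHAIN_API_KEY = "YOUR_KEY"; // substitute your real key; never commit it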
Initial setup
Load and split an example document.
We'll use a blog post on agents as an example.
import "cheerio";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";
const loader = new CheerioWebBaseLoader(
"https://lilianweng.github.io/posts/2023-06-23-agent/"
);
const docs = await loader.load();
const textSplitter = new RecursiveCharacterTextSplitter({
chunkSize: 500,
chunkOverlap: 0,
});
const allSplits = await textSplitter.splitDocuments(docs);
console.log(allSplits.length);
146
Next, we'll use OllamaEmbeddings for our local embeddings. Follow these instructions to set up and run a local Ollama instance.
import { OllamaEmbeddings } from "@langchain/community/embeddings/ollama";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
const embeddings = new OllamaEmbeddings();
const vectorStore = await MemoryVectorStore.fromDocuments(
allSplits,
embeddings
);
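By default, OllamaEmbeddings connects to http://localhost:11434 and uses Ollama's default embedding model. Both are configurable; here is a small illustrative variant (the model name is an example, not something the guide prescribes; use any embedding model you have pulled with ollama pull):
const customEmbeddings = new OllamaEmbeddings({
  baseUrl: "http://localhost:11434", // where your local Ollama instance is serving
  model: "nomic-embed-text", // example model; must be pulled locally first
});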
Test that similarity search is working with our local embeddings.
const question = "What are the approaches to Task Decomposition?";
const docs = await vectorStore.similaritySearch(question);
console.log(docs.length);
4
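To sanity-check what was retrieved, you can also print a snippet of the top match; similaritySearch returns Document objects, so each result carries its text in pageContent. An illustrative addition:
// Peek at the first 200 characters of the best-matching chunk.
console.log(docs[0].pageContent.slice(0, 200));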
Model
LLaMA2
For the local LLM, we'll also use ollama.
import { ChatOllama } from "@langchain/ollama";
const ollamaLlm = new ChatOllama({
baseUrl: "http://localhost:11434", // Default value
model: "llama2", // Default value
});
const response = await ollamaLlm.invoke(
"Simulate a rap battle between Stephen Colbert and John Oliver"
);
console.log(response.content);
[The stage is set for a fierce rap battle between two of the funniest men on television. Stephen Colbert and John Oliver are standing face to face, each with their own microphone and confident smirk on their face.]
Stephen Colbert:
Yo, John Oliver, I heard you've been talking smack
About my show and my satire, saying it's all fake
But let me tell you something, brother, I'm the real deal
I've been making fun of politicians for years, with no conceal
John Oliver:
Oh, Stephen, you think you're so clever and smart
But your jokes are stale and your delivery's a work of art
You're just a pale imitation of the real deal, Jon Stewart
I'm the one who's really making waves, while you're just a little bird
Stephen Colbert:
Well, John, I may not be as loud as you, but I'm smarter
My satire is more subtle, and it goes right over their heads
I'm the one who's been exposing the truth for years
While you're just a British interloper, trying to steal the cheers
John Oliver:
Oh, Stephen, you may have your fans, but I've got the brains
My show is more than just slapstick and silly jokes, it's got depth and gains
I'm the one who's really making a difference, while you're just a clown
My satire is more than just a joke, it's a call to action, and I've got the crown
[The crowd cheers and chants as the two comedians continue their rap battle.]
Stephen Colbert:
You may have your fans, John, but I'm the king of satire
I've been making fun of politicians for years, and I'm still standing tall
My jokes are clever and smart, while yours are just plain dumb
I'm the one who's really in control, and you're just a pretender to the throne.
John Oliver:
Oh, Stephen, you may have your moment in the sun
But I'm the one who's really shining bright, and my star is just beginning to rise
My satire is more than just a joke, it's a call to action, and I've got the power
I'm the one who's really making a difference, and you're just a fleeting flower.
[The crowd continues to cheer and chant as the two comedians continue their rap battle.]
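ChatOllama also supports token streaming through the standard .stream() method, which makes long generations like the one above feel more responsive. A minimal sketch reusing the same model instance:
// Print tokens as they arrive instead of waiting for the full reply.
// For Ollama chat models, chunk.content is typically a plain string.
const stream = await ollamaLlm.stream(
  "Simulate a rap battle between Stephen Colbert and John Oliver"
);
for await (const chunk of stream) {
  process.stdout.write(chunk.content.toString());
}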
See the LangSmith trace here.
Using in a chain
We can create a summarization chain with either model by passing in the retrieved docs and a simple prompt.
It formats the prompt template using the input key values provided and passes the formatted string to LLaMA2, or another specified LLM.
import { StringOutputParser } from "@langchain/core/output_parsers";
import { PromptTemplate } from "@langchain/core/prompts";
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";
const prompt = PromptTemplate.fromTemplate(
"Summarize the main themes in these retrieved docs: {context}"
);
const chain = await createStuffDocumentsChain({
llm: ollamaLlm,
outputParser: new StringOutputParser(),
prompt,
});
const question = "What are the approaches to Task Decomposition?";
const docs = await vectorStore.similaritySearch(question);
await chain.invoke({
context: docs,
});
"The main themes retrieved from the provided documents are:\n" +
"\n" +
"1. Sensory Memory: The ability to retain"... 1117 more characters
See the LangSmith trace here.
Q&A
We can also use the LangChain Prompt Hub to store and fetch model-specific prompts.
Let's try with a default RAG prompt, here.
import { pull } from "langchain/hub";
import { ChatPromptTemplate } from "@langchain/core/prompts";
const ragPrompt = await pull<ChatPromptTemplate>("rlm/rag-prompt");
const chain = await createStuffDocumentsChain({
llm: ollamaLlm,
outputParser: new StringOutputParser(),
prompt: ragPrompt,
});
Let's see what this prompt actually looks like:
console.log(
ragPrompt.promptMessages.map((msg) => msg.prompt.template).join("\n")
);
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:
await chain.invoke({ context: docs, question });
"Task decomposition is a crucial step in breaking down complex problems into manageable parts for eff"... 1095 more characters
See the LangSmith trace here.
Q&A with retrieval
Instead of passing in docs manually, we can automatically retrieve them from our vector store based on the user question.
This will use the default QA prompt and will retrieve from the vectorDB.
import {
RunnablePassthrough,
RunnableSequence,
} from "@langchain/core/runnables";
import { formatDocumentsAsString } from "langchain/util/document";
const retriever = vectorStore.asRetriever();
const qaChain = RunnableSequence.from([
{
context: (input: { question: string }, callbacks) => {
const retrieverAndFormatter = retriever.pipe(formatDocumentsAsString);
return retrieverAndFormatter.invoke(input.question, callbacks);
},
question: new RunnablePassthrough(),
},
ragPrompt,
ollamaLlm,
new StringOutputParser(),
]);
await qaChain.invoke({ question });
"Based on the context provided, I understand that you are asking me to answer a question related to m"... 948 more characters
See the LangSmith trace here.
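If you also want the retrieved context returned alongside the answer, one common pattern is RunnablePassthrough.assign, which merges new fields into the input object as it flows through the chain. A hedged sketch assembled from the pieces above (ragChainWithSources is a name introduced here for illustration):
// The first assign adds the formatted context; the second runs the RAG prompt and
// model, so the final output is { question, context, answer }.
const ragChainWithSources = RunnablePassthrough.assign({
  context: (input: { question: string }) =>
    retriever.pipe(formatDocumentsAsString).invoke(input.question),
}).pipe(
  RunnablePassthrough.assign({
    answer: ragPrompt.pipe(ollamaLlm).pipe(new StringOutputParser()),
  })
);
const resultWithSources = await ragChainWithSources.invoke({ question });
console.log(resultWithSources.answer);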