Build a Local RAG Application
The popularity of projects like PrivateGPT, llama.cpp, GPT4All, and llamafile underscores the importance of running LLMs locally.
LangChain integrates with many open-source LLMs that can be run locally.
For example, here we show how to run OllamaEmbeddings or LLaMA2 locally (e.g., on your laptop) using local embeddings and a local LLM.
Document Loading
First, install the packages needed for local embeddings and vector storage.
Setup
Dependencies
We'll use the following packages:
npm install --save langchain @langchain/community @langchain/ollama cheerio
LangSmith
Many of the applications you build with LangChain will contain multiple steps with multiple LLM calls. As these applications get more and more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent. The best way to do this is with LangSmith.
Note that LangSmith is not required, but it is helpful. If you do want to use LangSmith, after you sign up at the link above, make sure to set your environment variables to start logging traces:
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=YOUR_KEY
# Reduce tracing latency if you are not in a serverless environment
# export LANGCHAIN_CALLBACKS_BACKGROUND=true
Initial setup
Load and split an example document.
We'll use a blog post on agents as an example.
import "cheerio";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";
const loader = new CheerioWebBaseLoader(
"https://lilianweng.github.io/posts/2023-06-23-agent/"
);
const docs = await loader.load();
const textSplitter = new RecursiveCharacterTextSplitter({
chunkSize: 500,
chunkOverlap: 0,
});
const allSplits = await textSplitter.splitDocuments(docs);
console.log(allSplits.length);
146
Next, we'll use OllamaEmbeddings for our local embeddings. Follow these instructions to set up and run a local Ollama instance.
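Once Ollama is running, you'll also need a model pulled locally. This guide assumes the llama2 model (the default used by ChatOllama below); you can fetch it with:
ollama pull llama2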
import { OllamaEmbeddings } from "@langchain/community/embeddings/ollama";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
const embeddings = new OllamaEmbeddings();
const vectorStore = await MemoryVectorStore.fromDocuments(
allSplits,
embeddings
);
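Calling new OllamaEmbeddings() with no arguments uses the package defaults. If you want to pin the embedding model or point at a non-default Ollama host, both can be passed explicitly. A minimal sketch, where the parameter values are illustrative assumptions rather than requirements:

// Sketch: pin the embedding model and Ollama endpoint explicitly
const customEmbeddings = new OllamaEmbeddings({
  model: "llama2", // assumed to already be pulled locally
  baseUrl: "http://localhost:11434", // default Ollama endpoint
});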
Test that similarity search works with our local embeddings.
const question = "What are the approaches to Task Decomposition?";
const docs = await vectorStore.similaritySearch(question);
console.log(docs.length);
4
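Beyond counting hits, it's worth eyeballing the top match to confirm the retrieved chunk is actually relevant:

// Peek at the beginning of the best-matching chunk
console.log(docs[0].pageContent.slice(0, 200));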
Model
LLaMA2
For our local LLM, we'll also use ollama.
import { ChatOllama } from "@langchain/ollama";
const ollamaLlm = new ChatOllama({
baseUrl: "http://localhost:11434", // Default value
model: "llama2", // Default value
});
const response = await ollamaLlm.invoke(
"Simulate a rap battle between Stephen Colbert and John Oliver"
);
console.log(response.content);
[The stage is set for a fierce rap battle between two of the funniest men on television. Stephen Colbert and John Oliver are standing face to face, each with their own microphone and confident smirk on their face.]
Stephen Colbert:
Yo, John Oliver, I heard you've been talking smack
About my show and my satire, saying it's all fake
But let me tell you something, brother, I'm the real deal
I've been making fun of politicians for years, with no conceal
John Oliver:
Oh, Stephen, you think you're so clever and smart
But your jokes are stale and your delivery's a work of art
You're just a pale imitation of the real deal, Jon Stewart
I'm the one who's really making waves, while you're just a little bird
Stephen Colbert:
Well, John, I may not be as loud as you, but I'm smarter
My satire is more subtle, and it goes right over their heads
I'm the one who's been exposing the truth for years
While you're just a British interloper, trying to steal the cheers
John Oliver:
Oh, Stephen, you may have your fans, but I've got the brains
My show is more than just slapstick and silly jokes, it's got depth and gains
I'm the one who's really making a difference, while you're just a clown
My satire is more than just a joke, it's a call to action, and I've got the crown
[The crowd cheers and chants as the two comedians continue their rap battle.]
Stephen Colbert:
You may have your fans, John, but I'm the king of satire
I've been making fun of politicians for years, and I'm still standing tall
My jokes are clever and smart, while yours are just plain dumb
I'm the one who's really in control, and you're just a pretender to the throne.
John Oliver:
Oh, Stephen, you may have your moment in the sun
But I'm the one who's really shining bright, and my star is just beginning to rise
My satire is more than just a joke, it's a call to action, and I've got the power
I'm the one who's really making a difference, and you're just a fleeting flower.
[The crowd continues to cheer and chant as the two comedians continue their rap battle.]
See the LangSmith trace here.
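Local models can be slow to produce long completions, so streaming tokens as they are generated often makes for a better experience. ChatOllama supports the standard .stream() method; a minimal sketch:

// Stream tokens as they arrive instead of waiting for the full completion
const stream = await ollamaLlm.stream(
  "Simulate a rap battle between Stephen Colbert and John Oliver"
);
for await (const chunk of stream) {
  process.stdout.write(String(chunk.content));
}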
Using in a chain
We can create a summarization chain with either model by passing in the retrieved docs and a simple prompt.
It formats the prompt template using the provided input key values and passes the formatted string to LLaMA2, or another specified LLM.
import { StringOutputParser } from "@langchain/core/output_parsers";
import { PromptTemplate } from "@langchain/core/prompts";
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";
const prompt = PromptTemplate.fromTemplate(
"Summarize the main themes in these retrieved docs: {context}"
);
const chain = await createStuffDocumentsChain({
llm: ollamaLlm,
outputParser: new StringOutputParser(),
prompt,
});
const question = "What are the approaches to Task Decomposition?";
const docs = await vectorStore.similaritySearch(question);
await chain.invoke({
context: docs,
});
"The main themes retrieved from the provided documents are:\n" +
"\n" +
"1. Sensory Memory: The ability to retain"... 1117 more characters
See the LangSmith trace here.
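For intuition: createStuffDocumentsChain "stuffs" the formatted documents into the prompt's {context} variable. A roughly equivalent manual composition, reusing prompt, ollamaLlm, and docs from above, looks like the sketch below; it skips the per-document prompt handling the helper provides, so treat it as illustrative rather than a drop-in replacement.

import { RunnableSequence } from "@langchain/core/runnables";
import { formatDocumentsAsString } from "langchain/util/document";
import type { Document } from "@langchain/core/documents";

// Sketch: join the docs into {context} ourselves, then prompt the LLM
const manualChain = RunnableSequence.from([
  {
    context: (input: { context: Document[] }) =>
      formatDocumentsAsString(input.context),
  },
  prompt,
  ollamaLlm,
  new StringOutputParser(),
]);

await manualChain.invoke({ context: docs });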
Q&A
We can also use the LangChain Prompt Hub to store and fetch model-specific prompts.
Let's try a default RAG prompt, here.
import { pull } from "langchain/hub";
import { ChatPromptTemplate } from "@langchain/core/prompts";
const ragPrompt = await pull<ChatPromptTemplate>("rlm/rag-prompt");
const chain = await createStuffDocumentsChain({
llm: ollamaLlm,
outputParser: new StringOutputParser(),
prompt: ragPrompt,
});
Let's see what this prompt actually looks like:
console.log(
ragPrompt.promptMessages.map((msg) => msg.prompt.template).join("\n")
);
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:
await chain.invoke({ context: docs, question });
"Task decomposition is a crucial step in breaking down complex problems into manageable parts for eff"... 1095 more characters
See the LangSmith trace here.
Q&A with retrieval
Instead of passing in documents manually, we can automatically retrieve them from our vector store based on the user's question.
This will use the default QA prompt and will retrieve from the vectorDB.
import {
RunnablePassthrough,
RunnableSequence,
} from "@langchain/core/runnables";
import { formatDocumentsAsString } from "langchain/util/document";
const retriever = vectorStore.asRetriever();

const qaChain = RunnableSequence.from([
  {
    // Retrieve chunks relevant to the question and join them into one string
    context: (input: { question: string }, callbacks) => {
      const retrieverAndFormatter = retriever.pipe(formatDocumentsAsString);
      return retrieverAndFormatter.invoke(input.question, callbacks);
    },
    question: new RunnablePassthrough(),
  },
  ragPrompt,
  ollamaLlm,
  new StringOutputParser(),
]);
await qaChain.invoke({ question });
"Based on the context provided, I understand that you are asking me to answer a question related to m"... 948 more characters
See the LangSmith trace here.
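A common extension is to return the retrieved source documents alongside the answer so the user can verify it. One way to sketch this, reusing the imports and objects defined above (the qaWithSources name and output shape are illustrative assumptions, not part of this guide), is to retrieve once, keep the raw documents, and attach the generated answer with RunnablePassthrough.assign:

import type { Document } from "@langchain/core/documents";

// Sketch: answer the question while keeping the retrieved chunks for citation
const qaWithSources = RunnableSequence.from([
  {
    // Retrieve once and hold on to the raw documents
    sourceDocs: (input: { question: string }) =>
      retriever.invoke(input.question),
    question: (input: { question: string }) => input.question,
  },
  // Keep sourceDocs and question, and add the generated answer
  RunnablePassthrough.assign({
    answer: RunnableSequence.from([
      {
        context: (input: { sourceDocs: Document[] }) =>
          formatDocumentsAsString(input.sourceDocs),
        question: (input: { question: string }) => input.question,
      },
      ragPrompt,
      ollamaLlm,
      new StringOutputParser(),
    ]),
  }),
]);

const result = await qaWithSources.invoke({ question });
// result.answer holds the model's answer; result.sourceDocs the retrieved chunks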