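How to use a vector store to retrieve data

Prerequisites

This guide assumes familiarity with the following concepts

Vector stores can be converted into retrievers using the .asRetriever() method, which makes them easier to compose in chains.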
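To make the idea of "converting a store into a retriever" concrete, here is a minimal sketch in plain TypeScript. Everything in it (ToyVectorStore, ToyRetriever, StoredDoc, the hand-written vectors) is hypothetical and is not LangChain's implementation. It also takes a query vector directly and runs synchronously, whereas a real retriever accepts a query string, embeds it, and returns a promise.

```typescript
// Toy illustration only: these types and classes are hypothetical and are
// not part of LangChain.
type StoredDoc = { pageContent: string; vector: number[] };
type ToyRetriever = { invoke: (queryVector: number[]) => StoredDoc[] };

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class ToyVectorStore {
  private docs: StoredDoc[];

  constructor(docs: StoredDoc[]) {
    this.docs = docs;
  }

  // Return the k stored documents most similar to the query vector.
  similaritySearch(queryVector: number[], k: number): StoredDoc[] {
    return [...this.docs]
      .sort(
        (x, y) =>
          cosineSimilarity(y.vector, queryVector) -
          cosineSimilarity(x.vector, queryVector)
      )
      .slice(0, k);
  }

  // Wrap the store in an object with a single invoke() method, so it can be
  // used like any other step in a pipeline.
  asRetriever(k: number): ToyRetriever {
    return { invoke: (queryVector) => this.similaritySearch(queryVector, k) };
  }
}

const store = new ToyVectorStore([
  { pageContent: "cats", vector: [1, 0] },
  { pageContent: "dogs", vector: [0.9, 0.1] },
  { pageContent: "taxes", vector: [0, 1] },
]);
const retriever = store.asRetriever(2);
const hits = retriever.invoke([1, 0]).map((doc) => doc.pageContent);
console.log(hits); // "cats" first, then "dogs"
```

The point of the wrapper is that callers no longer need to know about similaritySearch or k: they only see invoke(), which is what lets a retriever slot into a chain next to prompts and models.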

Below, we show a retrieval-augmented generation (RAG) chain that performs question answering over documents using the following steps:

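1. Initialize a vector store
2. Create a retriever from that vector store
3. Compose a question answering chain
4. Ask questions!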

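Each of these steps has multiple substeps and potential configurations, but we'll walk through one common flow. First, install the required dependencies:

Tip

See this section for general instructions on installing integration packages.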

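npm
npm install @langchain/openai @langchain/core @langchain/textsplitters langchain

Yarn
yarn add @langchain/openai @langchain/core @langchain/textsplitters langchain

pnpm
pnpm add @langchain/openai @langchain/core @langchain/textsplitters langchain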

You can download the state_of_the_union.txt file here.

import * as fs from "node:fs";

import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import {
  RunnablePassthrough,
  RunnableSequence,
} from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import type { Document } from "@langchain/core/documents";

const formatDocumentsAsString = (documents: Document[]) => {
  return documents.map((document) => document.pageContent).join("\n\n");
};

// Initialize the LLM to use to answer the question.
const model = new ChatOpenAI({
  model: "gpt-4o",
});
// Load the source text and split it into chunks of at most 1000 characters.
const text = fs.readFileSync("state_of_the_union.txt", "utf8");
const textSplitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000 });
const docs = await textSplitter.createDocuments([text]);
// Create a vector store from the documents.
const vectorStore = await MemoryVectorStore.fromDocuments(
  docs,
  new OpenAIEmbeddings()
);

// Initialize a retriever wrapper around the vector store
const vectorStoreRetriever = vectorStore.asRetriever();

// Create a system & human prompt for the chat model
const SYSTEM_TEMPLATE = `Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----------------
{context}`;

const prompt = ChatPromptTemplate.fromMessages([
  ["system", SYSTEM_TEMPLATE],
  ["human", "{question}"],
]);

const chain = RunnableSequence.from([
  {
    context: vectorStoreRetriever.pipe(formatDocumentsAsString),
    question: new RunnablePassthrough(),
  },
  prompt,
  model,
  new StringOutputParser(),
]);

const answer = await chain.invoke(
  "What did the president say about Justice Breyer?"
);

console.log({ answer });

/*
  {
    answer: 'The president honored Justice Stephen Breyer by recognizing his dedication to serving the country as an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. He thanked Justice Breyer for his service.'
  }
*/

API Reference:

OpenAIEmbeddings from @langchain/openai
ChatOpenAI from @langchain/openai
RecursiveCharacterTextSplitter from @langchain/textsplitters
MemoryVectorStore from langchain/vectorstores/memory
RunnablePassthrough from @langchain/core/runnables
RunnableSequence from @langchain/core/runnables
StringOutputParser from @langchain/core/output_parsers
ChatPromptTemplate from @langchain/core/prompts
Document from @langchain/core/documents

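Let's walk through what's happening here.

1. We first load a long text and split it into smaller documents using a text splitter. We then load those documents into MemoryVectorStore, our vector store, which also embeds them using the passed OpenAIEmbeddings instance. This creates our index.
2. Although we could query the vector store directly, we convert it into a retriever so that retrieved documents are returned in the right format for the question answering chain.
3. We initialize a retrieval chain, which we'll call later in step 4.
4. We ask questions!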
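The way the chain's first stage turns one input string into both prompt variables can be easy to miss. The sketch below mimics that data flow in plain TypeScript, with no model call. All names here (Chunk, corpus, retrieve, buildPromptInput, fillPrompt) are hypothetical stand-ins, and the keyword-matching retriever is a deliberate simplification of embedding-based search.

```typescript
// Hypothetical stand-ins for illustration; none of these are LangChain APIs.
type Chunk = { pageContent: string };

const corpus: Chunk[] = [
  { pageContent: "Justice Breyer is retiring from the Supreme Court." },
  { pageContent: "The economy added jobs last year." },
];

// Stub retriever: keeps chunks that contain any question word longer than
// four characters. A real retriever would rank by embedding similarity.
function retrieve(question: string): Chunk[] {
  const keywords = question
    .toLowerCase()
    .split(/\W+/)
    .filter((word) => word.length > 4);
  return corpus.filter((chunk) =>
    keywords.some((word) => chunk.pageContent.toLowerCase().includes(word))
  );
}

// Same idea as formatDocumentsAsString in the guide's example.
const formatChunks = (chunks: Chunk[]): string =>
  chunks.map((chunk) => chunk.pageContent).join("\n\n");

// First stage: fan the single input string out into both prompt variables.
function buildPromptInput(question: string) {
  return {
    context: formatChunks(retrieve(question)),
    question, // passed through unchanged
  };
}

// Second stage: fill the context and question slots of a prompt.
function fillPrompt(input: { context: string; question: string }): string {
  return `Context:\n${input.context}\n\nQuestion: ${input.question}`;
}

const promptInput = buildPromptInput(
  "What did the president say about Justice Breyer?"
);
const promptText = fillPrompt(promptInput);
console.log(promptText);
```

In the guide's chain, the object with context and question keys plays the role of buildPromptInput: the retriever branch and the passthrough branch both receive the same question, and their results are merged into the object the prompt expects.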

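Next steps

You've now learned how to convert a vector store into a retriever.

See the individual sections for deeper dives on specific retrievers, the broader tutorial on RAG, or this section to learn how to create your own custom retriever over any data source.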
