MongoDB Atlas

Compatibility

Only available on Node.js.

You can still create Next.js API routes that use MongoDB by setting the runtime variable to nodejs, like so:

export const runtime = "nodejs";

You can read more about the Edge runtime in the Next.js documentation.
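To show where the runtime export sits in practice, here is a minimal sketch of an App Router route handler. The file path, the `q` query parameter, and the echoed response body are assumptions made for illustration; a real handler would query the vector store where the comment indicates.

```typescript
// Hypothetical file: app/api/search/route.ts
// Opt this route out of the Edge runtime so Node.js-only packages can load.
export const runtime = "nodejs";

export async function GET(request: Request): Promise<Response> {
  // Read the search text from a hypothetical `q` query parameter.
  const query = new URL(request.url).searchParams.get("q") ?? "";

  // A real handler would run a vector store query here.
  // This sketch only echoes the query back as JSON.
  return new Response(JSON.stringify({ query }), {
    status: 200,
    headers: { "content-type": "application/json" },
  });
}
```

The handler relies only on the standard `Request` and `Response` objects, so it does not depend on any Next.js import.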

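This guide provides a quick overview for getting started with the MongoDB Atlas vector store. For detailed documentation of all MongoDBAtlasVectorSearch features and configurations, head to the API reference.

Overview

Integration details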

Class	Package	PY support
`MongoDBAtlasVectorSearch`	`@langchain/mongodb`	✅

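Setup

To use the MongoDB Atlas vector store, you'll need to configure a MongoDB Atlas cluster and install the @langchain/mongodb integration package.

Initial cluster configuration

To create a MongoDB Atlas cluster, navigate to the MongoDB Atlas website and create an account if you don't already have one.

Create and name a cluster when prompted, then find it under Database. Select Browse Collections and create either a blank collection or one from the provided sample data.

Note: The cluster must be running MongoDB 7.0 or higher.

Creating an index

After configuring your cluster, you'll need to create an index on the collection field you want to search over.

Switch to the Atlas Search tab and click Create Search Index. From there, make sure you select Atlas Vector Search - JSON Editor, then select the appropriate database and collection and paste the following into the text box: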

{
  "fields": [
    {
      "numDimensions": 1536,
      "path": "embedding",
      "similarity": "euclidean",
      "type": "vector"
    }
  ]
}

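Note that the numDimensions property must match the dimensionality of the embeddings you are using. For example, Cohere embeddings have 1024 dimensions, and by default OpenAI embeddings have 1536.

Note: By default the vector store expects an index name of default, an indexed collection field name of embedding, and a raw text field name of text. You should initialize the vector store with an index name and field names that match your index and collection schema, as shown in the Instantiation section.

Finally, proceed to build the index.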
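Because a mismatch between numDimensions and the embedding size is easy to miss, the index definition above can be generated and checked in code. This is a sketch only: `buildVectorIndexDefinition` and `assertDimensionsMatch` are hypothetical helpers, not part of any library, and the list of similarity values is an assumption based on the options Atlas Vector Search is generally documented to accept.

```typescript
// Similarity functions assumed to be accepted by Atlas Vector Search indexes.
type Similarity = "euclidean" | "cosine" | "dotProduct";

interface VectorIndexDefinition {
  fields: Array<{
    type: "vector";
    path: string;
    numDimensions: number;
    similarity: Similarity;
  }>;
}

// Hypothetical helper: builds the JSON shown above from a dimension count.
function buildVectorIndexDefinition(
  numDimensions: number,
  path = "embedding",
  similarity: Similarity = "euclidean"
): VectorIndexDefinition {
  if (!Number.isInteger(numDimensions) || numDimensions <= 0) {
    throw new Error(`numDimensions must be a positive integer, got ${numDimensions}`);
  }
  return { fields: [{ type: "vector", path, numDimensions, similarity }] };
}

// Hypothetical helper: checks a sample embedding against the index definition.
function assertDimensionsMatch(vector: number[], definition: VectorIndexDefinition): void {
  const expected = definition.fields[0].numDimensions;
  if (vector.length !== expected) {
    throw new Error(`Embedding has ${vector.length} dimensions, index expects ${expected}`);
  }
}

// 1536 matches the default OpenAI embedding size mentioned above.
const indexDefinition = buildVectorIndexDefinition(1536);
console.log(JSON.stringify(indexDefinition, null, 2));
```

Embedding one sample string and passing the resulting vector to `assertDimensionsMatch` before ingesting data is a cheap way to catch a wrong dimension count early.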

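Embeddings

This guide will also use OpenAI embeddings, which require you to install the @langchain/openai integration package. You can also use other supported embedding models if you wish.

Installation

Install the following packages:

Tip

See the LangChain documentation for general instructions on installing integration packages.

npm:

npm i @langchain/mongodb mongodb @langchain/openai

yarn:

yarn add @langchain/mongodb mongodb @langchain/openai

pnpm:

pnpm add @langchain/mongodb mongodb @langchain/openai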

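Credentials

Once you've done the above, set the MONGODB_ATLAS_URI environment variable using the connection string from the Connect button in the Mongo dashboard. You'll also need your database name and collection name: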

process.env.MONGODB_ATLAS_URI = "your-atlas-url";
process.env.MONGODB_ATLAS_COLLECTION_NAME = "your-atlas-collection-name";
process.env.MONGODB_ATLAS_DB_NAME = "your-atlas-db-name";

If you are using OpenAI embeddings for this guide, you'll need to set your OpenAI key as well:

process.env.OPENAI_API_KEY = "YOUR_API_KEY";

If you want to get automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:

// process.env.LANGCHAIN_TRACING_V2="true"
// process.env.LANGCHAIN_API_KEY="your-api-key"
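A missing variable otherwise surfaces later as a confusing connection error, so it can help to validate the settings up front. The sketch below assumes the variable names used in this guide; `requireEnv` and `loadMongoConfig` are hypothetical helpers, not part of any library.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: returns the named variable or throws a descriptive error.
function requireEnv(name: string, env: Env): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Collects the three MongoDB settings used in this guide.
function loadMongoConfig(env: Env) {
  return {
    uri: requireEnv("MONGODB_ATLAS_URI", env),
    dbName: requireEnv("MONGODB_ATLAS_DB_NAME", env),
    collectionName: requireEnv("MONGODB_ATLAS_COLLECTION_NAME", env),
  };
}

// In an application you would call loadMongoConfig(process.env) instead.
const mongoConfig = loadMongoConfig({
  MONGODB_ATLAS_URI: "your-atlas-url",
  MONGODB_ATLAS_DB_NAME: "your-atlas-db-name",
  MONGODB_ATLAS_COLLECTION_NAME: "your-atlas-collection-name",
});
console.log(mongoConfig.dbName, mongoConfig.collectionName);
```

Passing the environment in as an argument, rather than reading `process.env` inside the helper, keeps the function easy to test.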

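Instantiation

Once you've set up your cluster as shown above, you can initialize your vector store as follows. Make sure indexName matches the name you gave the search index when you created it: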

import { MongoDBAtlasVectorSearch } from "@langchain/mongodb";
import { OpenAIEmbeddings } from "@langchain/openai";
import { MongoClient } from "mongodb";

const client = new MongoClient(process.env.MONGODB_ATLAS_URI || "");
const collection = client
  .db(process.env.MONGODB_ATLAS_DB_NAME)
  .collection(process.env.MONGODB_ATLAS_COLLECTION_NAME || "");

const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small",
});

const vectorStore = new MongoDBAtlasVectorSearch(embeddings, {
  collection: collection,
  indexName: "vector_index", // The name of the Atlas search index. Defaults to "default"
  textKey: "text", // The name of the collection field containing the raw content. Defaults to "text"
  embeddingKey: "embedding", // The name of the collection field containing the embedded text. Defaults to "embedding"
});

Manage vector store

Add items to vector store

You can now add documents to your vector store:

import type { Document } from "@langchain/core/documents";

const document1: Document = {
  pageContent: "The powerhouse of the cell is the mitochondria",
  metadata: { source: "https://example.com" },
};

const document2: Document = {
  pageContent: "Buildings are made out of brick",
  metadata: { source: "https://example.com" },
};

const document3: Document = {
  pageContent: "Mitochondria are made out of lipids",
  metadata: { source: "https://example.com" },
};

const document4: Document = {
  pageContent: "The 2024 Olympics are in Paris",
  metadata: { source: "https://example.com" },
};

const documents = [document1, document2, document3, document4];

await vectorStore.addDocuments(documents, { ids: ["1", "2", "3", "4"] });

[ '1', '2', '3', '4' ]

Note: After adding documents, there is a short delay before they become queryable.
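One way to handle that delay in scripts and tests is to retry the query until results appear. The following is a minimal sketch, not a library feature: `pollUntilSearchable` is a hypothetical helper, and the default attempt count and delay are arbitrary assumptions that you would tune for your cluster.

```typescript
// Hypothetical helper: re-runs `search` until it returns at least `minResults`
// items or the attempts run out, then returns the last result set.
async function pollUntilSearchable<T>(
  search: () => Promise<T[]>,
  { minResults = 1, attempts = 10, delayMs = 1000 } = {}
): Promise<T[]> {
  let results: T[] = [];
  for (let attempt = 0; attempt < attempts; attempt++) {
    results = await search();
    if (results.length >= minResults) return results;
    if (attempt < attempts - 1) {
      // Wait before retrying so the index has time to catch up.
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return results;
}

// Example usage with the vector store from this guide:
// const results = await pollUntilSearchable(
//   () => vectorStore.similaritySearch("biology", 2),
//   { minResults: 2 }
// );
```

The helper returns whatever the last attempt produced instead of throwing, so the caller decides whether an empty result is an error.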

Adding a document with the same id as an existing document will update the existing document.
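This update-on-same-id behavior can be used to keep repeated ingestion runs from creating duplicates. The sketch below derives a stable id from each document's source and text using Node's built-in crypto module. `stableDocumentId` is a hypothetical helper, and truncating the hash to 24 characters is an arbitrary choice.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derives a stable id from a document's source and text,
// so the same content always maps to the same id.
function stableDocumentId(pageContent: string, source: string): string {
  return createHash("sha256")
    .update(source)
    .update("\n")
    .update(pageContent)
    .digest("hex")
    .slice(0, 24);
}

const sampleDocs = [
  {
    pageContent: "The powerhouse of the cell is the mitochondria",
    metadata: { source: "https://example.com" },
  },
  {
    pageContent: "Buildings are made out of brick",
    metadata: { source: "https://example.com" },
  },
];

// Passing these as `{ ids: stableIds }` to addDocuments means re-ingesting
// unchanged documents overwrites them instead of creating duplicates.
const stableIds = sampleDocs.map((doc) =>
  stableDocumentId(doc.pageContent, doc.metadata.source)
);
console.log(stableIds);
```

Note that an edited document hashes to a new id and is therefore stored as a new entry. If edits should overwrite the old version, derive the id from a stable key such as the source URL alone.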

Delete items from vector store

await vectorStore.delete({ ids: ["4"] });

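Query vector store

Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while running your chain or agent.

Query directly

A simple similarity search can be performed as follows: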

const similaritySearchResults = await vectorStore.similaritySearch(
  "biology",
  2
);

for (const doc of similaritySearchResults) {
  console.log(`* ${doc.pageContent} [${JSON.stringify(doc.metadata, null)}]`);
}

* The powerhouse of the cell is the mitochondria [{"_id":"1","source":"https://example.com"}]
* Mitochondria are made out of lipids [{"_id":"3","source":"https://example.com"}]

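Filtering

MongoDB Atlas supports pre-filtering of results on other fields. To use it, you must define which metadata fields you plan to filter on by updating the index you created initially. Here is an example: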

{
  "fields": [
    {
      "numDimensions": 1536,
      "path": "embedding",
      "similarity": "euclidean",
      "type": "vector"
    },
    {
      "path": "source",
      "type": "filter"
    }
  ]
}

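In the above, the first item in fields is the vector index, and the second item is the metadata property you want to filter on. The name of the property is the value of the path key. So the above index allows filtering on a metadata field named source.

Then, in your code, you can use MQL query operators for filtering.

The example below illustrates this: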

const filter = {
  preFilter: {
    source: {
      $eq: "https://example.com",
    },
  },
};

const filteredResults = await vectorStore.similaritySearch(
  "biology",
  2,
  filter
);

for (const doc of filteredResults) {
  console.log(`* ${doc.pageContent} [${JSON.stringify(doc.metadata, null)}]`);
}

* The powerhouse of the cell is the mitochondria [{"_id":"1","source":"https://example.com"}]
* Mitochondria are made out of lipids [{"_id":"3","source":"https://example.com"}]
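Filters like the one above are plain objects, so they can be assembled from runtime input. The sketch below builds a preFilter for one or several source values. `buildSourceFilter` is a hypothetical helper, and it assumes that `$in` is among the operators accepted for pre-filtering on the indexed source field.

```typescript
// Shape of the filter argument used in the example above, limited to `source`.
interface SourcePreFilter {
  preFilter: { source: { $eq: string } | { $in: string[] } };
}

// Hypothetical helper: matches one source with $eq, several with $in.
function buildSourceFilter(sources: string[]): SourcePreFilter {
  if (sources.length === 0) {
    throw new Error("At least one source is required");
  }
  return sources.length === 1
    ? { preFilter: { source: { $eq: sources[0] } } }
    : { preFilter: { source: { $in: sources } } };
}

const multiSourceFilter = buildSourceFilter([
  "https://example.com",
  "https://example.org",
]);
console.log(JSON.stringify(multiSourceFilter));
```

The result can be passed as the third argument to similaritySearch in the same way as the hand-written filter. Any field used here must also be declared with type filter in the index definition.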

Returning scores

If you want to execute a similarity search and receive the corresponding scores, you can run:

const similaritySearchWithScoreResults =
  await vectorStore.similaritySearchWithScore("biology", 2, filter);

for (const [doc, score] of similaritySearchWithScoreResults) {
  console.log(
    `* [SIM=${score.toFixed(3)}] ${doc.pageContent} [${JSON.stringify(
      doc.metadata
    )}]`
  );
}

* [SIM=0.374] The powerhouse of the cell is the mitochondria [{"_id":"1","source":"https://example.com"}]
* [SIM=0.370] Mitochondria are made out of lipids [{"_id":"3","source":"https://example.com"}]
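Scores are often used to discard weak matches before passing documents on. The sketch below assumes that a higher score means a closer match, and the threshold of 0.372 is an arbitrary value chosen to split the two sample results above. `filterByScore` is a hypothetical helper; a suitable cutoff depends on your embedding model and similarity function.

```typescript
// Structural stand-in for the [document, score] tuples returned above.
type ScoredResult<T> = [doc: T, score: number];

// Hypothetical helper: keeps results at or above `minScore`, best first.
function filterByScore<T>(
  results: ScoredResult<T>[],
  minScore: number
): ScoredResult<T>[] {
  return results
    .filter(([, score]) => score >= minScore)
    .sort(([, a], [, b]) => b - a);
}

// Sample data mirroring the output above.
const scoredSample: ScoredResult<string>[] = [
  ["Mitochondria are made out of lipids", 0.37],
  ["The powerhouse of the cell is the mitochondria", 0.374],
];

for (const [text, score] of filterByScore(scoredSample, 0.372)) {
  console.log(`* [SIM=${score.toFixed(3)}] ${text}`);
}
```

Because `filter` returns a new array before sorting, the input array is left untouched.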

Query by turning into retriever

You can also transform the vector store into a retriever for easier usage in your chains.

const retriever = vectorStore.asRetriever({
  // Optional filter
  filter: filter,
  k: 2,
});
await retriever.invoke("biology");

[
  Document {
    pageContent: 'The powerhouse of the cell is the mitochondria',
    metadata: { _id: '1', source: 'https://example.com' },
    id: undefined
  },
  Document {
    pageContent: 'Mitochondria are made out of lipids',
    metadata: { _id: '3', source: 'https://example.com' },
    id: undefined
  }
]

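Usage for retrieval-augmented generation

For guides on how to use this vector store for retrieval-augmented generation (RAG), see the RAG tutorials and how-to guides in the LangChain documentation.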
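As a rough illustration of the step between retrieval and generation, the sketch below turns retrieved documents into a context block for a prompt. Everything here is an assumption made for illustration: `formatDocumentsAsContext` and `buildPrompt` are hypothetical helpers, and the prompt wording is not taken from LangChain.

```typescript
// Minimal structural type for the retrieved documents shown in this guide.
interface RetrievedDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: joins retrieved documents into a numbered context block.
function formatDocumentsAsContext(docs: RetrievedDocument[]): string {
  return docs
    .map((doc, i) => {
      const source = String(doc.metadata.source ?? "unknown");
      return `[${i + 1}] ${doc.pageContent} (source: ${source})`;
    })
    .join("\n");
}

// Hypothetical prompt template; the wording is illustrative only.
function buildPrompt(question: string, docs: RetrievedDocument[]): string {
  return [
    "Answer the question using only the context below.",
    "",
    "Context:",
    formatDocumentsAsContext(docs),
    "",
    `Question: ${question}`,
  ].join("\n");
}

const retrievedSample: RetrievedDocument[] = [
  {
    pageContent: "The powerhouse of the cell is the mitochondria",
    metadata: { _id: "1", source: "https://example.com" },
  },
];
console.log(buildPrompt("What is the powerhouse of the cell?", retrievedSample));
```

In an application, the documents would come from `retriever.invoke(question)` and the resulting string would be sent to a chat model.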

Closing connections

Make sure you close the client instance when you are finished to avoid excessive resource consumption:

await client.close();
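If the work between connecting and closing can throw, the close call above may never run. One common pattern is to wrap the work in try/finally. The sketch below uses a hypothetical `withClient` helper and a structural `Closable` type, so it accepts any object with an async close() method, which should include a MongoClient.

```typescript
// Anything with an async close() method, such as a MongoClient.
interface Closable {
  close(): Promise<void>;
}

// Hypothetical helper: runs `work` and always closes the client afterwards,
// even if `work` throws.
async function withClient<C extends Closable, R>(
  client: C,
  work: (client: C) => Promise<R>
): Promise<R> {
  try {
    return await work(client);
  } finally {
    await client.close();
  }
}

// Example usage with the client from this guide:
// const results = await withClient(client, async () =>
//   vectorStore.similaritySearch("biology", 2)
// );
```

The original error from `work` still propagates to the caller; the helper only guarantees that the connection is released first.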

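API reference

For detailed documentation of all MongoDBAtlasVectorSearch features and configurations, head to the API reference.

Related

Vector store conceptual guide
Vector store how-to guides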
