MongoDB Atlas

Compatibility

Only available on Node.js.

You can still create Next.js API routes that use MongoDB by setting the runtime variable to nodejs, like so:

export const runtime = "nodejs";

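You can read more about the Edge runtime in the Next.js documentation.

This guide provides a quick overview for getting started with MongoDB Atlas vector stores. For detailed documentation of all MongoDBAtlasVectorSearch features and configurations, head to the API reference.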

Overview

Integration details

Class	Package	PY support	Latest package
`MongoDBAtlasVectorSearch`	`@langchain/mongodb`	✅

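Setup

To use MongoDB Atlas vector stores, you'll need to configure a MongoDB Atlas cluster and install the @langchain/mongodb integration package.

Initial Cluster Configuration

To create a MongoDB Atlas cluster, navigate to the MongoDB Atlas website and create an account if you don't already have one.

Create and name a cluster when prompted, then find it under Database. Select Browse Collections and create either a blank collection or one from the provided sample data.

Note: The cluster must run MongoDB 7.0 or higher.

Creating an Index

After configuring your cluster, you'll need to create an index on the collection field you want to search over.

Switch to the Atlas Search tab and click Create Search Index. From there, make sure you select Atlas Vector Search - JSON Editor, then select the appropriate database and collection and paste the following into the textbox: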

{
  "fields": [
    {
      "numDimensions": 1536,
      "path": "embedding",
      "similarity": "euclidean",
      "type": "vector"
    }
  ]
}

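Note that the numDimensions property should match the dimensionality of the embeddings you are using. For example, Cohere embeddings have 1024 dimensions, and by default OpenAI embeddings have 1536.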
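Because the index definition depends only on the embedding size, the field path, and the similarity function, it can also be generated in code. The following is a minimal sketch, not the library's own API: `buildVectorIndexDefinition`, `createVectorIndex`, and `SearchIndexCapable` are hypothetical names introduced here, and the sketch assumes a recent `mongodb` driver whose `Collection` exposes `createSearchIndex` and accepts `type: "vectorSearch"`. Check your driver version before relying on it.

```typescript
// Hypothetical helper types; not part of @langchain/mongodb.
type Similarity = "euclidean" | "cosine" | "dotProduct";

interface VectorIndexDefinition {
  fields: Array<{
    type: "vector";
    path: string;
    numDimensions: number;
    similarity: Similarity;
  }>;
}

// Minimal structural view of the single driver method this sketch calls.
// Assumption: a recent mongodb Collection satisfies this shape.
interface SearchIndexCapable {
  createSearchIndex(description: {
    name: string;
    type: string;
    definition: VectorIndexDefinition;
  }): Promise<string>;
}

// Build the index JSON from this section, parameterized by embedding size.
function buildVectorIndexDefinition(
  numDimensions: number,
  path = "embedding",
  similarity: Similarity = "euclidean"
): VectorIndexDefinition {
  if (!Number.isInteger(numDimensions) || numDimensions <= 0) {
    throw new Error(
      `numDimensions must be a positive integer, got ${numDimensions}`
    );
  }
  return { fields: [{ type: "vector", path, numDimensions, similarity }] };
}

// Create the index programmatically instead of pasting JSON into the UI.
async function createVectorIndex(
  collection: SearchIndexCapable,
  name: string,
  numDimensions: number
): Promise<string> {
  return collection.createSearchIndex({
    name,
    type: "vectorSearch",
    definition: buildVectorIndexDefinition(numDimensions),
  });
}
```

Validating the dimension up front turns a mismatch between the embedding model and the index into an immediate error rather than a silent failure at query time.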

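Note: By default the vector store expects an index name of default, an indexed collection field name of embedding, and a raw text field name of text. Initialize the vector store with an index name and field names that match your index and collection schema, as shown in the Instantiation section.

Finally, proceed to build the index.

Embeddings

This guide also uses OpenAI embeddings, which require you to install the @langchain/openai integration package. You can use other supported embeddings models if you prefer.

Installation

Install the following packages:

Tip

See this section for general instructions on installing integration packages.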

npm i @langchain/mongodb mongodb @langchain/openai @langchain/core

yarn add @langchain/mongodb mongodb @langchain/openai @langchain/core

pnpm add @langchain/mongodb mongodb @langchain/openai @langchain/core

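Credentials

Once you've done the above, set the MONGODB_ATLAS_URI environment variable using the connection string from the Connect button in the Mongo dashboard. You'll also need your database name and collection name: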

process.env.MONGODB_ATLAS_URI = "your-atlas-url";
process.env.MONGODB_ATLAS_COLLECTION_NAME = "your-atlas-collection-name";
process.env.MONGODB_ATLAS_DB_NAME = "your-atlas-db-name";

If you are using OpenAI embeddings for this guide, you'll need to set your OpenAI key as well:

process.env.OPENAI_API_KEY = "YOUR_API_KEY";

If you want automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:

// process.env.LANGSMITH_TRACING="true"
// process.env.LANGSMITH_API_KEY="your-api-key"
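The instantiation code in this guide reads these variables directly from process.env, so a missing value only surfaces later as a connection or query error. As a minimal sketch, a small validation step can fail early with a clearer message. `readAtlasConfig` and `AtlasConfig` are hypothetical names introduced here; only the environment variable names come from this guide.

```typescript
// Hypothetical helper; the variable names match the ones set in this section.
interface AtlasConfig {
  uri: string;
  dbName: string;
  collectionName: string;
}

function readAtlasConfig(
  env: Record<string, string | undefined>
): AtlasConfig {
  const required = {
    uri: "MONGODB_ATLAS_URI",
    dbName: "MONGODB_ATLAS_DB_NAME",
    collectionName: "MONGODB_ATLAS_COLLECTION_NAME",
  } as const;

  // Collect every missing (or empty) variable so they are reported together.
  const missing = Object.values(required).filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  return {
    uri: env[required.uri] as string,
    dbName: env[required.dbName] as string,
    collectionName: env[required.collectionName] as string,
  };
}

// Usage: const config = readAtlasConfig(process.env);
```

Taking the environment as a parameter rather than reading process.env inside the function keeps the helper easy to test with a plain object.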

Instantiation

Once you've set up your cluster as shown above, you can initialize your vector store as follows:

import { MongoDBAtlasVectorSearch } from "@langchain/mongodb";
import { OpenAIEmbeddings } from "@langchain/openai";
import { MongoClient } from "mongodb";

const client = new MongoClient(process.env.MONGODB_ATLAS_URI || "");
const collection = client
  .db(process.env.MONGODB_ATLAS_DB_NAME)
  .collection(process.env.MONGODB_ATLAS_COLLECTION_NAME);

const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small",
});

const vectorStore = new MongoDBAtlasVectorSearch(embeddings, {
  collection: collection,
  indexName: "vector_index", // The name of the Atlas search index. Defaults to "default"
  textKey: "text", // The name of the collection field containing the raw content. Defaults to "text"
  embeddingKey: "embedding", // The name of the collection field containing the embedded text. Defaults to "embedding"
});

Manage vector store

Add items to vector store

You can now add documents to your vector store:

import type { Document } from "@langchain/core/documents";

const document1: Document = {
  pageContent: "The powerhouse of the cell is the mitochondria",
  metadata: { source: "https://example.com" },
};

const document2: Document = {
  pageContent: "Buildings are made out of brick",
  metadata: { source: "https://example.com" },
};

const document3: Document = {
  pageContent: "Mitochondria are made out of lipids",
  metadata: { source: "https://example.com" },
};

const document4: Document = {
  pageContent: "The 2024 Olympics are in Paris",
  metadata: { source: "https://example.com" },
};

const documents = [document1, document2, document3, document4];

await vectorStore.addDocuments(documents, { ids: ["1", "2", "3", "4"] });

[ '1', '2', '3', '4' ]

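Note: After adding documents, there is a slight delay before they become queryable.

Adding a document with the same id as an existing document will update the existing document.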
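Because of this indexing delay, a query issued immediately after addDocuments may return fewer results than expected. One way to handle this in scripts and tests is to retry the query for a short time. The sketch below is illustrative only: `waitForResults` is a hypothetical helper, and the default attempt count and delay are arbitrary assumptions, not values documented by Atlas.

```typescript
// Hypothetical helper: retry an async search until it returns at least
// `minResults` items or the attempts run out.
async function waitForResults<T>(
  search: () => Promise<T[]>,
  { minResults = 1, attempts = 10, delayMs = 500 } = {}
): Promise<T[]> {
  let results: T[] = [];
  for (let attempt = 1; attempt <= attempts; attempt++) {
    results = await search();
    if (results.length >= minResults) return results;
    // Sleep between attempts, but not after the last one.
    if (attempt < attempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  // May hold fewer than minResults items if the index never caught up.
  return results;
}

// Usage with the store from this guide:
// const docs = await waitForResults(
//   () => vectorStore.similaritySearch("biology", 2),
//   { minResults: 2 }
// );
```

The helper returns whatever the last attempt produced instead of throwing, so the caller decides whether a short result list is an error.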

Delete items from vector store

await vectorStore.delete({ ids: ["4"] });

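Query vector store

Once your vector store has been created and the relevant documents have been added, you will most likely wish to query it while running your chain or agent.

Query directly

A simple similarity search can be performed as follows: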

const similaritySearchResults = await vectorStore.similaritySearch(
  "biology",
  2
);

for (const doc of similaritySearchResults) {
  console.log(`* ${doc.pageContent} [${JSON.stringify(doc.metadata, null)}]`);
}

* The powerhouse of the cell is the mitochondria [{"_id":"1","source":"https://example.com"}]
* Mitochondria are made out of lipids [{"_id":"3","source":"https://example.com"}]

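Filtering

MongoDB Atlas supports pre-filtering of results on other fields. To use it, you must declare which metadata fields you plan to filter on by updating the index you created initially. Here's an example: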

{
  "fields": [
    {
      "numDimensions": 1536,
      "path": "embedding",
      "similarity": "euclidean",
      "type": "vector"
    },
    {
      "path": "source",
      "type": "filter"
    }
  ]
}

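Above, the first item in fields is the vector index, and the second item is the metadata property you want to filter on. The name of the property is the value of the path key, so the index above lets us filter on a metadata field named source.

Then, in your code, you can use an MQL query operator for filtering.

The example below illustrates this: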

const filter = {
  preFilter: {
    source: {
      $eq: "https://example.com",
    },
  },
};

const filteredResults = await vectorStore.similaritySearch(
  "biology",
  2,
  filter
);

for (const doc of filteredResults) {
  console.log(`* ${doc.pageContent} [${JSON.stringify(doc.metadata, null)}]`);
}

* The powerhouse of the cell is the mitochondria [{"_id":"1","source":"https://example.com"}]
* Mitochondria are made out of lipids [{"_id":"3","source":"https://example.com"}]
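When filters are assembled from user input or configuration, it can help to build the preFilter object in one place. The sketch below is a hypothetical helper, not part of @langchain/mongodb: `buildPreFilter`, `FieldCondition`, and `AtlasFilter` are names introduced here. It assumes the field has been declared with type filter in the index, and that the `$in` operator is available for pre-filtering in your Atlas version.

```typescript
// Hypothetical helper types; only the `preFilter` key comes from this guide.
type FieldCondition =
  | { $eq: string | number }
  | { $in: Array<string | number> };

interface AtlasFilter {
  preFilter: Record<string, FieldCondition>;
}

// Match a single value with $eq, or any of several values with $in.
function buildPreFilter(
  field: string,
  values: Array<string | number>
): AtlasFilter {
  if (values.length === 0) {
    throw new Error("At least one value is required to build a filter");
  }
  const condition: FieldCondition =
    values.length === 1 ? { $eq: values[0] } : { $in: values };
  return { preFilter: { [field]: condition } };
}

// Usage with the store from this guide:
// const filter = buildPreFilter("source", ["https://example.com"]);
// await vectorStore.similaritySearch("biology", 2, filter);
```

Rejecting an empty value list avoids sending a filter that can never match anything.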

Returning scores

If you want to run a similarity search and receive the corresponding scores, you can run:

const similaritySearchWithScoreResults =
  await vectorStore.similaritySearchWithScore("biology", 2, filter);

for (const [doc, score] of similaritySearchWithScoreResults) {
  console.log(
    `* [SIM=${score.toFixed(3)}] ${doc.pageContent} [${JSON.stringify(
      doc.metadata
    )}]`
  );
}

* [SIM=0.374] The powerhouse of the cell is the mitochondria [{"_id":"1","source":"https://example.com"}]
* [SIM=0.370] Mitochondria are made out of lipids [{"_id":"3","source":"https://example.com"}]
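Scores are mainly useful for discarding weak matches. The sketch below filters the [document, score] pairs by a minimum score and orders them best first. It assumes a higher score means a closer match, as the SIM label in the output suggests. `filterByMinScore` and `ScoredDoc` are hypothetical names, and any threshold is an assumption that needs tuning for your embedding model and similarity function.

```typescript
// Minimal stand-in for the Document type used in this guide.
interface ScoredDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Keep only results whose score meets the threshold, highest score first.
function filterByMinScore<T extends ScoredDoc>(
  results: Array<[T, number]>,
  minScore: number
): Array<[T, number]> {
  return results
    .filter(([, score]) => score >= minScore)
    .sort(([, a], [, b]) => b - a);
}

// Usage with the store from this guide (0.372 is an arbitrary threshold):
// const scored = await vectorStore.similaritySearchWithScore("biology", 2);
// const strong = filterByMinScore(scored, 0.372);
```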

Query by turning into retriever

You can also transform the vector store into a retriever for easier usage in your chains.

const retriever = vectorStore.asRetriever({
  // Optional filter
  filter: filter,
  k: 2,
});
await retriever.invoke("biology");

[
  Document {
    pageContent: 'The powerhouse of the cell is the mitochondria',
    metadata: { _id: '1', source: 'https://example.com' },
    id: undefined
  },
  Document {
    pageContent: 'Mitochondria are made out of lipids',
    metadata: { _id: '3', source: 'https://example.com' },
    id: undefined
  }
]
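In a chain, retrieved documents are usually flattened into a single context string before being passed to a model. The following is a minimal sketch of that step. `formatDocsAsContext` and `RetrievedDoc` are hypothetical names introduced here, and the numbered layout is just one possible format.

```typescript
// Minimal stand-in for the Document objects returned by the retriever.
interface RetrievedDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Join documents into a numbered context block, citing each source if present.
function formatDocsAsContext(docs: RetrievedDoc[]): string {
  return docs
    .map((doc, index) => {
      const source =
        typeof doc.metadata.source === "string"
          ? doc.metadata.source
          : "unknown";
      return `[${index + 1}] ${doc.pageContent} (source: ${source})`;
    })
    .join("\n");
}

// Usage with the retriever from this guide:
// const context = formatDocsAsContext(await retriever.invoke("biology"));
```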

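Usage for retrieval-augmented generation

For guides on how to use this vector store for retrieval-augmented generation (RAG), see the RAG tutorials and how-to guides in the LangChain documentation.

Closing connections

Make sure you close the client instance when you are finished to avoid excessive resource consumption: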

await client.close();
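If an error is thrown before the close call is reached, the connection stays open. A try/finally wrapper guarantees cleanup on both paths. This is a minimal sketch: `withClient` and `Closable` are hypothetical names, and the only assumption about the client is that it has an async close method, as the MongoClient used in this guide does.

```typescript
// Anything with an async close() method, such as the client in this guide.
interface Closable {
  close(): Promise<void>;
}

// Run `work` with the client and always close it afterwards,
// whether `work` resolves or throws.
async function withClient<C extends Closable, R>(
  client: C,
  work: (client: C) => Promise<R>
): Promise<R> {
  try {
    return await work(client);
  } finally {
    await client.close();
  }
}

// Usage with the client from this guide:
// await withClient(client, async () => {
//   return vectorStore.similaritySearch("biology", 2);
// });
```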

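API reference

For detailed documentation of all MongoDBAtlasVectorSearch features and configurations, head to the API reference.

Related

Vector store conceptual guide
Vector store how-to guides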
