PGVectorStore

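Compatibility

Only available on Node.js.

To enable vector search in a general-purpose PostgreSQL database, LangChain.js supports the pgvector Postgres extension.

This guide gives a quick overview of getting started with the PGVector vector store. For detailed documentation of all PGVectorStore features and configurations, see the API reference.

Overview

Integration details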

Class	Package	PY support
`PGVectorStore`	`@langchain/community`	✅

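Setup

To use the PGVector vector store, you need a Postgres instance with the pgvector extension enabled. You also need to install the @langchain/community integration package, with the pg package as a peer dependency.

This guide also uses OpenAI embeddings, which require the @langchain/openai integration package. You can use any other supported embeddings model if you prefer.

We also use the uuid package to generate IDs in the required format.

Tip

See this section for general instructions on installing integration packages.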

npm

npm i @langchain/community @langchain/openai @langchain/core pg uuid

yarn

yarn add @langchain/community @langchain/openai @langchain/core pg uuid

pnpm

pnpm add @langchain/community @langchain/openai @langchain/core pg uuid

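Setting up an instance

There are many ways to connect to Postgres, depending on how you set up your instance. Below is an example of a local setup using the prebuilt Docker image provided by the pgvector team.

Create a file named docker-compose.yml with the following content: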

# Run this command to start the database:
# docker-compose up --build
version: "3"
services:
  db:
    hostname: 127.0.0.1
    image: pgvector/pgvector:pg16
    ports:
      - 5432:5432
    restart: always
    environment:
      - POSTGRES_DB=api
      - POSTGRES_USER=myuser
      - POSTGRES_PASSWORD=ChangeMe
    volumes:
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql

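Then, in the same directory, run docker compose up to start the container.

You can find more information on setting up pgvector in the official repository.

Credentials

To connect to your Postgres instance, you need the corresponding credentials. For a full list of supported options, see the node-postgres documentation.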
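One way to keep credentials out of source code is to read them from environment variables. The sketch below is only an illustration: the helper name connectionOptionsFromEnv is hypothetical, the PG* variable names follow the libpq convention, and the fallback values simply mirror the docker-compose example above.

```typescript
// Hypothetical helper (not part of LangChain.js or node-postgres):
// builds connection options from an environment-like object.
interface PostgresConnectionOptions {
  host: string;
  port: number;
  user: string;
  password: string;
  database: string;
}

function connectionOptionsFromEnv(
  env: Record<string, string | undefined>
): PostgresConnectionOptions {
  // Assumed default: the port published by the docker-compose example.
  const port = Number(env.PGPORT ?? "5432");
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PGPORT: ${env.PGPORT}`);
  }
  // Refuse to fall back to a default password.
  const password = env.PGPASSWORD;
  if (password === undefined) {
    throw new Error("PGPASSWORD must be set");
  }
  return {
    host: env.PGHOST ?? "127.0.0.1",
    port,
    user: env.PGUSER ?? "myuser",
    password,
    database: env.PGDATABASE ?? "api",
  };
}

// Example with an explicit object; an application would pass process.env.
const exampleOptions = connectionOptionsFromEnv({
  PGPASSWORD: "ChangeMe",
  PGPORT: "5432",
});
```

The returned object uses only fields that node-postgres accepts (host, port, user, password, database), so it can be passed as postgresConnectionOptions.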

If you use OpenAI embeddings in this guide, you also need to set your OpenAI key:

process.env.OPENAI_API_KEY = "YOUR_API_KEY";

If you want automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:

// process.env.LANGCHAIN_TRACING_V2="true"
// process.env.LANGCHAIN_API_KEY="your-api-key"

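Instantiation

To instantiate the vector store, call the static .initialize() method. It checks whether the table named by tableName in the passed config exists and, if it does not, creates it with the required columns.

Security

Do not use user-generated data, such as usernames, as input for table and column names. Doing so can lead to SQL injection!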

import {
  PGVectorStore,
  DistanceStrategy,
} from "@langchain/community/vectorstores/pgvector";
import { OpenAIEmbeddings } from "@langchain/openai";
import { PoolConfig } from "pg";

const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small",
});

// Sample config
const config = {
  postgresConnectionOptions: {
    type: "postgres",
    host: "127.0.0.1",
    port: 5432, // the host port published in docker-compose.yml
    user: "myuser",
    password: "ChangeMe",
    database: "api",
  } as PoolConfig,
  tableName: "testlangchainjs",
  columns: {
    idColumnName: "id",
    vectorColumnName: "vector",
    contentColumnName: "content",
    metadataColumnName: "metadata",
  },
  // supported distance strategies: cosine (default), innerProduct, or euclidean
  distanceStrategy: "cosine" as DistanceStrategy,
};

const vectorStore = await PGVectorStore.initialize(embeddings, config);
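If a table or column name ever has to come from configuration rather than a literal, the security note above suggests validating it first. The following is a minimal sketch under stated assumptions: assertSafeIdentifier is a hypothetical helper, not part of LangChain.js, and the allow-list (letters, digits, underscores, at most 63 characters to fit the PostgreSQL identifier limit) is deliberately stricter than what PostgreSQL itself accepts.

```typescript
// Hypothetical allow-list: a letter or underscore followed by up to 62
// letters, digits, or underscores (63 characters in total).
const SAFE_IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]{0,62}$/;

// Returns the name unchanged if it is safe, otherwise throws.
function assertSafeIdentifier(name: string): string {
  if (!SAFE_IDENTIFIER.test(name)) {
    throw new Error(`Unsafe SQL identifier: ${JSON.stringify(name)}`);
  }
  return name;
}

// Usage: validate names before placing them in the store config.
const safeTableName = assertSafeIdentifier("testlangchainjs");
```

Rejecting anything outside the allow-list is simpler and safer than trying to escape arbitrary input.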

Manage vector store

Add items to vector store

import { v4 as uuidv4 } from "uuid";
import type { Document } from "@langchain/core/documents";

const document1: Document = {
  pageContent: "The powerhouse of the cell is the mitochondria",
  metadata: { source: "https://example.com" },
};

const document2: Document = {
  pageContent: "Buildings are made out of brick",
  metadata: { source: "https://example.com" },
};

const document3: Document = {
  pageContent: "Mitochondria are made out of lipids",
  metadata: { source: "https://example.com" },
};

const document4: Document = {
  pageContent: "The 2024 Olympics are in Paris",
  metadata: { source: "https://example.com" },
};

const documents = [document1, document2, document3, document4];

const ids = [uuidv4(), uuidv4(), uuidv4(), uuidv4()];

await vectorStore.addDocuments(documents, { ids: ids });

Delete items from vector store

const id4 = ids[ids.length - 1];

await vectorStore.delete({ ids: [id4] });

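Query vector store

Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while running your chain or agent.

Query directly

A simple similarity search can be performed as follows: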

const filter = { source: "https://example.com" };

const similaritySearchResults = await vectorStore.similaritySearch(
  "biology",
  2,
  filter
);

for (const doc of similaritySearchResults) {
  console.log(`* ${doc.pageContent} [${JSON.stringify(doc.metadata, null)}]`);
}

* The powerhouse of the cell is the mitochondria [{"source":"https://example.com"}]
* Mitochondria are made out of lipids [{"source":"https://example.com"}]

The filter syntax above supports exact match, but the following are also supported:

Using the `in` operator

{
  "field": {
    "in": ["value1", "value2"]
  }
}

Using the `arrayContains` operator

{
  "field": {
    "arrayContains": ["value1", "value2"]
  }
}
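To make the three filter forms concrete, here is an in-memory sketch of how they could be evaluated against a document's metadata. It is not the store's implementation (the store applies filters in SQL), and it rests on two assumptions: `in` matches when the field equals any listed value, and `arrayContains` matches when the stored array contains at least one listed value. The names matchesFilter and MetadataFilter are hypothetical.

```typescript
type FilterValue =
  | string
  | number
  | boolean
  | { in: Array<string | number> }
  | { arrayContains: Array<string | number> };

type MetadataFilter = Record<string, FilterValue>;

// Returns true only if every field condition in the filter is satisfied.
function matchesFilter(
  metadata: Record<string, unknown>,
  filter: MetadataFilter
): boolean {
  return Object.entries(filter).every(([field, condition]) => {
    const actual = metadata[field];
    if (typeof condition === "object" && condition !== null) {
      if ("in" in condition) {
        // Assumed semantics: field equals any of the listed values.
        return condition.in.some((value) => value === actual);
      }
      // Assumed semantics: stored array shares at least one value.
      const stored = Array.isArray(actual) ? actual : [];
      return condition.arrayContains.some((value) => stored.includes(value));
    }
    // Plain values are exact matches.
    return actual === condition;
  });
}

const exampleMetadata = { source: "https://example.com", b: ["tag1", "tag2"] };
const exactMatch = matchesFilter(exampleMetadata, { source: "https://example.com" });
const tagMatch = matchesFilter(exampleMetadata, { b: { arrayContains: ["tag2"] } });
```

Note that this sketch uses strict JavaScript equality, whereas the database compares values on its side, so type handling (for example numbers versus strings) may differ.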

If you want to run a similarity search and receive the corresponding scores, you can run:

const similaritySearchWithScoreResults =
  await vectorStore.similaritySearchWithScore("biology", 2, filter);

for (const [doc, score] of similaritySearchWithScoreResults) {
  console.log(
    `* [SIM=${score.toFixed(3)}] ${doc.pageContent} [${JSON.stringify(
      doc.metadata
    )}]`
  );
}

* [SIM=0.835] The powerhouse of the cell is the mitochondria [{"source":"https://example.com"}]
* [SIM=0.852] Mitochondria are made out of lipids [{"source":"https://example.com"}]
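The scores in the sample output increase from the first result to the second, which is what one would expect if the score is a distance (lower means closer) rather than a similarity. Assuming the cosine strategy reports cosine distance, that is, one minus cosine similarity, the quantity can be sketched as follows. The function cosineDistance is illustrative and not part of the library.

```typescript
// Cosine distance between two equal-length vectors: 1 - cos(angle).
// 0 means same direction, 1 means orthogonal, 2 means opposite.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine distance is undefined for zero vectors");
  }
  return 1 - dot / Math.sqrt(normA * normB);
}

const sameDirection = cosineDistance([1, 0], [2, 0]);
const orthogonal = cosineDistance([1, 0], [0, 1]);
```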

Query by turning into a retriever

You can also transform the vector store into a retriever for easier use in your chains.

const retriever = vectorStore.asRetriever({
  // Optional filter
  filter: filter,
  k: 2,
});
await retriever.invoke("biology");

[
  Document {
    pageContent: 'The powerhouse of the cell is the mitochondria',
    metadata: { source: 'https://example.com' },
    id: undefined
  },
  Document {
    pageContent: 'Mitochondria are made out of lipids',
    metadata: { source: 'https://example.com' },
    id: undefined
  }
]

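Usage for retrieval-augmented generation

For guides on how to use this vector store for retrieval-augmented generation (RAG), see the RAG guides in the LangChain.js documentation.

Advanced: reusing connections

You can reuse connections by creating a pool and then creating new PGVectorStore instances directly via the constructor.

Note that you should call .initialize() at least once to set up your database, so that your tables are properly created before you use the constructor.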

import { OpenAIEmbeddings } from "@langchain/openai";
import { PGVectorStore } from "@langchain/community/vectorstores/pgvector";
import pg from "pg";

// First, follow set-up instructions at
// https://js.langchain.ac.cn/docs/modules/indexes/vector_stores/integrations/pgvector

const reusablePool = new pg.Pool({
  host: "127.0.0.1",
  port: 5432, // the host port published in docker-compose.yml
  user: "myuser",
  password: "ChangeMe",
  database: "api",
});

const originalConfig = {
  pool: reusablePool,
  tableName: "testlangchainjs",
  collectionName: "sample",
  collectionTableName: "collections",
  columns: {
    idColumnName: "id",
    vectorColumnName: "vector",
    contentColumnName: "content",
    metadataColumnName: "metadata",
  },
};

// Set up the DB.
// Can skip this step if you've already initialized the DB.
// await PGVectorStore.initialize(new OpenAIEmbeddings(), originalConfig);
const pgvectorStore = new PGVectorStore(new OpenAIEmbeddings(), originalConfig);

await pgvectorStore.addDocuments([
  { pageContent: "what's this", metadata: { a: 2 } },
  { pageContent: "Cat drinks milk", metadata: { a: 1 } },
]);

const results = await pgvectorStore.similaritySearch("water", 1);

console.log(results);

/*
  [ Document { pageContent: 'Cat drinks milk', metadata: { a: 1 } } ]
*/

const pgvectorStore2 = new PGVectorStore(new OpenAIEmbeddings(), {
  pool: reusablePool,
  tableName: "testlangchainjs",
  collectionTableName: "collections",
  collectionName: "some_other_collection",
  columns: {
    idColumnName: "id",
    vectorColumnName: "vector",
    contentColumnName: "content",
    metadataColumnName: "metadata",
  },
});

const results2 = await pgvectorStore2.similaritySearch("water", 1);

console.log(results2);

/*
  []
*/

await reusablePool.end();

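Create HNSW index

By default, the extension performs a sequential scan search, with 100% recall. You may want to create an HNSW index for approximate nearest neighbor (ANN) search to speed up similaritySearchVectorWithScore execution time. To create an HNSW index on your vector column, use the createHnswIndex() method.

The method parameters include:

dimensions: The number of dimensions in your vector data type, up to 2000. For example, use 1536 for OpenAI's text-embedding-ada-002 and Amazon's amazon.titan-embed-text-v1 models.
m?: The maximum number of connections per layer (16 by default). Lower values shorten index build time, while higher values can speed up search queries.
efConstruction?: The size of the dynamic candidate list used to construct the graph (64 by default). Higher values can improve index quality at the cost of index build time.
distanceFunction?: The name of the distance function to use. It is selected automatically based on distanceStrategy.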

import { OpenAIEmbeddings } from "@langchain/openai";
import {
  DistanceStrategy,
  PGVectorStore,
} from "@langchain/community/vectorstores/pgvector";
import { PoolConfig } from "pg";

// First, follow set-up instructions at
// https://js.langchain.ac.cn/docs/modules/indexes/vector_stores/integrations/pgvector

const hnswConfig = {
  postgresConnectionOptions: {
    type: "postgres",
    host: "127.0.0.1",
    port: 5432, // the host port published in docker-compose.yml
    user: "myuser",
    password: "ChangeMe",
    database: "api",
  } as PoolConfig,
  tableName: "testlangchainjs",
  columns: {
    idColumnName: "id",
    vectorColumnName: "vector",
    contentColumnName: "content",
    metadataColumnName: "metadata",
  },
  // supported distance strategies: cosine (default), innerProduct, or euclidean
  distanceStrategy: "cosine" as DistanceStrategy,
};

const hnswPgVectorStore = await PGVectorStore.initialize(
  new OpenAIEmbeddings(),
  hnswConfig
);

// create the index
await hnswPgVectorStore.createHnswIndex({
  dimensions: 1536,
  efConstruction: 64,
  m: 16,
});

await hnswPgVectorStore.addDocuments([
  { pageContent: "what's this", metadata: { a: 2, b: ["tag1", "tag2"] } },
  { pageContent: "Cat drinks milk", metadata: { a: 1, b: ["tag2"] } },
]);

const model = new OpenAIEmbeddings();
const query = await model.embedQuery("water");
const hnswResults = await hnswPgVectorStore.similaritySearchVectorWithScore(
  query,
  1
);

console.log(hnswResults);

// Close the store created in this example
await hnswPgVectorStore.end();
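To relate the m and efConstruction parameters to pgvector itself, the sketch below builds the kind of CREATE INDEX statement that pgvector documents for HNSW indexes. It is not the library's implementation of createHnswIndex(): the function name, the index naming convention, and the Strategy type are hypothetical, while the operator classes (vector_cosine_ops, vector_ip_ops, vector_l2_ops) and the m / ef_construction options are pgvector's.

```typescript
// Strategy names mirror the config comment in this guide.
type Strategy = "cosine" | "innerProduct" | "euclidean";

// pgvector operator class for each strategy.
const OPERATOR_CLASS: Record<Strategy, string> = {
  cosine: "vector_cosine_ops",
  innerProduct: "vector_ip_ops",
  euclidean: "vector_l2_ops",
};

function requirePositiveInteger(name: string, value: number): void {
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer`);
  }
}

// Builds the SQL text only; table and column must be trusted identifiers.
function buildHnswIndexSql(
  table: string,
  column: string,
  strategy: Strategy,
  options: { m?: number; efConstruction?: number } = {}
): string {
  const m = options.m ?? 16; // default stated in this guide
  const efConstruction = options.efConstruction ?? 64; // default stated in this guide
  requirePositiveInteger("m", m);
  requirePositiveInteger("efConstruction", efConstruction);
  const indexName = `${table}_${column}_hnsw_idx`; // made-up naming convention
  return (
    `CREATE INDEX IF NOT EXISTS ${indexName} ON ${table} ` +
    `USING hnsw (${column} ${OPERATOR_CLASS[strategy]}) ` +
    `WITH (m = ${m}, ef_construction = ${efConstruction});`
  );
}

const exampleSql = buildHnswIndexSql("testlangchainjs", "vector", "cosine");
```

The operator class must match the distance operator used at query time, which is why the distance function is derived from distanceStrategy rather than chosen independently.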

Closing connections

Make sure you close the connection when you are finished, to avoid excessive resource consumption:

await vectorStore.end();
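If an error is thrown before the call to end(), the connection stays open. One possible pattern, sketched here with a hypothetical withStore helper, wraps the work in try/finally so that end() always runs. The only assumption about the store is that it exposes an async end() method, as used above.

```typescript
// Anything with an async end() method, such as the vector store above.
interface Closable {
  end(): Promise<void>;
}

// Runs the callback and closes the store afterwards, even on failure.
async function withStore<S extends Closable, T>(
  store: S,
  work: (store: S) => Promise<T>
): Promise<T> {
  try {
    return await work(store);
  } finally {
    await store.end();
  }
}

// Usage with the store from this guide (shown as a comment because it
// needs a live database):
//   const docs = await withStore(vectorStore, (s) =>
//     s.similaritySearch("biology", 2)
//   );
```

Because the helper closes the store on exit, it fits one-off scripts best. A long-running server would normally keep the pool open and close it on shutdown.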

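API reference

For detailed documentation of all PGVectorStore features and configurations, head to the API reference.

Related

Vector store conceptual guide
Vector store how-to guides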
