How to cache embedding results
Prerequisites
This guide assumes you are familiar with the basics of text embeddings.
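Embeddings can be stored or temporarily cached so that they do not need to be recomputed.
Embeddings can be cached using a CacheBackedEmbeddings instance.
The cache-backed embedder is a wrapper around an embedder that caches embeddings in a key-value store.
The text is hashed, and the hash is used as the key in the cache.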
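As a rough illustration of the keying scheme described above, the sketch below derives a cache key by hashing the text and prefixing a namespace. The helper name `cacheKey` is hypothetical, and SHA-1 is an assumption, chosen only because the sample keys printed later in this guide are 40 hex characters long; the library's actual hashing may differ.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper, not part of LangChain: builds a cache key from a
// namespace and the text to embed. SHA-1 is an assumption for illustration.
function cacheKey(namespace: string, text: string): string {
  const digest = createHash("sha1").update(text, "utf8").digest("hex");
  return `${namespace}${digest}`;
}

const keyA = cacheKey("text-embedding-ada-002", "hello world");
const keyB = cacheKey("text-embedding-ada-002", "hello world");
const keyC = cacheKey("text-embedding-ada-002", "goodbye world");

console.log(keyA === keyB); // identical text maps to the same key
console.log(keyA === keyC); // different text maps to a different key
```

Because the key depends only on the namespace and the text, embedding the same text again can be answered from the store instead of the model.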
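The main supported way to initialize a CacheBackedEmbeddings is the fromBytesStore static method. It takes the following parameters:
- underlyingEmbeddings: The embeddings model to use.
- documentEmbeddingCache: The cache to use for storing document embeddings.
- namespace: (optional, defaults to "") The namespace to use for the document cache. This namespace is used to avoid collisions with other caches. For example, you could set it to the name of the embedding model used.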
Attention: Be sure to set the namespace parameter to avoid collisions when the same text is embedded using different embedding models.
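To see why the namespace matters, here is a minimal sketch of two models sharing one cache. Everything in it is hypothetical (the `FakeModel` type, the `cachedEmbed` helper, and the SHA-1 keying); it only mimics the idea of keying on namespace plus hashed text and is not the library's implementation.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stand-ins for two embedding models that return different vectors.
type FakeModel = { name: string; embed: (text: string) => number[] };
const modelA: FakeModel = { name: "model-a", embed: (t) => [t.length, 0] };
const modelB: FakeModel = { name: "model-b", embed: (t) => [0, t.length] };

// Look up the vector under namespace + hash(text); compute and store it on a miss.
function cachedEmbed(
  cache: Map<string, number[]>,
  namespace: string,
  model: FakeModel,
  text: string
): number[] {
  const key = namespace + createHash("sha1").update(text).digest("hex");
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const vector = model.embed(text);
  cache.set(key, vector);
  return vector;
}

// Shared cache, empty namespace: model B is served model A's vector.
const shared = new Map<string, number[]>();
cachedEmbed(shared, "", modelA, "hello");
const collided = cachedEmbed(shared, "", modelB, "hello");

// Shared cache, one namespace per model: each model gets its own entry.
const namespaced = new Map<string, number[]>();
cachedEmbed(namespaced, modelA.name, modelA, "hello");
const separate = cachedEmbed(namespaced, modelB.name, modelB, "hello");

console.log(collided, separate);
```

With an empty namespace, `collided` is the vector that model A produced, even though model B was asked. With per-model namespaces, `separate` is model B's own vector.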
In-memory
Tip
See this section for general instructions on installing integration packages.
- npm: npm install @langchain/openai @langchain/community @langchain/core
- Yarn: yarn add @langchain/openai @langchain/community @langchain/core
- pnpm: pnpm add @langchain/openai @langchain/community @langchain/core
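Here is a basic test example with an in-memory cache. This type of cache is primarily useful for unit tests or prototyping. Do not use this cache if you need to actually store the embeddings for an extended period of time.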
import { OpenAIEmbeddings } from "@langchain/openai";
import { CacheBackedEmbeddings } from "langchain/embeddings/cache_backed";
import { InMemoryStore } from "@langchain/core/stores";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { FaissStore } from "@langchain/community/vectorstores/faiss";
import { TextLoader } from "langchain/document_loaders/fs/text";

const underlyingEmbeddings = new OpenAIEmbeddings();

const inMemoryStore = new InMemoryStore();

const cacheBackedEmbeddings = CacheBackedEmbeddings.fromBytesStore(
  underlyingEmbeddings,
  inMemoryStore,
  {
    namespace: underlyingEmbeddings.modelName,
  }
);

const loader = new TextLoader("./state_of_the_union.txt");
const rawDocuments = await loader.load();
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 0,
});
const documents = await splitter.splitDocuments(rawDocuments);

// No keys logged yet since the cache is empty
for await (const key of inMemoryStore.yieldKeys()) {
  console.log(key);
}

let time = Date.now();
const vectorstore = await FaissStore.fromDocuments(
  documents,
  cacheBackedEmbeddings
);
console.log(`Initial creation time: ${Date.now() - time}ms`);
/*
  Initial creation time: 1905ms
*/

// The second time is much faster since the embeddings for the input docs have already been added to the cache
time = Date.now();
const vectorstore2 = await FaissStore.fromDocuments(
  documents,
  cacheBackedEmbeddings
);
console.log(`Cached creation time: ${Date.now() - time}ms`);
/*
  Cached creation time: 8ms
*/

// Many keys logged with hashed values
const keys = [];
for await (const key of inMemoryStore.yieldKeys()) {
  keys.push(key);
}

console.log(keys.slice(0, 5));
/*
  [
    'text-embedding-ada-002ea9b59e760e64bec6ee9097b5a06b0d91cb3ab64',
    'text-embedding-ada-0023b424f5ed1271a6f5601add17c1b58b7c992772e',
    'text-embedding-ada-002fec5d021611e1527297c5e8f485876ea82dcb111',
    'text-embedding-ada-00262f72e0c2d711c6b861714ee624b28af639fdb13',
    'text-embedding-ada-00262d58882330038a4e6e25ea69a938f4391541874'
  ]
*/
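The speedup on the second run comes from the wrapper answering repeated texts out of the store rather than calling the model again. The sketch below shows that pattern in isolation, with no network calls. The `Embedder` interface, `CountingEmbedder`, and `CachedEmbedder` are hypothetical names for illustration, and the raw text is used as the cache key for brevity instead of a hash.

```typescript
// Hypothetical minimal interface; real embedding classes expose more methods.
interface Embedder {
  embedDocuments(texts: string[]): Promise<number[][]>;
}

// Stand-in for a remote model: counts how many texts it was asked to embed.
class CountingEmbedder implements Embedder {
  embedded = 0;

  async embedDocuments(texts: string[]): Promise<number[][]> {
    this.embedded += texts.length;
    return texts.map((t) => [t.length]);
  }
}

// Sketch of a cache-backed wrapper: only cache misses reach the underlying model.
class CachedEmbedder implements Embedder {
  private cache = new Map<string, number[]>();
  private underlying: Embedder;

  constructor(underlying: Embedder) {
    this.underlying = underlying;
  }

  async embedDocuments(texts: string[]): Promise<number[][]> {
    // Collect each uncached text once, even if it appears several times.
    const missing = [...new Set(texts.filter((t) => !this.cache.has(t)))];
    if (missing.length > 0) {
      const vectors = await this.underlying.embedDocuments(missing);
      missing.forEach((t, i) => this.cache.set(t, vectors[i]));
    }
    return texts.map((t) => this.cache.get(t)!);
  }
}

async function demo(): Promise<number[]> {
  const model = new CountingEmbedder();
  const cached = new CachedEmbedder(model);
  const counts: number[] = [];
  await cached.embedDocuments(["a", "bb", "ccc"]);
  counts.push(model.embedded); // texts sent to the model after the first pass
  await cached.embedDocuments(["a", "bb", "ccc"]);
  counts.push(model.embedded); // unchanged if the second pass hit the cache
  return counts;
}

demo().then((counts) => console.log(counts));
```

The two recorded counts are equal, which means the second pass never reached the underlying model.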
API Reference
- OpenAIEmbeddings from @langchain/openai
- CacheBackedEmbeddings from langchain/embeddings/cache_backed
- InMemoryStore from @langchain/core/stores
- RecursiveCharacterTextSplitter from @langchain/textsplitters
- FaissStore from @langchain/community/vectorstores/faiss
- TextLoader from langchain/document_loaders/fs/text
Redis
Here is an example with a Redis cache.
You first need to install ioredis as a peer dependency and pass in an initialized client.
- npm: npm install ioredis
- Yarn: yarn add ioredis
- pnpm: pnpm add ioredis
import { Redis } from "ioredis";
import { OpenAIEmbeddings } from "@langchain/openai";
import { CacheBackedEmbeddings } from "langchain/embeddings/cache_backed";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { FaissStore } from "@langchain/community/vectorstores/faiss";
import { RedisByteStore } from "@langchain/community/storage/ioredis";
import { TextLoader } from "langchain/document_loaders/fs/text";

const underlyingEmbeddings = new OpenAIEmbeddings();

// Requires a Redis instance running at 127.0.0.1:6379 (the ioredis default).
// See https://github.com/redis/ioredis for full config options.
const redisClient = new Redis();
const redisStore = new RedisByteStore({
  client: redisClient,
});

const cacheBackedEmbeddings = CacheBackedEmbeddings.fromBytesStore(
  underlyingEmbeddings,
  redisStore,
  {
    namespace: underlyingEmbeddings.modelName,
  }
);

const loader = new TextLoader("./state_of_the_union.txt");
const rawDocuments = await loader.load();
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 0,
});
const documents = await splitter.splitDocuments(rawDocuments);

let time = Date.now();
const vectorstore = await FaissStore.fromDocuments(
  documents,
  cacheBackedEmbeddings
);
console.log(`Initial creation time: ${Date.now() - time}ms`);
/*
  Initial creation time: 1808ms
*/

// The second time is much faster since the embeddings for the input docs have already been added to the cache
time = Date.now();
const vectorstore2 = await FaissStore.fromDocuments(
  documents,
  cacheBackedEmbeddings
);
console.log(`Cached creation time: ${Date.now() - time}ms`);
/*
  Cached creation time: 33ms
*/

// Many keys logged with hashed values
const keys = [];
for await (const key of redisStore.yieldKeys()) {
  keys.push(key);
}

console.log(keys.slice(0, 5));
/*
  [
    'text-embedding-ada-002fa9ac80e1bf226b7b4dfc03ea743289a65a727b2',
    'text-embedding-ada-0027dbf9c4b36e12fe1768300f145f4640342daaf22',
    'text-embedding-ada-002ea9b59e760e64bec6ee9097b5a06b0d91cb3ab64',
    'text-embedding-ada-002fec5d021611e1527297c5e8f485876ea82dcb111',
    'text-embedding-ada-002c00f818c345da13fed9f2697b4b689338143c8c7'
  ]
*/
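Because fromBytesStore takes a byte store, each cached vector has to be turned into bytes before it is written to a backend such as Redis and decoded again when it is read back. The sketch below shows one possible round trip, assuming JSON encoded as UTF-8. The helper names `encodeVector` and `decodeVector` are hypothetical, and the library's real serialization format may differ.

```typescript
// Hypothetical helpers: the library's real serialization format may differ.
const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Serialize a vector to bytes (assumed format: JSON text encoded as UTF-8).
function encodeVector(vector: number[]): Uint8Array {
  return encoder.encode(JSON.stringify(vector));
}

// Reverse of encodeVector: decode the bytes and parse the JSON array.
function decodeVector(bytes: Uint8Array): number[] {
  return JSON.parse(decoder.decode(bytes)) as number[];
}

// A Map stands in for the remote key-value store in this sketch.
const byteStore = new Map<string, Uint8Array>();
byteStore.set("demo-key", encodeVector([0.25, -1.5, 3]));

const stored = byteStore.get("demo-key");
const restored = stored ? decodeVector(stored) : [];
console.log(restored);
```

Storing plain bytes keeps the cache independent of any one backend, so the same wrapper can sit on top of an in-memory store or a Redis store.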
API Reference
- OpenAIEmbeddings from @langchain/openai
- CacheBackedEmbeddings from langchain/embeddings/cache_backed
- RecursiveCharacterTextSplitter from @langchain/textsplitters
- FaissStore from @langchain/community/vectorstores/faiss
- RedisByteStore from @langchain/community/storage/ioredis
- TextLoader from langchain/document_loaders/fs/text
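Next steps
You have now learned how to use caching to avoid recomputing embeddings.
Next, check out the full tutorial on retrieval-augmented generation.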