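OpenAI functions metadata tagger
It is often useful to tag ingested documents with structured metadata, such as a title, tone, or length, so that later similarity searches can be more targeted. For large numbers of documents, however, doing this tagging by hand is tedious.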
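The MetadataTagger document transformer automates this process by extracting metadata from each provided document according to a provided schema. It uses a configurable OpenAI Functions-powered chain under the hood, so if you pass a custom LLM instance, it must be an OpenAI model with functions support.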
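Note: This document transformer works best with complete documents, so it is best to run it on whole documents first, before any other splitting or processing!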
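To make the ordering in the note concrete, here is a minimal, dependency-free sketch of "tag first, split second". The `Doc` interface and the `splitWithMetadata` helper are hypothetical stand-ins invented for illustration; they are not part of LangChain, and a real pipeline would likely use one of the library's text splitters instead. The point is only that chunks created after tagging can inherit the metadata extracted from the whole document.

```typescript
// Minimal stand-in for a LangChain Document (assumption: only these two
// fields matter for this illustration).
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical naive splitter: cuts the text into fixed-size chunks and
// copies the parent's metadata onto every chunk.
function splitWithMetadata(doc: Doc, chunkSize: number): Doc[] {
  const chunks: Doc[] = [];
  for (let start = 0; start < doc.pageContent.length; start += chunkSize) {
    chunks.push({
      pageContent: doc.pageContent.slice(start, start + chunkSize),
      metadata: { ...doc.metadata },
    });
  }
  return chunks;
}

// Pretend this review was already tagged as a whole document.
const taggedReview: Doc = {
  pageContent:
    "Review of The Bee Movie\nBy Roger Ebert\nThis is the greatest movie ever made. 4 out of 5 stars.",
  metadata: {
    movie_title: "The Bee Movie",
    critic: "Roger Ebert",
    tone: "positive",
    rating: 4,
  },
};

// Splitting after tagging: every chunk keeps the document-level metadata.
const reviewChunks = splitWithMetadata(taggedReview, 40);
console.log(reviewChunks.map((chunk) => chunk.metadata.movie_title));
```

Had the review been split before tagging, a chunk containing only the last sentence would give the model no title or critic to extract.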
Usage
For example, say you want to index a set of movie reviews. You could initialize the document transformer as follows:
Tip
See this section for general instructions on installing integration packages.
- npm: npm install @langchain/openai @langchain/core
- Yarn: yarn add @langchain/openai @langchain/core
- pnpm: pnpm add @langchain/openai @langchain/core
import { z } from "zod";
import { createMetadataTaggerFromZod } from "langchain/document_transformers/openai_functions";
import { ChatOpenAI } from "@langchain/openai";
import { Document } from "@langchain/core/documents";

// Schema describing the metadata to extract from each review.
const zodSchema = z.object({
  movie_title: z.string(),
  critic: z.string(),
  tone: z.enum(["positive", "negative"]),
  rating: z
    .optional(z.number())
    .describe("The number of stars the critic rated the movie"),
});

// Build the tagger; the LLM must be an OpenAI model with functions support.
const metadataTagger = createMetadataTaggerFromZod(zodSchema, {
  llm: new ChatOpenAI({ model: "gpt-3.5-turbo" }),
});

const documents = [
  new Document({
    pageContent:
      "Review of The Bee Movie\nBy Roger Ebert\nThis is the greatest movie ever made. 4 out of 5 stars.",
  }),
  new Document({
    pageContent:
      "Review of The Godfather\nBy Anonymous\n\nThis movie was super boring. 1 out of 5 stars.",
    // Existing metadata is kept alongside the extracted fields.
    metadata: { reliable: false },
  }),
];

const taggedDocuments = await metadataTagger.transformDocuments(documents);

console.log(taggedDocuments);

/*
  [
    Document {
      pageContent: 'Review of The Bee Movie\n' +
        'By Roger Ebert\n' +
        'This is the greatest movie ever made. 4 out of 5 stars.',
      metadata: {
        movie_title: 'The Bee Movie',
        critic: 'Roger Ebert',
        tone: 'positive',
        rating: 4
      }
    },
    Document {
      pageContent: 'Review of The Godfather\n' +
        'By Anonymous\n' +
        '\n' +
        'This movie was super boring. 1 out of 5 stars.',
      metadata: {
        movie_title: 'The Godfather',
        critic: 'Anonymous',
        tone: 'negative',
        rating: 1,
        reliable: false
      }
    }
  ]
*/
API Reference
- createMetadataTaggerFromZod from langchain/document_transformers/openai_functions
- ChatOpenAI from @langchain/openai
- Document from @langchain/core/documents
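In the output above, the second document keeps its original `reliable: false` entry next to the extracted fields. A rough, dependency-free sketch of that merge is shown below. The `mergeMetadata` helper is hypothetical and written only for illustration, and the precedence on key conflicts (existing metadata wins here) is an assumption, since the example output contains no conflicting keys.

```typescript
type Metadata = Record<string, unknown>;

// Hypothetical helper: combine extracted fields with a document's existing
// metadata. Assumption: on a key conflict the existing value wins; the
// example output does not include a conflict, so this is not confirmed.
function mergeMetadata(extracted: Metadata, existing: Metadata): Metadata {
  return { ...extracted, ...existing };
}

const mergedMetadata = mergeMetadata(
  { movie_title: "The Godfather", critic: "Anonymous", tone: "negative", rating: 1 },
  { reliable: false },
);

console.log(mergedMetadata);
```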
There is also a createMetadataTagger method that accepts a valid JSON Schema object instead.
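As a rough illustration, the Zod schema from the example could be written as a plain JSON Schema object along the following lines. The `jsonSchema` name is made up for this sketch, and the commented-out call assumes that createMetadataTagger takes the same second options argument as createMetadataTaggerFromZod; it is left as a comment because it needs the langchain package and an OpenAI API key to run.

```typescript
// JSON Schema counterpart of the Zod schema from the example above.
const jsonSchema = {
  type: "object",
  properties: {
    movie_title: { type: "string" },
    critic: { type: "string" },
    tone: { type: "string", enum: ["positive", "negative"] },
    rating: {
      type: "number",
      description: "The number of stars the critic rated the movie",
    },
  },
  // rating is optional, so it is left out of the required list.
  required: ["movie_title", "critic", "tone"],
};

// Assumed usage, mirroring createMetadataTaggerFromZod:
// const metadataTagger = createMetadataTagger(jsonSchema, {
//   llm: new ChatOpenAI({ model: "gpt-3.5-turbo" }),
// });

console.log(Object.keys(jsonSchema.properties));
```

Marking a field optional in Zod corresponds to omitting it from `required` in JSON Schema, which is why `rating` does not appear in that list.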
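Customization
You can pass the standard LLMChain arguments to the underlying tagging chain in the second options parameter. For example, if you want the LLM to focus on specific details in the input documents, or to extract metadata in a certain style, you can pass in a custom prompt: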
import { z } from "zod";
import { createMetadataTaggerFromZod } from "langchain/document_transformers/openai_functions";
import { ChatOpenAI } from "@langchain/openai";
import { Document } from "@langchain/core/documents";
import { PromptTemplate } from "@langchain/core/prompts";

// Custom prompt; {input} is replaced with each document's content.
const taggingChainTemplate = `Extract the desired information from the following passage.
Anonymous critics are actually Roger Ebert.

Passage:
{input}
`;

const zodSchema = z.object({
  movie_title: z.string(),
  critic: z.string(),
  tone: z.enum(["positive", "negative"]),
  rating: z
    .optional(z.number())
    .describe("The number of stars the critic rated the movie"),
});

// Pass the custom prompt to the underlying tagging chain via the options.
const metadataTagger = createMetadataTaggerFromZod(zodSchema, {
  llm: new ChatOpenAI({ model: "gpt-3.5-turbo" }),
  prompt: PromptTemplate.fromTemplate(taggingChainTemplate),
});

const documents = [
  new Document({
    pageContent:
      "Review of The Bee Movie\nBy Roger Ebert\nThis is the greatest movie ever made. 4 out of 5 stars.",
  }),
  new Document({
    pageContent:
      "Review of The Godfather\nBy Anonymous\n\nThis movie was super boring. 1 out of 5 stars.",
    metadata: { reliable: false },
  }),
];

const taggedDocuments = await metadataTagger.transformDocuments(documents);

console.log(taggedDocuments);

/*
  [
    Document {
      pageContent: 'Review of The Bee Movie\n' +
        'By Roger Ebert\n' +
        'This is the greatest movie ever made. 4 out of 5 stars.',
      metadata: {
        movie_title: 'The Bee Movie',
        critic: 'Roger Ebert',
        tone: 'positive',
        rating: 4
      }
    },
    Document {
      pageContent: 'Review of The Godfather\n' +
        'By Anonymous\n' +
        '\n' +
        'This movie was super boring. 1 out of 5 stars.',
      metadata: {
        movie_title: 'The Godfather',
        critic: 'Roger Ebert',
        tone: 'negative',
        rating: 1,
        reliable: false
      }
    }
  ]
*/
API Reference
- createMetadataTaggerFromZod from langchain/document_transformers/openai_functions
- ChatOpenAI from @langchain/openai
- Document from @langchain/core/documents
- PromptTemplate from @langchain/core/prompts