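OpenAI functions metadata tagger

It can often be useful to tag ingested documents with structured metadata, such as a document's title, tone, or length, so that later similarity searches can be more targeted. For large numbers of documents, however, tagging by hand is tedious.

The MetadataTagger document transformer automates this process by extracting metadata from each provided document according to a provided schema. Under the hood it uses a configurable OpenAI Functions-powered chain, so if you pass a custom LLM instance, it must be an OpenAI model with functions support.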

Note: This document transformer works best with complete documents, so run it on whole documents first, before any splitting or other processing.
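
To illustrate the ordering recommended in the note (tag the whole document first, split afterwards), here is a minimal sketch in plain TypeScript. It does not use LangChain at all: the `Doc` interface is a stand-in for the real Document class, and `splitTaggedDocument` and the `chunkStart` field are hypothetical names introduced only for this example. The point is that metadata extracted once from the full text can be copied onto every chunk.

```typescript
// Minimal stand-in for a document; the real Document class lives in
// @langchain/core/documents. This interface is an assumption for the sketch.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: splits an already-tagged document into fixed-size
// character chunks and copies the document-level metadata onto each chunk.
function splitTaggedDocument(doc: Doc, chunkSize: number): Doc[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  const chunks: Doc[] = [];
  for (let start = 0; start < doc.pageContent.length; start += chunkSize) {
    chunks.push({
      pageContent: doc.pageContent.slice(start, start + chunkSize),
      // Every chunk inherits the tags extracted from the full document.
      metadata: { ...doc.metadata, chunkStart: start },
    });
  }
  return chunks;
}

// A document that is assumed to have been tagged already.
const tagged: Doc = {
  pageContent:
    "Review of The Bee Movie\nBy Roger Ebert\nThis is the greatest movie ever made. 4 out of 5 stars.",
  metadata: {
    movie_title: "The Bee Movie",
    critic: "Roger Ebert",
    tone: "positive",
    rating: 4,
  },
};

const chunks = splitTaggedDocument(tagged, 40);
for (const chunk of chunks) {
  console.log(chunk.metadata.movie_title, JSON.stringify(chunk.pageContent));
}
```

Had the text been split first, a chunk containing only the final sentence would give the tagger no title or critic to extract.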

Usage

For example, suppose you want to index a set of movie reviews. You can initialize the document transformer as follows:

npm install @langchain/openai @langchain/core

import { z } from "zod";
import { createMetadataTaggerFromZod } from "langchain/document_transformers/openai_functions";
import { ChatOpenAI } from "@langchain/openai";
import { Document } from "@langchain/core/documents";

const zodSchema = z.object({
  movie_title: z.string(),
  critic: z.string(),
  tone: z.enum(["positive", "negative"]),
  rating: z
    .optional(z.number())
    .describe("The number of stars the critic rated the movie"),
});

const metadataTagger = createMetadataTaggerFromZod(zodSchema, {
  llm: new ChatOpenAI({ model: "gpt-3.5-turbo" }),
});

const documents = [
  new Document({
    pageContent:
      "Review of The Bee Movie\nBy Roger Ebert\nThis is the greatest movie ever made. 4 out of 5 stars.",
  }),
  new Document({
    pageContent:
      "Review of The Godfather\nBy Anonymous\n\nThis movie was super boring. 1 out of 5 stars.",
    metadata: { reliable: false },
  }),
];
const taggedDocuments = await metadataTagger.transformDocuments(documents);

console.log(taggedDocuments);

/*
  [
    Document {
      pageContent: 'Review of The Bee Movie\n' +
        'By Roger Ebert\n' +
        'This is the greatest movie ever made. 4 out of 5 stars.',
      metadata: {
        movie_title: 'The Bee Movie',
        critic: 'Roger Ebert',
        tone: 'positive',
        rating: 4
      }
    },
    Document {
      pageContent: 'Review of The Godfather\n' +
        'By Anonymous\n' +
        '\n' +
        'This movie was super boring. 1 out of 5 stars.',
      metadata: {
        movie_title: 'The Godfather',
        critic: 'Anonymous',
        tone: 'negative',
        rating: 1,
        reliable: false
      }
    }
  ]
*/
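
Once documents carry metadata like the output above, they can be narrowed down before or alongside a similarity search. The following is a minimal sketch in plain TypeScript, not a LangChain API: the `TaggedDoc` type and the `filterByMetadata` helper are hypothetical, and the sample data simply mirrors the tagged output shown above.

```typescript
// Shape of a tagged document, mirroring the schema used in the example.
// This type is an assumption for the sketch, not a LangChain type.
interface TaggedDoc {
  pageContent: string;
  metadata: {
    movie_title: string;
    critic: string;
    tone: "positive" | "negative";
    rating?: number;
    [key: string]: unknown; // pre-existing fields such as `reliable`
  };
}

const taggedDocs: TaggedDoc[] = [
  {
    pageContent:
      "Review of The Bee Movie\nBy Roger Ebert\nThis is the greatest movie ever made. 4 out of 5 stars.",
    metadata: {
      movie_title: "The Bee Movie",
      critic: "Roger Ebert",
      tone: "positive",
      rating: 4,
    },
  },
  {
    pageContent:
      "Review of The Godfather\nBy Anonymous\n\nThis movie was super boring. 1 out of 5 stars.",
    metadata: {
      movie_title: "The Godfather",
      critic: "Anonymous",
      tone: "negative",
      rating: 1,
      reliable: false,
    },
  },
];

// Hypothetical helper: keeps documents with the given tone and at least
// `minRating` stars. Documents without a rating are treated as rated 0.
function filterByMetadata(
  docs: TaggedDoc[],
  tone: "positive" | "negative",
  minRating: number
): TaggedDoc[] {
  return docs.filter(
    (doc) =>
      doc.metadata.tone === tone && (doc.metadata.rating ?? 0) >= minRating
  );
}

const positiveReviews = filterByMetadata(taggedDocs, "positive", 4);
console.log(positiveReviews.map((doc) => doc.metadata.movie_title));
// [ 'The Bee Movie' ]
```

Many vector stores accept a metadata filter at query time, which would serve the same purpose; the exact filter syntax depends on the store and is not covered here.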


There is also a createMetadataTagger method that accepts a valid JSON Schema object instead of a Zod schema.
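
As a rough illustration, the JSON Schema object below expresses approximately the same shape as the Zod schema used in the usage example. It is a hand-written sketch rather than generated output, and the commented-out call at the end is an assumption about how such a schema would be passed to createMetadataTagger; it is left unexecuted because it needs the langchain package and an OpenAI API key, and a type annotation or cast may be required for it to type-check.

```typescript
// Hand-written JSON Schema approximating the Zod schema from the example:
// three required string fields (one restricted to an enum) and an optional
// numeric rating with a description.
const jsonSchema = {
  type: "object",
  properties: {
    movie_title: { type: "string" },
    critic: { type: "string" },
    tone: { type: "string", enum: ["positive", "negative"] },
    rating: {
      type: "number",
      description: "The number of stars the critic rated the movie",
    },
  },
  // `rating` is optional in the Zod schema, so it is not listed here.
  required: ["movie_title", "critic", "tone"],
};

// Assumed usage (not executed in this sketch):
// const metadataTagger = createMetadataTagger(jsonSchema, {
//   llm: new ChatOpenAI({ model: "gpt-3.5-turbo" }),
// });

console.log(Object.keys(jsonSchema.properties));
```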

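Customization

You can pass the standard LLMChain arguments for the underlying tagging chain in the second options parameter. For example, if you want the LLM to focus on specific details in the input documents, or to extract metadata in a particular style, you can pass in a custom prompt: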

import { z } from "zod";
import { createMetadataTaggerFromZod } from "langchain/document_transformers/openai_functions";
import { ChatOpenAI } from "@langchain/openai";
import { Document } from "@langchain/core/documents";
import { PromptTemplate } from "@langchain/core/prompts";

const taggingChainTemplate = `Extract the desired information from the following passage.
Anonymous critics are actually Roger Ebert.

Passage:
{input}
`;

const zodSchema = z.object({
  movie_title: z.string(),
  critic: z.string(),
  tone: z.enum(["positive", "negative"]),
  rating: z
    .optional(z.number())
    .describe("The number of stars the critic rated the movie"),
});

const metadataTagger = createMetadataTaggerFromZod(zodSchema, {
  llm: new ChatOpenAI({ model: "gpt-3.5-turbo" }),
  prompt: PromptTemplate.fromTemplate(taggingChainTemplate),
});

const documents = [
  new Document({
    pageContent:
      "Review of The Bee Movie\nBy Roger Ebert\nThis is the greatest movie ever made. 4 out of 5 stars.",
  }),
  new Document({
    pageContent:
      "Review of The Godfather\nBy Anonymous\n\nThis movie was super boring. 1 out of 5 stars.",
    metadata: { reliable: false },
  }),
];
const taggedDocuments = await metadataTagger.transformDocuments(documents);

console.log(taggedDocuments);

/*
  [
    Document {
      pageContent: 'Review of The Bee Movie\n' +
        'By Roger Ebert\n' +
        'This is the greatest movie ever made. 4 out of 5 stars.',
      metadata: {
        movie_title: 'The Bee Movie',
        critic: 'Roger Ebert',
        tone: 'positive',
        rating: 4
      }
    },
    Document {
      pageContent: 'Review of The Godfather\n' +
        'By Anonymous\n' +
        '\n' +
        'This movie was super boring. 1 out of 5 stars.',
      metadata: {
        movie_title: 'The Godfather',
        critic: 'Roger Ebert',
        tone: 'negative',
        rating: 1,
        reliable: false
      }
    }
  ]
*/
