OpenAI functions metadata tagger

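Tagging ingested documents with structured metadata, such as title, tone, or length, is often useful for more targeted similarity search later. For large numbers of documents, however, doing this tagging by hand is tedious.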

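The MetadataTagger document transformer automates this process by extracting metadata from each provided document according to a provided schema. Under the hood it uses a configurable chain backed by OpenAI functions, so if you pass a custom LLM instance, it must be an OpenAI model with functions support.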

Note: This document transformer works best with complete documents, so run it on whole documents first, before any other splitting or processing.
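To illustrate why the order matters, here is a minimal sketch that splits an already-tagged review and copies its metadata onto each piece. The `Chunk` interface and the `splitWithMetadata` helper are hypothetical stand-ins written for this illustration; they are not part of LangChain, and a real pipeline would likely use one of the library's text splitters instead.

```typescript
// Hypothetical stand-in for a document: text plus free-form metadata.
interface Chunk {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical splitter: breaks text on blank lines and copies the
// parent's metadata onto every resulting piece.
function splitWithMetadata(doc: Chunk): Chunk[] {
  return doc.pageContent
    .split(/\n\s*\n/)
    .filter((part) => part.trim().length > 0)
    .map((part) => ({ pageContent: part, metadata: { ...doc.metadata } }));
}

// A review that is assumed to have been tagged as a whole document.
const taggedReview: Chunk = {
  pageContent:
    "Review of The Godfather\nBy Anonymous\n\nThis movie was super boring. 1 out of 5 stars.",
  metadata: {
    movie_title: "The Godfather",
    critic: "Anonymous",
    tone: "negative",
    rating: 1,
  },
};

const chunks = splitWithMetadata(taggedReview);
console.log(chunks);
```

The second chunk contains only the verdict and the star rating, so tagging it in isolation could not recover the title or the critic. Tagging the whole review first lets every chunk carry the full metadata.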

Usage

For example, suppose you want to index a set of movie reviews. You can initialize the document transformer as follows:

Tip

See this section for general instructions on installing integration packages.

npm install @langchain/openai @langchain/core

yarn add @langchain/openai @langchain/core

pnpm add @langchain/openai @langchain/core

import { z } from "zod";
import { createMetadataTaggerFromZod } from "langchain/document_transformers/openai_functions";
import { ChatOpenAI } from "@langchain/openai";
import { Document } from "@langchain/core/documents";

const zodSchema = z.object({
  movie_title: z.string(),
  critic: z.string(),
  tone: z.enum(["positive", "negative"]),
  rating: z
    .optional(z.number())
    .describe("The number of stars the critic rated the movie"),
});

const metadataTagger = createMetadataTaggerFromZod(zodSchema, {
  llm: new ChatOpenAI({ model: "gpt-3.5-turbo" }),
});

const documents = [
  new Document({
    pageContent:
      "Review of The Bee Movie\nBy Roger Ebert\nThis is the greatest movie ever made. 4 out of 5 stars.",
  }),
  new Document({
    pageContent:
      "Review of The Godfather\nBy Anonymous\n\nThis movie was super boring. 1 out of 5 stars.",
    metadata: { reliable: false },
  }),
];
const taggedDocuments = await metadataTagger.transformDocuments(documents);

console.log(taggedDocuments);

/*
  [
    Document {
      pageContent: 'Review of The Bee Movie\n' +
        'By Roger Ebert\n' +
        'This is the greatest movie ever made. 4 out of 5 stars.',
      metadata: {
        movie_title: 'The Bee Movie',
        critic: 'Roger Ebert',
        tone: 'positive',
        rating: 4
      }
    },
    Document {
      pageContent: 'Review of The Godfather\n' +
        'By Anonymous\n' +
        '\n' +
        'This movie was super boring. 1 out of 5 stars.',
      metadata: {
        movie_title: 'The Godfather',
        critic: 'Anonymous',
        tone: 'negative',
        rating: 1,
        reliable: false
      }
    }
  ]
*/

API Reference

createMetadataTaggerFromZod from langchain/document_transformers/openai_functions
ChatOpenAI from @langchain/openai
Document from @langchain/core/documents
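In the output above, the second document keeps its original `reliable: false` field next to the extracted fields. The sketch below shows one way such a merge could work. It is an illustration only: `SimpleDocument` and `mergeMetadata` are hypothetical names, and the rule that existing keys win on a name collision is an assumption, not a documented guarantee.

```typescript
// Hypothetical stand-in for the Document shape used in the example.
interface SimpleDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Assumption: extracted fields are added alongside the existing metadata,
// and an existing key is kept if the extracted data uses the same name.
function mergeMetadata(
  doc: SimpleDocument,
  extracted: Record<string, unknown>
): SimpleDocument {
  return {
    pageContent: doc.pageContent,
    metadata: { ...extracted, ...doc.metadata },
  };
}

const original: SimpleDocument = {
  pageContent:
    "Review of The Godfather\nBy Anonymous\n\nThis movie was super boring. 1 out of 5 stars.",
  metadata: { reliable: false },
};

// Values copied from the example output; no model call is made here.
const merged = mergeMetadata(original, {
  movie_title: "The Godfather",
  critic: "Anonymous",
  tone: "negative",
  rating: 1,
});

console.log(merged.metadata);
```

The input document is left unchanged; the helper returns a new object, which mirrors how `transformDocuments` returns a new array in the example.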

There is also a createMetadataTagger method that accepts a valid JSON Schema object instead of a Zod schema.
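As a rough guide, the movie review schema could be written as a plain JSON Schema object along these lines. This is a hand-written sketch rather than output generated from the Zod schema, and the variable name `jsonSchema` is only illustrative.

```typescript
// Hand-written JSON Schema intended to mirror the Zod schema in the example.
// "rating" is left out of "required" because it is optional there.
const jsonSchema = {
  type: "object",
  properties: {
    movie_title: { type: "string" },
    critic: { type: "string" },
    tone: { type: "string", enum: ["positive", "negative"] },
    rating: {
      type: "number",
      description: "The number of stars the critic rated the movie",
    },
  },
  required: ["movie_title", "critic", "tone"],
};

console.log(JSON.stringify(jsonSchema, null, 2));
```

An object like this would be passed as the first argument to createMetadataTagger, in the position where createMetadataTaggerFromZod takes the Zod schema. Depending on your TypeScript settings, the literal may need a type annotation to satisfy the method's parameter type.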

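Customization

You can pass the standard LLMChain arguments to the underlying tagging chain in the second options parameter. For example, if you want the LLM to focus on specific details in the input documents, or to extract metadata in a certain style, you can pass in a custom prompt: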

import { z } from "zod";
import { createMetadataTaggerFromZod } from "langchain/document_transformers/openai_functions";
import { ChatOpenAI } from "@langchain/openai";
import { Document } from "@langchain/core/documents";
import { PromptTemplate } from "@langchain/core/prompts";

const taggingChainTemplate = `Extract the desired information from the following passage.
Anonymous critics are actually Roger Ebert.

Passage:
{input}
`;

const zodSchema = z.object({
  movie_title: z.string(),
  critic: z.string(),
  tone: z.enum(["positive", "negative"]),
  rating: z
    .optional(z.number())
    .describe("The number of stars the critic rated the movie"),
});

const metadataTagger = createMetadataTaggerFromZod(zodSchema, {
  llm: new ChatOpenAI({ model: "gpt-3.5-turbo" }),
  prompt: PromptTemplate.fromTemplate(taggingChainTemplate),
});

const documents = [
  new Document({
    pageContent:
      "Review of The Bee Movie\nBy Roger Ebert\nThis is the greatest movie ever made. 4 out of 5 stars.",
  }),
  new Document({
    pageContent:
      "Review of The Godfather\nBy Anonymous\n\nThis movie was super boring. 1 out of 5 stars.",
    metadata: { reliable: false },
  }),
];
const taggedDocuments = await metadataTagger.transformDocuments(documents);

console.log(taggedDocuments);

/*
  [
    Document {
      pageContent: 'Review of The Bee Movie\n' +
        'By Roger Ebert\n' +
        'This is the greatest movie ever made. 4 out of 5 stars.',
      metadata: {
        movie_title: 'The Bee Movie',
        critic: 'Roger Ebert',
        tone: 'positive',
        rating: 4
      }
    },
    Document {
      pageContent: 'Review of The Godfather\n' +
        'By Anonymous\n' +
        '\n' +
        'This movie was super boring. 1 out of 5 stars.',
      metadata: {
        movie_title: 'The Godfather',
        critic: 'Roger Ebert',
        tone: 'negative',
        rating: 1,
        reliable: false
      }
    }
  ]
*/

API Reference

createMetadataTaggerFromZod from langchain/document_transformers/openai_functions
ChatOpenAI from @langchain/openai
Document from @langchain/core/documents
PromptTemplate from @langchain/core/prompts
