如何使用参考示例

先决条件

本指南假定您熟悉以下内容

抽取

通常可以通过向 LLM 提供参考示例来提高抽取质量。

提示

虽然本教程重点介绍如何将示例与工具调用模型一起使用，但此技术通常适用，并且也适用于 JSON 或基于提示的技术。

这次我们将使用 OpenAI 的 GPT-4，因为它对 ToolMessages 的支持非常强大

提示

有关安装集成包的通用说明，请参阅此部分。

npm
yarn
pnpm

npm i @langchain/openai @langchain/core zod uuid

yarn add @langchain/openai @langchain/core zod uuid

pnpm add @langchain/openai @langchain/core zod uuid

让我们定义一个提示

import {
  ChatPromptTemplate,
  MessagesPlaceholder,
} from "@langchain/core/prompts";

const SYSTEM_PROMPT_TEMPLATE = `You are an expert extraction algorithm.
Only extract relevant information from the text.
If you do not know the value of an attribute asked to extract, you may omit the attribute's value.`;

// Define a custom prompt to provide instructions and any additional context.
// 1) You can add examples into the prompt template to improve extraction quality
// 2) Introduce additional parameters to take context into account (e.g., include metadata
//    about the document from which the text was extracted.)
const prompt = ChatPromptTemplate.fromMessages([
  ["system", SYSTEM_PROMPT_TEMPLATE],
  // ↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓
  new MessagesPlaceholder("examples"),
  // ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑
  ["human", "{text}"],
]);

测试模板

import { HumanMessage } from "@langchain/core/messages";

const promptValue = await prompt.invoke({
  text: "this is some text",
  examples: [new HumanMessage("testing 1 2 3")],
});

promptValue.toChatMessages();

[
  SystemMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: "You are an expert extraction algorithm.\n" +
        "Only extract relevant information from the text.\n" +
        "If you do n"... 87 more characters,
      additional_kwargs: {}
    },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "You are an expert extraction algorithm.\n" +
      "Only extract relevant information from the text.\n" +
      "If you do n"... 87 more characters,
    name: undefined,
    additional_kwargs: {}
  },
  HumanMessage {
    lc_serializable: true,
    lc_kwargs: { content: "testing 1 2 3", additional_kwargs: {} },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "testing 1 2 3",
    name: undefined,
    additional_kwargs: {}
  },
  HumanMessage {
    lc_serializable: true,
    lc_kwargs: { content: "this is some text", additional_kwargs: {} },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "this is some text",
    name: undefined,
    additional_kwargs: {}
  }
]

定义模式

让我们重用快速入门中的人员模式。

import { z } from "zod";

const personSchema = z
  .object({
    name: z.optional(z.string()).describe("The name of the person"),
    hair_color: z
      .optional(z.string())
      .describe("The color of the person's hair, if known"),
    height_in_meters: z
      .optional(z.string())
      .describe("Height measured in meters"),
  })
  .describe("Information about a person.");

const peopleSchema = z.object({
  people: z.array(personSchema),
});

定义参考示例

示例可以定义为输入-输出对的列表。

每个示例都包含一个示例 input 文本和一个示例 output，显示应从文本中提取的内容。

信息

下面的示例更高级一些 - 示例的格式需要与使用的 API 匹配（例如，工具调用或 JSON 模式等）。

在这里，格式化的示例将匹配 OpenAI 工具调用 API 期望的格式，因为这是我们正在使用的。

为了向模型提供参考示例，我们将模拟一个虚假的聊天记录，其中包含给定工具的成功用法。由于模型可以选择一次调用多个工具（或多次调用同一工具），因此示例的输出是一个数组

import {
  AIMessage,
  type BaseMessage,
  HumanMessage,
  ToolMessage,
} from "@langchain/core/messages";
import { v4 as uuid } from "uuid";

type OpenAIToolCall = {
  id: string;
  type: "function";
  function: {
    name: string;
    arguments: string;
  };
};

type Example = {
  input: string;
  toolCallOutputs: Record<string, any>[];
};

/**
 * This function converts an example into a list of messages that can be fed into an LLM.
 *
 * This code serves as an adapter that transforms our example into a list of messages
 * that can be processed by a chat model.
 *
 * The list of messages for each example includes:
 *
 * 1) HumanMessage: This contains the content from which information should be extracted.
 * 2) AIMessage: This contains the information extracted by the model.
 * 3) ToolMessage: This provides confirmation to the model that the tool was requested correctly.
 *
 * The inclusion of ToolMessage is necessary because some chat models are highly optimized for agents,
 * making them less suitable for an extraction use case.
 */
function toolExampleToMessages(example: Example): BaseMessage[] {
  const openAIToolCalls: OpenAIToolCall[] = example.toolCallOutputs.map(
    (output) => {
      return {
        id: uuid(),
        type: "function",
        function: {
          // The name of the function right now corresponds
          // to the passed name.
          name: "extract",
          arguments: JSON.stringify(output),
        },
      };
    }
  );
  const messages: BaseMessage[] = [
    new HumanMessage(example.input),
    new AIMessage({
      content: "",
      additional_kwargs: { tool_calls: openAIToolCalls },
    }),
  ];
  const toolMessages = openAIToolCalls.map((toolCall, i) => {
    // Return the mocked successful result for a given tool call.
    return new ToolMessage({
      content: "You have correctly called this tool.",
      tool_call_id: toolCall.id,
    });
  });
  return messages.concat(toolMessages);
}

接下来，让我们定义我们的示例，然后将它们转换为消息格式。

const examples: Example[] = [
  {
    input:
      "The ocean is vast and blue. It's more than 20,000 feet deep. There are many fish in it.",
    toolCallOutputs: [{}],
  },
  {
    input: "Fiona traveled far from France to Spain.",
    toolCallOutputs: [
      {
        name: "Fiona",
      },
    ],
  },
];

const exampleMessages = [];
for (const example of examples) {
  exampleMessages.push(...toolExampleToMessages(example));
}

让我们测试一下提示

const promptValueWithExamples = await prompt.invoke({
  text: "this is some text",
  examples: exampleMessages,
});

promptValueWithExamples.toChatMessages();

[
  SystemMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: "You are an expert extraction algorithm.\n" +
        "Only extract relevant information from the text.\n" +
        "If you do n"... 87 more characters,
      additional_kwargs: {}
    },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "You are an expert extraction algorithm.\n" +
      "Only extract relevant information from the text.\n" +
      "If you do n"... 87 more characters,
    name: undefined,
    additional_kwargs: {}
  },
  HumanMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: "The ocean is vast and blue. It's more than 20,000 feet deep. There are many fish in it.",
      additional_kwargs: {}
    },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "The ocean is vast and blue. It's more than 20,000 feet deep. There are many fish in it.",
    name: undefined,
    additional_kwargs: {}
  },
  AIMessage {
    lc_serializable: true,
    lc_kwargs: { content: "", additional_kwargs: { tool_calls: [ [Object] ] } },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "",
    name: undefined,
    additional_kwargs: {
      tool_calls: [
        {
          id: "8fa4d00d-801f-470e-8737-51ee9dc82259",
          type: "function",
          function: [Object]
        }
      ]
    }
  },
  ToolMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: "You have correctly called this tool.",
      tool_call_id: "8fa4d00d-801f-470e-8737-51ee9dc82259",
      additional_kwargs: {}
    },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "You have correctly called this tool.",
    name: undefined,
    additional_kwargs: {},
    tool_call_id: "8fa4d00d-801f-470e-8737-51ee9dc82259"
  },
  HumanMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: "Fiona traveled far from France to Spain.",
      additional_kwargs: {}
    },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "Fiona traveled far from France to Spain.",
    name: undefined,
    additional_kwargs: {}
  },
  AIMessage {
    lc_serializable: true,
    lc_kwargs: { content: "", additional_kwargs: { tool_calls: [ [Object] ] } },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "",
    name: undefined,
    additional_kwargs: {
      tool_calls: [
        {
          id: "14ad6217-fcbd-47c7-9006-82f612e36c66",
          type: "function",
          function: [Object]
        }
      ]
    }
  },
  ToolMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: "You have correctly called this tool.",
      tool_call_id: "14ad6217-fcbd-47c7-9006-82f612e36c66",
      additional_kwargs: {}
    },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "You have correctly called this tool.",
    name: undefined,
    additional_kwargs: {},
    tool_call_id: "14ad6217-fcbd-47c7-9006-82f612e36c66"
  },
  HumanMessage {
    lc_serializable: true,
    lc_kwargs: { content: "this is some text", additional_kwargs: {} },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "this is some text",
    name: undefined,
    additional_kwargs: {}
  }
]

创建一个抽取器

在这里，我们将使用 gpt-4 创建一个抽取器。

import { ChatOpenAI } from "@langchain/openai";

// We will be using tool calling mode, which
// requires a tool calling capable model.
const llm = new ChatOpenAI({
  // Consider benchmarking with the best model you can to get
  // a sense of the best possible quality.
  model: "gpt-4-0125-preview",
  temperature: 0,
});

// For function/tool calling, we can also supply an name for the schema
// to give the LLM additional context about what it's extracting.
const extractionRunnable = prompt.pipe(
  llm.withStructuredOutput(peopleSchema, { name: "people" })
);

没有示例 😿

请注意，即使我们正在使用 gpt-4，但在一个非常简单的测试用例中，它也是不可靠的！

我们在下面运行 5 次以强调这一点

const text = "The solar system is large, but earth has only 1 moon.";

for (let i = 0; i < 5; i++) {
  const result = await extractionRunnable.invoke({
    text,
    examples: [],
  });
  console.log(result);
}

{
  people: [ { name: "earth", hair_color: "grey", height_in_meters: "1" } ]
}
{ people: [ { name: "earth", hair_color: "moon" } ] }
{ people: [ { name: "earth", hair_color: "moon" } ] }
{ people: [ { name: "earth", hair_color: "1 moon" } ] }
{ people: [] }

有示例 😻

参考示例有助于修复失败！

for (let i = 0; i < 5; i++) {
  const result = await extractionRunnable.invoke({
    text,
    // Example messages from above
    examples: exampleMessages,
  });
  console.log(result);
}

{ people: [] }
{ people: [] }
{ people: [] }
{ people: [] }
{ people: [] }

await extractionRunnable.invoke({
  text: "My name is Hair-ison. My hair is black. I am 3 meters tall.",
  examples: exampleMessages,
});

{
  people: [ { name: "Hair-ison", hair_color: "black", height_in_meters: "3" } ]
}

下一步

您现在已经学习了如何使用少量示例来提高抽取质量。

接下来，查看本节中的其他指南，例如关于如何对长文本执行抽取的一些技巧。

如何使用参考示例

定义模式

定义参考示例

创建一个抽取器

没有示例 😿

有示例 😻

下一步

此页面是否对您有帮助？

您也可以留下详细的反馈在 GitHub 上.

定义模式​

定义参考示例​

创建一个抽取器​

没有示例 😿​

有示例 😻​

下一步​

此页面是否对您有帮助？

您也可以留下详细的反馈 在 GitHub 上.

定义模式

定义参考示例

创建一个抽取器

没有示例 😿

有示例 😻

下一步

您也可以留下详细的反馈在 GitHub 上.