
How to return structured data from a model

It is often useful to have a model return output that matches a specific schema. One common use case is extracting data from arbitrary text to insert into a traditional database or use with some other downstream system. This guide will show you a few different strategies you can use to do this.

Prerequisites

This guide assumes familiarity with the following concepts:

The .withStructuredOutput() method

There are several strategies that models can use under the hood. For some of the most popular model providers, including Anthropic, Google VertexAI, Mistral, and OpenAI, LangChain implements a common interface called .withStructuredOutput that abstracts away these strategies.

By invoking this method (and passing in a JSON schema or a Zod schema), the model will add whatever model parameters and output parsers are necessary to get back structured output matching the requested schema. If the model supports more than one way to do this (e.g., function calling vs JSON mode), you can configure which method to use by passing it into this method.

Let's look at some examples of this in action! We'll use Zod to create a simple response schema.

Pick your chat model

Install dependencies

yarn add @langchain/openai

Add environment variables

OPENAI_API_KEY=your-api-key

Instantiate the model

import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0,
});
import { z } from "zod";

const joke = z.object({
  setup: z.string().describe("The setup of the joke"),
  punchline: z.string().describe("The punchline to the joke"),
  rating: z.number().optional().describe("How funny the joke is, from 1 to 10"),
});

const structuredLlm = model.withStructuredOutput(joke);

await structuredLlm.invoke("Tell me a joke about cats");
{
  setup: "Why don't cats play poker in the wild?",
  punchline: "Too many cheetahs.",
  rating: 7
}

One key point is that though we set our Zod schema as a variable named joke, Zod is not able to access that variable name, and therefore cannot pass it on to the model. Though it is not required, we can pass a name for our schema in order to give the model additional context as to what the schema represents, improving performance:

const structuredLlm = model.withStructuredOutput(joke, { name: "joke" });

await structuredLlm.invoke("Tell me a joke about cats");
{
  setup: "Why don't cats play poker in the wild?",
  punchline: "Too many cheetahs!",
  rating: 7
}

The result is a JSON object.

If you prefer not to use Zod, you can also pass in an OpenAI-style JSON schema dict. This object should contain three properties:

  • name: The name of the schema to output.
  • description: A high-level description of the schema to output.
  • parameters: The nested details of the schema you want to extract, formatted as a JSON Schema dict.

In this case, the response is also a dict:

const structuredLlm = model.withStructuredOutput({
  name: "joke",
  description: "Joke to tell user.",
  parameters: {
    title: "Joke",
    type: "object",
    properties: {
      setup: { type: "string", description: "The setup for the joke" },
      punchline: { type: "string", description: "The joke's punchline" },
    },
    required: ["setup", "punchline"],
  },
});

await structuredLlm.invoke("Tell me a joke about cats");
{
  setup: "Why was the cat sitting on the computer?",
  punchline: "Because it wanted to keep an eye on the mouse!"
}

If you are using JSON Schema, you can take advantage of other, more complex schema descriptions to create a similar effect.
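For example, here is a sketch of a richer version of the joke schema above. The extra rating bounds and style enum are illustrative additions, not part of the example shown earlier:

```typescript
// Hypothetical richer JSON Schema for the joke example: adds numeric bounds
// and an enum. Passing an object like this to model.withStructuredOutput()
// would constrain the model's output further.
const richJokeSchema = {
  name: "joke",
  description: "Joke to tell user.",
  parameters: {
    title: "Joke",
    type: "object",
    properties: {
      setup: { type: "string", description: "The setup for the joke" },
      punchline: { type: "string", description: "The joke's punchline" },
      rating: {
        type: "integer",
        minimum: 1,
        maximum: 10,
        description: "How funny the joke is, from 1 to 10",
      },
      style: {
        type: "string",
        enum: ["pun", "one-liner", "knock-knock"],
        description: "The style of joke to tell",
      },
    },
    required: ["setup", "punchline"],
  },
};

console.log(Object.keys(richJokeSchema.parameters.properties));
// [ "setup", "punchline", "rating", "style" ]
```

How strictly constraints like minimum or enum are enforced depends on the provider; some treat them as hints to the model rather than hard guarantees.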

You can also use tool calling directly to allow the model to choose between options, if your chosen model supports it. This involves a bit more parsing and setup. See this how-to guide for more details.
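As a rough sketch of that idea (the tool names and fields here are illustrative), you would define one OpenAI-style tool schema per option, bind them all to the model, and then inspect which tool the model chose to call:

```typescript
// Two hypothetical tool schemas the model can choose between: a structured
// joke, or a plain conversational reply when no joke fits the request.
const tools = [
  {
    type: "function",
    function: {
      name: "joke",
      description: "A joke to tell the user.",
      parameters: {
        type: "object",
        properties: {
          setup: { type: "string", description: "The setup for the joke" },
          punchline: { type: "string", description: "The joke's punchline" },
        },
        required: ["setup", "punchline"],
      },
    },
  },
  {
    type: "function",
    function: {
      name: "conversational_response",
      description: "A plain conversational reply when no joke is appropriate.",
      parameters: {
        type: "object",
        properties: { response: { type: "string" } },
        required: ["response"],
      },
    },
  },
];

console.log(tools.map((t) => t.function.name));
// [ "joke", "conversational_response" ]
```

You would then attach these to a supporting chat model (for example via its tool-binding method) and read the returned tool_calls to see which schema the model picked; the exact wiring depends on your provider integration.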

Specifying the output method (Advanced)

For models that support more than one means of outputting data, you can specify the preferred one like this:

const structuredLlm = model.withStructuredOutput(joke, {
  method: "json_mode",
  name: "joke",
});

await structuredLlm.invoke(
  "Tell me a joke about cats, respond in JSON with `setup` and `punchline` keys"
);
{
  setup: "Why don't cats play poker in the jungle?",
  punchline: "Too many cheetahs!"
}

In the above example, we use OpenAI's alternate JSON mode feature along with a more specific prompt.

For specifics about your chosen model, peruse its entry in the API reference pages.

(Advanced) Raw outputs

LLMs aren't perfect at generating structured output, especially as schemas become complex. You can avoid raising exceptions and handle the raw output yourself by passing includeRaw: true. This changes the output format to contain the raw message output and the parsed value (if successful):

const joke = z.object({
  setup: z.string().describe("The setup of the joke"),
  punchline: z.string().describe("The punchline to the joke"),
  rating: z.number().optional().describe("How funny the joke is, from 1 to 10"),
});

const structuredLlm = model.withStructuredOutput(joke, {
  includeRaw: true,
  name: "joke",
});

await structuredLlm.invoke("Tell me a joke about cats");
{
  raw: AIMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: "",
      tool_calls: [
        {
          name: "joke",
          args: [Object],
          id: "call_0pEdltlfSXjq20RaBFKSQOeF"
        }
      ],
      invalid_tool_calls: [],
      additional_kwargs: { function_call: undefined, tool_calls: [ [Object] ] },
      response_metadata: {}
    },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "",
    name: undefined,
    additional_kwargs: {
      function_call: undefined,
      tool_calls: [
        {
          id: "call_0pEdltlfSXjq20RaBFKSQOeF",
          type: "function",
          function: [Object]
        }
      ]
    },
    response_metadata: {
      tokenUsage: { completionTokens: 33, promptTokens: 88, totalTokens: 121 },
      finish_reason: "stop"
    },
    tool_calls: [
      {
        name: "joke",
        args: {
          setup: "Why was the cat sitting on the computer?",
          punchline: "Because it wanted to keep an eye on the mouse!",
          rating: 7
        },
        id: "call_0pEdltlfSXjq20RaBFKSQOeF"
      }
    ],
    invalid_tool_calls: [],
    usage_metadata: { input_tokens: 88, output_tokens: 33, total_tokens: 121 }
  },
  parsed: {
    setup: "Why was the cat sitting on the computer?",
    punchline: "Because it wanted to keep an eye on the mouse!",
    rating: 7
  }
}

Prompting techniques

You can also prompt models to output information in a given format. This approach relies on designing good prompts and then parsing the output of the model. This is the only option for models that don't support .withStructuredOutput() or other built-in approaches.

Using JsonOutputParser

The following example uses the built-in JsonOutputParser to parse the output of a chat model prompted to match the given JSON schema. Note that we are adding format_instructions directly to the prompt from a variable:

import { JsonOutputParser } from "@langchain/core/output_parsers";
import { ChatPromptTemplate } from "@langchain/core/prompts";

type Person = {
  name: string;
  height_in_meters: number;
};

type People = {
  people: Person[];
};

const formatInstructions = `Respond only in valid JSON. The JSON object you return should match the following schema:
{{ people: [{{ name: "string", height_in_meters: "number" }}] }}

Where people is an array of objects, each with a name and height_in_meters field.
`;

// Set up a parser
const parser = new JsonOutputParser<People>();

// Prompt
const prompt = await ChatPromptTemplate.fromMessages([
  [
    "system",
    "Answer the user query. Wrap the output in `json` tags\n{format_instructions}",
  ],
  ["human", "{query}"],
]).partial({
  format_instructions: formatInstructions,
});

Let's take a look at what information is sent to the model:

const query = "Anna is 23 years old and she is 6 feet tall";

console.log((await prompt.format({ query })).toString());
System: Answer the user query. Wrap the output in `json` tags
Respond only in valid JSON. The JSON object you return should match the following schema:
{{ people: [{{ name: "string", height_in_meters: "number" }}] }}

Where people is an array of objects, each with a name and height_in_meters field.

Human: Anna is 23 years old and she is 6 feet tall

Now let's invoke it:

const chain = prompt.pipe(model).pipe(parser);

await chain.invoke({ query });
{ people: [ { name: "Anna", height_in_meters: 1.83 } ] }

For a deeper dive into using output parsers with prompting techniques for structured output, see this guide.

Custom parsing

You can also create a custom prompt and parser with LangChain Expression Language (LCEL), using a plain function to parse the output from the model:

import { AIMessage } from "@langchain/core/messages";
import { ChatPromptTemplate } from "@langchain/core/prompts";

type Person = {
  name: string;
  height_in_meters: number;
};

type People = {
  people: Person[];
};

const schema = `{{ people: [{{ name: "string", height_in_meters: "number" }}] }}`;

// Prompt
const prompt = await ChatPromptTemplate.fromMessages([
  [
    "system",
    `Answer the user query. Output your answer as JSON that
matches the given schema: \`\`\`json\n{schema}\n\`\`\`.
Make sure to wrap the answer in \`\`\`json and \`\`\` tags`,
  ],
  ["human", "{query}"],
]).partial({
  schema,
});

/**
 * Custom extractor
 *
 * Extracts JSON content from a string where
 * JSON is embedded between ```json and ``` tags.
 */
const extractJson = (output: AIMessage): Array<People> => {
  const text = output.content as string;
  // Define the regular expression pattern to match JSON blocks
  const pattern = /```json(.*?)```/gs;

  // Find all non-overlapping matches of the pattern in the string
  const matches = text.match(pattern);

  // Process each match, attempting to parse it as JSON
  try {
    return (
      matches?.map((match) => {
        // Remove the markdown code block syntax to isolate the JSON string
        const jsonStr = match.replace(/```json|```/g, "").trim();
        return JSON.parse(jsonStr);
      }) ?? []
    );
  } catch (error) {
    throw new Error(`Failed to parse: ${output}`);
  }
};
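Before wiring the extractor into a chain, we can sanity-check the same extraction logic in isolation on a canned model response. This is a standalone sketch; the fence markers are built with repeat so the snippet stays self-contained:

```typescript
// Build the markdown fence marker programmatically to avoid nesting issues.
const fence = "`".repeat(3);

// A canned response resembling what the prompted model might return.
const sample = [
  "Sure, here is the data you asked for:",
  fence + "json",
  '{ "people": [{ "name": "Anna", "height_in_meters": 1.83 }] }',
  fence,
].join("\n");

// Same idea as extractJson above: find fenced json blocks and parse each one.
const pattern = new RegExp(fence + "json([\\s\\S]*?)" + fence, "g");
const parsed = Array.from(sample.matchAll(pattern), (m) => JSON.parse(m[1].trim()));

console.log(parsed);
// [ { people: [ { name: "Anna", height_in_meters: 1.83 } ] } ]
```

A canned check like this makes it easy to exercise edge cases (multiple blocks, missing fences, malformed JSON) without spending model calls.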

Here is the prompt sent to the model:

const query = "Anna is 23 years old and she is 6 feet tall";

console.log((await prompt.format({ query })).toString());
System: Answer the user query. Output your answer as JSON that
matches the given schema: ```json
{{ people: [{{ name: "string", height_in_meters: "number" }}] }}
```.
Make sure to wrap the answer in ```json and ``` tags
Human: Anna is 23 years old and she is 6 feet tall

And here is what it looks like when we invoke it:

import { RunnableLambda } from "@langchain/core/runnables";

const chain = prompt
  .pipe(model)
  .pipe(new RunnableLambda({ func: extractJson }));

await chain.invoke({ query });
[
  { people: [ { name: "Anna", height_in_meters: 1.83 } ] }
]

Next steps

Now you've learned a few methods to make a model output structured data.

To learn more, check out the other how-to guides in this section, or the conceptual guide on tool calling.

