How to return structured data from a model
It is often useful to have a model return output that matches a specific schema. One common use case is extracting data from arbitrary text to insert into a traditional database or use with some other downstream system. This guide will show you a few different strategies for doing this.
This guide assumes familiarity with the following concepts:
- Chat models
- Function/tool calling
The .withStructuredOutput() method
There are a few strategies that models can use under the hood. For some of the most popular model providers, including Anthropic, Google VertexAI, Mistral, and OpenAI, LangChain implements a common interface that abstracts away these strategies, called .withStructuredOutput().
By invoking this method (and passing in a JSON schema or a Zod schema), the model will add whatever model parameters and output parsers are necessary to get back structured output matching the requested schema. If the model supports more than one way to do this (e.g., function calling vs. JSON mode), you can configure which method to use by passing it into the method.
Let's look at some examples of this in action! We'll use Zod to create a simple response schema.
Pick your chat model:
- OpenAI
- Anthropic
- MistralAI
- Groq
- VertexAI
Install dependencies
See this section for general instructions on installing integration packages.
- npm
- yarn
- pnpm
npm i @langchain/openai
yarn add @langchain/openai
pnpm add @langchain/openai
Add environment variables
OPENAI_API_KEY=your-api-key
Instantiate the model
import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0
});
Install dependencies
See this section for general instructions on installing integration packages.
- npm
- yarn
- pnpm
npm i @langchain/anthropic
yarn add @langchain/anthropic
pnpm add @langchain/anthropic
Add environment variables
ANTHROPIC_API_KEY=your-api-key
Instantiate the model
import { ChatAnthropic } from "@langchain/anthropic";

const model = new ChatAnthropic({
  model: "claude-3-5-sonnet-20240620",
  temperature: 0
});
Install dependencies
See this section for general instructions on installing integration packages.
- npm
- yarn
- pnpm
npm i @langchain/mistralai
yarn add @langchain/mistralai
pnpm add @langchain/mistralai
Add environment variables
MISTRAL_API_KEY=your-api-key
Instantiate the model
import { ChatMistralAI } from "@langchain/mistralai";

const model = new ChatMistralAI({
  model: "mistral-large-latest",
  temperature: 0
});
Install dependencies
See this section for general instructions on installing integration packages.
- npm
- yarn
- pnpm
npm i @langchain/groq
yarn add @langchain/groq
pnpm add @langchain/groq
Add environment variables
GROQ_API_KEY=your-api-key
Instantiate the model
import { ChatGroq } from "@langchain/groq";

const model = new ChatGroq({
  model: "mixtral-8x7b-32768",
  temperature: 0
});
Install dependencies
See this section for general instructions on installing integration packages.
- npm
- yarn
- pnpm
npm i @langchain/google-vertexai
yarn add @langchain/google-vertexai
pnpm add @langchain/google-vertexai
Add environment variables
GOOGLE_APPLICATION_CREDENTIALS=credentials.json
Instantiate the model
import { ChatVertexAI } from "@langchain/google-vertexai";

const model = new ChatVertexAI({
  model: "gemini-1.5-flash",
  temperature: 0
});
import { z } from "zod";

const joke = z.object({
  setup: z.string().describe("The setup of the joke"),
  punchline: z.string().describe("The punchline to the joke"),
  rating: z.number().optional().describe("How funny the joke is, from 1 to 10"),
});

const structuredLlm = model.withStructuredOutput(joke);

await structuredLlm.invoke("Tell me a joke about cats");
{
  setup: "Why don't cats play poker in the wild?",
  punchline: "Too many cheetahs.",
  rating: 7
}
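A nice side effect of defining the schema with Zod is static typing. As a minimal sketch (assuming your LangChain version infers the output type from the Zod schema; the dog prompt is just an illustration), you can derive a TypeScript type for the parsed result:

// A minimal sketch: derive a static type from the same Zod schema.
type Joke = z.infer<typeof joke>;

// The runnable returned by withStructuredOutput() is typed to match,
// so this assignment type-checks and gives editor autocompletion.
const typedJoke: Joke = await structuredLlm.invoke("Tell me a joke about dogs");
console.log(typedJoke.punchline);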
One key point is that though we set our Zod schema as a variable named joke, Zod is not able to access that variable name and so cannot pass it on to the model. Though it's not required, we can pass a name for our schema in order to give the model additional context as to what the schema represents, improving performance:
const structuredLlm = model.withStructuredOutput(joke, { name: "joke" });
await structuredLlm.invoke("Tell me a joke about cats");
{
  setup: "Why don't cats play poker in the wild?",
  punchline: "Too many cheetahs!",
  rating: 7
}
The result is a JSON object.
We can also pass in an OpenAI-style JSON schema dict if you prefer not to use Zod. This object should contain three properties:
- name: The name of the schema to output.
- description: A high-level description of the schema to output.
- parameters: The nested details of the schema you want to extract, formatted as a JSON Schema dict.
In this case, the response is also a dict:
const structuredLlm = model.withStructuredOutput({
  name: "joke",
  description: "Joke to tell user.",
  parameters: {
    title: "Joke",
    type: "object",
    properties: {
      setup: { type: "string", description: "The setup for the joke" },
      punchline: { type: "string", description: "The joke's punchline" },
    },
    required: ["setup", "punchline"],
  },
});

await structuredLlm.invoke("Tell me a joke about cats", { name: "joke" });
{
  setup: "Why was the cat sitting on the computer?",
  punchline: "Because it wanted to keep an eye on the mouse!"
}
If you are using JSON Schema, you can take advantage of other, more complex schema descriptions to create a similar effect.
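For instance, a sketch along the following lines (the audience and rating fields are illustrative additions; how strictly keywords like enum, minimum, and maximum are enforced varies by provider) constrains values directly in the schema:

// A hypothetical sketch: richer JSON Schema keywords constrain the output.
const constrainedLlm = model.withStructuredOutput({
  name: "joke",
  description: "Joke to tell user.",
  parameters: {
    title: "Joke",
    type: "object",
    properties: {
      setup: { type: "string", description: "The setup for the joke" },
      punchline: { type: "string", description: "The joke's punchline" },
      audience: {
        type: "string",
        enum: ["kids", "adults"],
        description: "Who the joke is appropriate for",
      },
      rating: {
        type: "number",
        minimum: 1,
        maximum: 10,
        description: "How funny the joke is",
      },
    },
    required: ["setup", "punchline"],
  },
});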
You can also use tool calling directly to allow the model to choose between options, if your chosen model supports it. This involves a bit more parsing and setup. See this how-to guide for more details.
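As a rough sketch of that approach (the tool names and placeholder implementations below are hypothetical, not taken from the linked guide), you can bind two tools and inspect which one the model calls:

import { z } from "zod";
import { tool } from "@langchain/core/tools";

// Hypothetical placeholder tools: each simply echoes its structured input.
const jokeTool = tool(async (input) => JSON.stringify(input), {
  name: "joke",
  description: "A joke to tell the user.",
  schema: z.object({
    setup: z.string().describe("The setup of the joke"),
    punchline: z.string().describe("The punchline of the joke"),
  }),
});

const replyTool = tool(async (input) => input.response, {
  name: "conversational_response",
  description: "Respond conversationally if no other tools apply.",
  schema: z.object({
    response: z.string().describe("A conversational response to the user"),
  }),
});

// Let the model choose between the two schemas.
const modelWithTools = model.bindTools([jokeTool, replyTool]);

const aiMessage = await modelWithTools.invoke("How are you today?");
// Each entry holds the chosen tool's name and its structured arguments.
console.log(aiMessage.tool_calls);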
Specifying the method for structuring outputs (Advanced)
For models that support more than one means of structuring outputs, you can specify your preferred one like this:
const structuredLlm = model.withStructuredOutput(joke, {
  method: "jsonMode",
  name: "joke",
});

await structuredLlm.invoke(
  "Tell me a joke about cats, respond in JSON with `setup` and `punchline` keys"
);
{
  setup: "Why don't cats play poker in the jungle?",
  punchline: "Too many cheetahs!"
}
In the above example, we use OpenAI's alternate JSON mode feature along with a more specific prompt.
For specifics about the model you choose, peruse its entry in the API reference pages.
(Advanced) Raw outputs
LLMs aren't perfect at generating structured output, especially as schemas become complex. You can avoid raising exceptions and handle the raw output yourself by passing includeRaw: true. This changes the output format to contain the raw message output as well as the parsed value (if successful):
const joke = z.object({
  setup: z.string().describe("The setup of the joke"),
  punchline: z.string().describe("The punchline to the joke"),
  rating: z.number().optional().describe("How funny the joke is, from 1 to 10"),
});

const structuredLlm = model.withStructuredOutput(joke, {
  includeRaw: true,
  name: "joke",
});

await structuredLlm.invoke("Tell me a joke about cats");
{
  raw: AIMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: "",
      tool_calls: [
        {
          name: "joke",
          args: [Object],
          id: "call_0pEdltlfSXjq20RaBFKSQOeF"
        }
      ],
      invalid_tool_calls: [],
      additional_kwargs: { function_call: undefined, tool_calls: [ [Object] ] },
      response_metadata: {}
    },
    lc_namespace: [ "langchain_core", "messages" ],
    content: "",
    name: undefined,
    additional_kwargs: {
      function_call: undefined,
      tool_calls: [
        {
          id: "call_0pEdltlfSXjq20RaBFKSQOeF",
          type: "function",
          function: [Object]
        }
      ]
    },
    response_metadata: {
      tokenUsage: { completionTokens: 33, promptTokens: 88, totalTokens: 121 },
      finish_reason: "stop"
    },
    tool_calls: [
      {
        name: "joke",
        args: {
          setup: "Why was the cat sitting on the computer?",
          punchline: "Because it wanted to keep an eye on the mouse!",
          rating: 7
        },
        id: "call_0pEdltlfSXjq20RaBFKSQOeF"
      }
    ],
    invalid_tool_calls: [],
    usage_metadata: { input_tokens: 88, output_tokens: 33, total_tokens: 121 }
  },
  parsed: {
    setup: "Why was the cat sitting on the computer?",
    punchline: "Because it wanted to keep an eye on the mouse!",
    rating: 7
  }
}
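Here is a minimal sketch of consuming that shape defensively (assuming parsed is left unset when parsing fails, and reusing structuredLlm from above):

// A minimal sketch, assuming `parsed` is unset when parsing fails.
const result = await structuredLlm.invoke("Tell me a joke about cats");

if (result.parsed != null) {
  console.log(result.parsed.setup);
} else {
  // Fall back to the raw AIMessage, e.g. to log it or retry the call.
  console.log("Parsing failed; raw content was:", result.raw.content);
}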
Prompting techniques
You can also prompt models to output information in a given format. This approach relies on designing good prompts and then parsing the output of the models. This is the only option for models that don't support .withStructuredOutput() or other built-in approaches.
Using JsonOutputParser
The following example uses the built-in JsonOutputParser to parse the output of a chat model prompted to match a given JSON schema. Note that we are adding format_instructions directly to the prompt from a variable in the example:
import { JsonOutputParser } from "@langchain/core/output_parsers";
import { ChatPromptTemplate } from "@langchain/core/prompts";

type Person = {
  name: string;
  height_in_meters: number;
};

type People = {
  people: Person[];
};

const formatInstructions = `Respond only in valid JSON. The JSON object you return should match the following schema:
{{ people: [{{ name: "string", height_in_meters: "number" }}] }}
Where people is an array of objects, each with a name and height_in_meters field.
`;

// Set up a parser
const parser = new JsonOutputParser<People>();

// Prompt
const prompt = await ChatPromptTemplate.fromMessages([
  [
    "system",
    "Answer the user query. Wrap the output in `json` tags\n{format_instructions}",
  ],
  ["human", "{query}"],
]).partial({
  format_instructions: formatInstructions,
});
Let's take a look at what information is sent to the model:
const query = "Anna is 23 years old and she is 6 feet tall";
console.log((await prompt.format({ query })).toString());
System: Answer the user query. Wrap the output in `json` tags
Respond only in valid JSON. The JSON object you return should match the following schema:
{{ people: [{{ name: "string", height_in_meters: "number" }}] }}
Where people is an array of objects, each with a name and height_in_meters field.
Human: Anna is 23 years old and she is 6 feet tall
Now let's invoke it:
const chain = prompt.pipe(model).pipe(parser);
await chain.invoke({ query });
{ people: [ { name: "Anna", height_in_meters: 1.83 } ] }
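Because JsonOutputParser aggregates streamed tokens into progressively more complete objects, this chain can also be streamed. A small sketch (the exact partial chunks you see depend on the model's token stream):

// A small sketch: stream progressively more complete partial objects.
const stream = await chain.stream({ query });
for await (const chunk of stream) {
  console.log(chunk);
}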
For a deeper dive into using output parsers with prompting techniques for structured output, see this guide.
Custom parsing
You can also create a custom prompt and parser with LangChain Expression Language (LCEL), using a plain function to parse the output from the model:
import { AIMessage } from "@langchain/core/messages";
import { ChatPromptTemplate } from "@langchain/core/prompts";

type Person = {
  name: string;
  height_in_meters: number;
};

type People = {
  people: Person[];
};

const schema = `{{ people: [{{ name: "string", height_in_meters: "number" }}] }}`;

// Prompt
const prompt = await ChatPromptTemplate.fromMessages([
  [
    "system",
    `Answer the user query. Output your answer as JSON that
matches the given schema: \`\`\`json\n{schema}\n\`\`\`.
Make sure to wrap the answer in \`\`\`json and \`\`\` tags`,
  ],
  ["human", "{query}"],
]).partial({
  schema,
});

/**
 * Custom extractor
 *
 * Extracts JSON content from a string where
 * JSON is embedded between ```json and ``` tags.
 */
const extractJson = (output: AIMessage): Array<People> => {
  const text = output.content as string;
  // Define the regular expression pattern to match JSON blocks
  const pattern = /```json(.*?)```/gs;
  // Find all non-overlapping matches of the pattern in the string
  const matches = text.match(pattern);
  // Process each match, attempting to parse it as JSON
  try {
    return (
      matches?.map((match) => {
        // Remove the markdown code block syntax to isolate the JSON string
        const jsonStr = match.replace(/```json|```/g, "").trim();
        return JSON.parse(jsonStr);
      }) ?? []
    );
  } catch (error) {
    throw new Error(`Failed to parse: ${output}`);
  }
};
Here is the prompt sent to the model:
const query = "Anna is 23 years old and she is 6 feet tall";
console.log((await prompt.format({ query })).toString());
System: Answer the user query. Output your answer as JSON that
matches the given schema: ```json
{{ people: [{{ name: "string", height_in_meters: "number" }}] }}
```.
Make sure to wrap the answer in ```json and ``` tags
Human: Anna is 23 years old and she is 6 feet tall
And here's what it looks like when we invoke it:
import { RunnableLambda } from "@langchain/core/runnables";

const chain = prompt
  .pipe(model)
  .pipe(new RunnableLambda({ func: extractJson }));

await chain.invoke({ query });
[
  { people: [ { name: "Anna", height_in_meters: 1.83 } ] }
]
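Note that extractJson casts to the People type, but JSON.parse does not check the shape at runtime. If you want a stronger guarantee, one option (a sketch, not part of the guide above) is to validate the hand-parsed output with Zod:

import { z } from "zod";

// A sketch: validate the hand-parsed output at runtime.
const peopleSchema = z.object({
  people: z.array(
    z.object({
      name: z.string(),
      height_in_meters: z.number(),
    })
  ),
});

const results = await chain.invoke({ query });
// Throws a descriptive ZodError if the model's output drifts from the schema.
const validated = results.map((result) => peopleSchema.parse(result));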
Next steps
Now you've learned a few methods to make a model output structured data.
To learn more, check out the other how-to guides in this section, or the conceptual guide on tool calling.