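Build an Extraction Chain

Prerequisites

This guide assumes familiarity with the following concepts

In this tutorial, we will build a chain that extracts structured information from unstructured text.

Info

This tutorial only works with models that support function/tool calling.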

Setup

Installation

To install LangChain, run one of the following commands, depending on your package manager:

npm i langchain @langchain/core

yarn add langchain @langchain/core

pnpm add langchain @langchain/core

For more details, see our installation guide.

LangSmith

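Many of the applications you build with LangChain will contain multiple steps with multiple LLM calls. As these applications grow more complex, it becomes crucial to be able to inspect exactly what is happening inside your chain or agent. The best way to do this is with LangSmith.

After you sign up at the link above, make sure to set your environment variables to start logging traces: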

export LANGSMITH_TRACING="true"
export LANGSMITH_API_KEY="..."

# Reduce tracing latency if you are not in a serverless environment
# export LANGCHAIN_CALLBACKS_BACKGROUND=true
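
Before running the chain, it can help to confirm that the tracing variables are actually set. The following is a minimal sketch and not part of LangChain or LangSmith: `missingTracingVars` is a hypothetical helper that takes the environment as a plain object (for example `process.env` in Node.js) and returns the names of any unset variables.

```typescript
// Hypothetical helper (not a LangChain/LangSmith API): report which of the
// tracing variables shown above are missing or empty.
type Env = Record<string, string | undefined>;

const REQUIRED_TRACING_VARS = ["LANGSMITH_TRACING", "LANGSMITH_API_KEY"] as const;

function missingTracingVars(env: Env): string[] {
  return REQUIRED_TRACING_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Example with an explicit object; in Node.js you could pass process.env instead.
const missing = missingTracingVars({ LANGSMITH_TRACING: "true" });
// missing is ["LANGSMITH_API_KEY"]
```

Passing the environment as an argument keeps the check easy to test and avoids reading global state inside the helper.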

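Schema

First, we need to describe what information we want to extract from the text.

We will use Zod to define an example schema that extracts personal information. Install it with your package manager of choice: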

npm i zod @langchain/core

yarn add zod @langchain/core

pnpm add zod @langchain/core

import { z } from "zod";

const personSchema = z.object({
  name: z.optional(z.string()).describe("The name of the person"),
  hair_color: z
    .optional(z.string())
    .describe("The color of the person's hair if known"),
  height_in_meters: z
    .optional(z.string())
    .describe("Height measured in meters"),
});

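There are two best practices when defining a schema:

1. Document the attributes and the schema itself: this information is sent to the LLM and is used to improve the quality of information extraction.
2. Do not force the LLM to make up information! Above we wrapped each attribute in z.optional(), which lets the LLM leave out a value it does not know. Use .nullish() instead if the model should also be able to return an explicit null.

Info

For best performance, document the schema well and make sure the model is not forced to return results when there is no information to extract from the text.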
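
To make the second practice concrete, here is a dependency-free sketch of what "not forcing the model" means for the data you receive. The `Person` type and the `isPerson` guard are hand-written stand-ins for illustration only (they are not Zod and not part of LangChain); they assume each attribute may be a string, `null`, or missing.

```typescript
// Illustrative stand-in for the person schema: every attribute may be absent.
type Person = {
  name?: string | null;
  hair_color?: string | null;
  height_in_meters?: string | null;
};

const PERSON_KEYS = ["name", "hair_color", "height_in_meters"] as const;

// Accepts objects whose known attributes are strings, null, or missing.
function isPerson(value: unknown): value is Person {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  const record = value as Record<string, unknown>;
  return PERSON_KEYS.every((key) => {
    const field = record[key];
    return field === undefined || field === null || typeof field === "string";
  });
}

// A model that does not know the hair color can simply leave it out.
const partial = isPerson({ name: "Alan Smith" }); // true
// A wrongly typed value is still rejected.
const invalid = isPerson({ name: 42 }); // false
```

If every attribute were required, the only way for the model to satisfy the schema on sparse text would be to invent values, which is exactly what the optional attributes avoid.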

Extractor

Let's create an information extractor using the schema we defined above.

import { ChatPromptTemplate } from "@langchain/core/prompts";

// Define a custom prompt to provide instructions and any additional context.
// 1) You can add examples into the prompt template to improve extraction quality
// 2) Introduce additional parameters to take context into account (e.g., include metadata
//    about the document from which the text was extracted.)
const promptTemplate = ChatPromptTemplate.fromMessages([
  [
    "system",
    `You are an expert extraction algorithm.
Only extract relevant information from the text.
If you do not know the value of an attribute asked to extract,
return null for the attribute's value.`,
  ],
  // Please see the how-to about improving performance with
  // reference examples.
  // ["placeholder", "{examples}"],
  ["human", "{text}"],
]);
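
To see what the model ultimately receives, the sketch below mimics the template with plain strings. `buildMessages` is a hypothetical helper written for this illustration and is not a LangChain API; it assumes the template yields one system message followed by one human message containing the input text.

```typescript
// Hypothetical stand-in for the prompt template above (not a LangChain API).
type Role = "system" | "human";
type Message = { role: Role; content: string };

const SYSTEM_INSTRUCTIONS = [
  "You are an expert extraction algorithm.",
  "Only extract relevant information from the text.",
  "If you do not know the value of an attribute asked to extract,",
  "return null for the attribute's value.",
].join("\n");

// Fill the {text} slot and return the messages in order.
function buildMessages(text: string): Message[] {
  return [
    { role: "system", content: SYSTEM_INSTRUCTIONS },
    { role: "human", content: text },
  ];
}

const messages = buildMessages("Alan Smith is 6 feet tall and has blond hair.");
```

The system message stays fixed across calls, while only the human message changes with each input text.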

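We need to use a model that supports function/tool calling.

See the documentation for a list of some models that can be used with this API.

Pick your chat model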

Install dependencies (Groq)

Tip

See this section for general instructions on installing integration packages.

npm i @langchain/groq

yarn add @langchain/groq

pnpm add @langchain/groq

Add environment variables

GROQ_API_KEY=your-api-key

Instantiate the model

import { ChatGroq } from "@langchain/groq";

const llm = new ChatGroq({
  model: "llama-3.3-70b-versatile",
  temperature: 0
});

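Install dependencies (OpenAI)

Tip

See this section for general instructions on installing integration packages.

npm i @langchain/openai

yarn add @langchain/openai

pnpm add @langchain/openai

Add environment variables

OPENAI_API_KEY=your-api-key

Instantiate the model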

import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0
});

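Install dependencies (Anthropic)

Tip

See this section for general instructions on installing integration packages.

npm i @langchain/anthropic

yarn add @langchain/anthropic

pnpm add @langchain/anthropic

Add environment variables

ANTHROPIC_API_KEY=your-api-key

Instantiate the model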

import { ChatAnthropic } from "@langchain/anthropic";

const llm = new ChatAnthropic({
  model: "claude-3-5-sonnet-20240620",
  temperature: 0
});

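Install dependencies (Fireworks)

Tip

See this section for general instructions on installing integration packages.

npm i @langchain/community

yarn add @langchain/community

pnpm add @langchain/community

Add environment variables

FIREWORKS_API_KEY=your-api-key

Instantiate the model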

import { ChatFireworks } from "@langchain/community/chat_models/fireworks";

const llm = new ChatFireworks({
  model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
  temperature: 0
});

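Install dependencies (Mistral AI)

Tip

See this section for general instructions on installing integration packages.

npm i @langchain/mistralai

yarn add @langchain/mistralai

pnpm add @langchain/mistralai

Add environment variables

MISTRAL_API_KEY=your-api-key

Instantiate the model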

import { ChatMistralAI } from "@langchain/mistralai";

const llm = new ChatMistralAI({
  model: "mistral-large-latest",
  temperature: 0
});

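Install dependencies (Google Vertex AI)

Tip

See this section for general instructions on installing integration packages.

npm i @langchain/google-vertexai

yarn add @langchain/google-vertexai

pnpm add @langchain/google-vertexai

Add environment variables

GOOGLE_APPLICATION_CREDENTIALS=credentials.json

Instantiate the model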

import { ChatVertexAI } from "@langchain/google-vertexai";

const llm = new ChatVertexAI({
  model: "gemini-1.5-flash",
  temperature: 0
});

We enable structured output by creating a new object with the .withStructuredOutput method:

const structured_llm = llm.withStructuredOutput(personSchema);

We can then invoke it as usual:

const prompt = await promptTemplate.invoke({
  text: "Alan Smith is 6 feet tall and has blond hair.",
});
await structured_llm.invoke(prompt);

{ name: 'Alan Smith', hair_color: 'blond', height_in_meters: '1.83' }

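Info

Extraction is generative 🤯

LLMs are generative models, so they can do some pretty cool things, such as correctly extracting the person's height in meters even though it was given in feet!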
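
The conversion in this result is easy to check by hand: one foot is exactly 0.3048 m, so 6 ft is 1.8288 m, which rounds to 1.83. Because the model's arithmetic is generated rather than computed, a quick sanity check can be worthwhile. `feetToMeters` below is a hypothetical helper for such a check and is not part of LangChain.

```typescript
// One international foot is defined as exactly 0.3048 meters.
const METERS_PER_FOOT = 0.3048;

// Hypothetical helper: convert feet to meters, rounded to two decimals.
function feetToMeters(feet: number): number {
  return Math.round(feet * METERS_PER_FOOT * 100) / 100;
}

const expected = feetToMeters(6); // 1.83

// Compare with the extracted string value, allowing a small tolerance.
const extracted = Number("1.83");
const plausible = Math.abs(extracted - expected) < 0.01; // true
```

The extracted height arrives as a string because the schema declares it as one, so it has to be parsed before any numeric comparison.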

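We can see the LangSmith trace here.

Even though we defined our schema under the variable name personSchema, Zod cannot infer this name, so it is not passed along to the model. To give the LLM a better idea of what the schema represents, you can also name the schema you pass to withStructuredOutput():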

const structured_llm2 = llm.withStructuredOutput(personSchema, {
  name: "person",
});

const prompt2 = await promptTemplate.invoke({
  text: "Alan Smith is 6 feet tall and has blond hair.",
});
await structured_llm2.invoke(prompt2);

{ name: 'Alan Smith', hair_color: 'blond', height_in_meters: '1.83' }

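This can improve performance in many cases.

Multiple entities

In most cases, you should extract a list of entities rather than a single entity.

This is easy to achieve with Zod by nesting schemas inside one another.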

import { z } from "zod";

const person = z.object({
  name: z.optional(z.string()).describe("The name of the person"),
  hair_color: z
    .optional(z.string())
    .describe("The color of the person's hair if known"),
  height_in_meters: z.number().nullish().describe("Height measured in meters"),
});

const dataSchema = z.object({
  people: z.array(person).describe("Extracted data about people"),
});

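Info

Extraction might not be perfect here. Read on to see how to use reference examples to improve the quality of extraction, and see the guidelines section!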

const structured_llm3 = llm.withStructuredOutput(dataSchema);
const prompt3 = await promptTemplate.invoke({
  text: "My name is Jeff, my hair is black and i am 6 feet tall. Anna has the same color hair as me.",
});
await structured_llm3.invoke(prompt3);

{
  people: [
    { name: 'Jeff', hair_color: 'black', height_in_meters: 1.83 },
    { name: 'Anna', hair_color: 'black', height_in_meters: null }
  ]
}
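
Downstream code has to cope with the `null` that the model returned for Anna's height. The sketch below writes the result shape out as plain TypeScript types and filters on it; `ExtractedPerson`, `ExtractedData`, and `withKnownHeight` are hypothetical names introduced for this illustration, not LangChain or Zod APIs.

```typescript
// Shape of the result shown above, written out as plain TypeScript types.
type ExtractedPerson = {
  name?: string | null;
  hair_color?: string | null;
  height_in_meters?: number | null;
};
type ExtractedData = { people: ExtractedPerson[] };

// Hypothetical helper: keep only people whose height the model reported.
function withKnownHeight(data: ExtractedData): ExtractedPerson[] {
  return data.people.filter(
    (p) => p.height_in_meters !== null && p.height_in_meters !== undefined
  );
}

const result: ExtractedData = {
  people: [
    { name: "Jeff", hair_color: "black", height_in_meters: 1.83 },
    { name: "Anna", hair_color: "black", height_in_meters: null },
  ],
};

const known = withKnownHeight(result).map((p) => p.name); // ["Jeff"]
```

Checking for both `null` and `undefined` matters here because a nullish attribute may be returned as an explicit `null` or omitted altogether.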

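Tip

When the schema accommodates the extraction of multiple entities, it also allows the model to extract zero entities, by returning an empty list, when the text contains no relevant information.

This is usually a good thing! It lets you specify required attributes on an entity without forcing the model to detect that entity.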
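
As a small illustration of this point, the sketch below treats an empty list as an ordinary outcome. `ExtractionResult` and `summarize` are hypothetical names for this example only; note that `name` is required on each entity, yet a text that mentions nobody still yields a valid result.

```typescript
// Minimal result shape for this illustration: name is required per entity.
type ExtractionResult = { people: { name: string }[] };

// Hypothetical helper: summarize a result, treating an empty list as a
// normal outcome rather than an error.
function summarize(result: ExtractionResult): string {
  if (result.people.length === 0) {
    return "No people found in the text.";
  }
  const names = result.people.map((p) => p.name).join(", ");
  return `Found ${result.people.length}: ${names}`;
}

const none = summarize({ people: [] });
const some = summarize({ people: [{ name: "Jeff" }, { name: "Anna" }] });
```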

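We can see the LangSmith trace here.

Next steps

Now that you understand the basics of extraction with LangChain, you are ready to move on to the rest of the how-to guides:

Add examples: learn how to use reference examples to improve performance.
Handle long text: what should you do if the text does not fit into the LLM's context window?
Use a parsing approach: use a prompt-based approach to extract with models that do not support tool/function calling.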
