How to parse XML output

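Prerequisites

This guide assumes familiarity with the following concepts:
- Chat models
- Output parsers
- Prompt templates
- Structured output
- Chaining runnables together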

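LLMs from different providers often have different strengths, depending on the specific data they were trained on. This also means that some models may be "better" and more reliable at generating output in formats other than JSON.

This guide shows you how to use the XMLOutputParser to prompt a model for XML output, then parse that output into a usable format.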

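Note

Keep in mind that large language models are leaky abstractions! You must use an LLM with sufficient capacity to generate well-formed XML.

In the following examples, we use Anthropic's Claude (https://docs.anthropic.com/claude/docs), a model that is optimized for XML tags.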
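As a rough illustration of what "well-formed" means in practice, the sketch below checks that every opening tag in a reply has a matching closing tag in the right order. The helper name hasBalancedTags is hypothetical and not part of LangChain, and the check is deliberately simplistic: it ignores comments, CDATA sections, and declarations, so treat it as a sanity check rather than a validator.

```typescript
// Hypothetical helper (not a LangChain API): a rough check that opening and
// closing tags are balanced and properly nested.
// Assumption: no comments, CDATA sections, or XML declarations in the input.
function hasBalancedTags(xml: string): boolean {
  const stack: string[] = [];
  // Captures: optional leading "/", the tag name, optional trailing "/".
  const tagPattern = /<(\/?)([A-Za-z_][\w.-]*)[^<>]*?(\/?)>/g;
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(xml)) !== null) {
    const closing = match[1];
    const tagName = match[2];
    const selfClosing = match[3];
    if (selfClosing === "/") continue; // <tag/> opens and closes at once
    if (closing === "/") {
      // A closing tag must match the most recently opened tag.
      if (stack.pop() !== tagName) return false;
    } else {
      stack.push(tagName);
    }
  }
  // Anything left on the stack was opened but never closed.
  return stack.length === 0;
}

console.log(hasBalancedTags("<movie>Big</movie>"));
console.log(hasBalancedTags("<movie>Big"));
```

A check like this can be run on a raw reply before parsing, for example to decide whether to retry the request.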

Tip

See this section for general instructions on installing integration packages.

npm i @langchain/anthropic @langchain/core

yarn add @langchain/anthropic @langchain/core

pnpm add @langchain/anthropic @langchain/core

Let's start with a simple request to the model.

import { ChatAnthropic } from "@langchain/anthropic";

const model = new ChatAnthropic({
  model: "claude-3-sonnet-20240229",
  maxTokens: 512,
  temperature: 0.1,
});

const query = `Generate the shortened filmography for Tom Hanks.`;

const result = await model.invoke(
  query + ` Please enclose the movies in "movie" tags.`
);

console.log(result.content);

Here is the shortened filmography for Tom Hanks, with movies enclosed in "movie" tags:

<movie>Forrest Gump</movie>
<movie>Saving Private Ryan</movie>
<movie>Cast Away</movie>
<movie>Apollo 13</movie>
<movie>Catch Me If You Can</movie>
<movie>The Green Mile</movie>
<movie>Toy Story</movie>
<movie>Toy Story 2</movie>
<movie>Toy Story 3</movie>
<movie>Toy Story 4</movie>
<movie>Philadelphia</movie>
<movie>Big</movie>
<movie>Sleepless in Seattle</movie>
<movie>You've Got Mail</movie>
<movie>The Terminal</movie>
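Before reaching for a parser, it can help to see what pulling the tagged values out by hand might look like. The sketch below uses a regular expression to collect the text inside each "movie" tag and skip the surrounding prose. The helper name extractTagContents is hypothetical, and the sketch assumes the reply is a plain string with simple, non-nested tags.

```typescript
// Hypothetical helper (not a LangChain API): collect the text inside every
// <tag>...</tag> pair in a reply, ignoring any prose around the tags.
// Assumption: the tags are not nested and carry no attributes.
function extractTagContents(text: string, tag: string): string[] {
  // Escape regex metacharacters so the tag name is matched literally.
  const escaped = tag.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // [\s\S]*? matches lazily across newlines, one tag pair at a time.
  const pattern = new RegExp(`<${escaped}>([\\s\\S]*?)</${escaped}>`, "g");
  const contents: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    contents.push(match[1].trim());
  }
  return contents;
}

// A shortened copy of the reply shown above.
const reply = [
  "Here is the shortened filmography for Tom Hanks:",
  "",
  "<movie>Forrest Gump</movie>",
  "<movie>Saving Private Ryan</movie>",
  "<movie>Cast Away</movie>",
].join("\n");

const movies = extractTagContents(reply, "movie");
console.log(movies);
```

This works for flat lists like the one above, but it does not handle nested elements, which is where a real XML parser becomes worthwhile.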

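This actually worked pretty well! However, it would be nicer to parse that XML into a format that is easier to work with. We can use the XMLOutputParser both to add default format instructions to the prompt and to parse the XML output into an object: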

import { XMLOutputParser } from "@langchain/core/output_parsers";

// We will add these instructions to the prompt below
const parser = new XMLOutputParser();

parser.getFormatInstructions();

"The output should be formatted as a XML file.\n" +
  "1. Output should conform to the tags below. \n" +
  "2. If tag"... 434 more characters

import { ChatPromptTemplate } from "@langchain/core/prompts";

const prompt = ChatPromptTemplate.fromTemplate(
  `{query}\n{format_instructions}`
);
const partialedPrompt = await prompt.partial({
  format_instructions: parser.getFormatInstructions(),
});

const chain = partialedPrompt.pipe(model).pipe(parser);

const output = await chain.invoke({
  query: "Generate the shortened filmography for Tom Hanks.",
});

console.log(JSON.stringify(output, null, 2));

{
  "filmography": [
    {
      "actor": [
        {
          "name": "Tom Hanks"
        },
        {
          "films": [
            {
              "film": [
                {
                  "title": "Forrest Gump"
                },
                {
                  "year": "1994"
                },
                {
                  "role": "Forrest Gump"
                }
              ]
            },
            {
              "film": [
                {
                  "title": "Saving Private Ryan"
                },
                {
                  "year": "1998"
                },
                {
                  "role": "Captain Miller"
                }
              ]
            },
            {
              "film": [
                {
                  "title": "Cast Away"
                },
                {
                  "year": "2000"
                },
                {
                  "role": "Chuck Noland"
                }
              ]
            },
            {
              "film": [
                {
                  "title": "Catch Me If You Can"
                },
                {
                  "year": "2002"
                },
                {
                  "role": "Carl Hanratty"
                }
              ]
            },
            {
              "film": [
                {
                  "title": "The Terminal"
                },
                {
                  "year": "2004"
                },
                {
                  "role": "Viktor Navorski"
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
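In the parsed result above, each element becomes an array of single-key objects, which can be awkward to consume directly. The sketch below shows one possible way to merge the children of each "film" entry into a flat record. The type XmlNode and the helper mergeTextFields are hypothetical names introduced for illustration, and the shape is assumed from the printed output rather than taken from a documented type.

```typescript
// Assumed shape of the parsed output, inferred from the printed result above.
type XmlNode = { [tag: string]: string | XmlNode[] };

// Hypothetical helper: merge a list of single-key children into one flat
// record, keeping only children whose value is plain text.
function mergeTextFields(children: XmlNode[]): Record<string, string> {
  const merged: Record<string, string> = {};
  for (const child of children) {
    for (const [key, value] of Object.entries(child)) {
      if (typeof value === "string") merged[key] = value;
    }
  }
  return merged;
}

// A trimmed copy of the `films` array from the parsed output above.
const films: XmlNode[] = [
  { film: [{ title: "Forrest Gump" }, { year: "1994" }, { role: "Forrest Gump" }] },
  { film: [{ title: "Cast Away" }, { year: "2000" }, { role: "Chuck Noland" }] },
];

// Each entry becomes an object with title, year, and role keys.
const flatFilms: Record<string, string>[] = films.map((entry) => {
  const fields = entry["film"];
  return Array.isArray(fields) ? mergeTextFields(fields) : {};
});
console.log(flatFilms);
```

If two children share the same tag name, the later one overwrites the earlier one in this sketch, so it only suits elements whose child tags are unique.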

You'll notice that the output above is no longer limited to movie tags. We can also pass a list of tags to tailor the output to our needs:

const parserWithTags = new XMLOutputParser({
  tags: ["movies", "actor", "film", "name", "genre"],
});

// We will add these instructions to the prompt below
parserWithTags.getFormatInstructions();

"The output should be formatted as a XML file.\n" +
  "1. Output should conform to the tags below. \n" +
  "2. If tag"... 460 more characters

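You can and should experiment with adding your own formatting hints in other parts of your prompt, either to augment or to replace the default instructions.

Here's the result when we invoke it: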
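One possible way to write such a hint is to spell out a concrete XML skeleton for the model to follow. The sketch below builds that text from a root tag, an item tag, and a list of field tags. The helper name buildXmlHint and the wording of the hint are assumptions made for illustration, not something the library provides.

```typescript
// Hypothetical helper (not a LangChain API): build an extra formatting hint
// that shows the model a concrete XML skeleton to follow.
function buildXmlHint(root: string, item: string, fields: string[]): string {
  const fieldLines = fields.map((field) => `    <${field}>...</${field}>`);
  return [
    `Reply with XML only, with no text before or after the <${root}> element.`,
    "Follow this skeleton, repeating the inner element once per entry:",
    `<${root}>`,
    `  <${item}>`,
    ...fieldLines,
    `  </${item}>`,
    `</${root}>`,
  ].join("\n");
}

const hint = buildXmlHint("movies", "film", ["name", "genre"]);
console.log(hint);
```

The resulting string could be appended to the query, or joined with the default format instructions by plain string concatenation, before it is passed to the prompt.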

import { ChatPromptTemplate } from "@langchain/core/prompts";

const promptWithTags = ChatPromptTemplate.fromTemplate(
  `{query}\n{format_instructions}`
);
const partialedPromptWithTags = await promptWithTags.partial({
  format_instructions: parserWithTags.getFormatInstructions(),
});

const chainWithTags = partialedPromptWithTags.pipe(model).pipe(parserWithTags);

const outputWithTags = await chainWithTags.invoke({
  query: "Generate the shortened filmography for Tom Hanks.",
});

console.log(JSON.stringify(outputWithTags, null, 2));

{
  "movies": [
    {
      "actor": [
        {
          "film": [
            {
              "name": "Forrest Gump"
            },
            {
              "genre": "Drama"
            }
          ]
        },
        {
          "film": [
            {
              "name": "Saving Private Ryan"
            },
            {
              "genre": "War"
            }
          ]
        },
        {
          "film": [
            {
              "name": "Cast Away"
            },
            {
              "genre": "Drama"
            }
          ]
        },
        {
          "film": [
            {
              "name": "Catch Me If You Can"
            },
            {
              "genre": "Biography"
            }
          ]
        },
        {
          "film": [
            {
              "name": "The Terminal"
            },
            {
              "genre": "Comedy-drama"
            }
          ]
        }
      ]
    }
  ]
}
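Since the model is only asked, not forced, to stay within the listed tags, it can be worth checking the parsed result afterwards. The sketch below walks a parsed object and reports any tag that is not in the allowed list. The type ParsedXml and the helper collectTags are hypothetical names, and the shape is assumed from the printed output above.

```typescript
// Assumed shape of the parsed output, inferred from the printed result above.
type ParsedXml = { [tag: string]: string | ParsedXml[] };

// Hypothetical helper: gather every tag name that appears in a parsed result.
function collectTags(node: ParsedXml, seen: Set<string> = new Set<string>()): Set<string> {
  for (const [tag, value] of Object.entries(node)) {
    seen.add(tag);
    if (Array.isArray(value)) {
      for (const child of value) collectTags(child, seen);
    }
  }
  return seen;
}

// The same tag list that was passed to the parser above.
const allowedTags = ["movies", "actor", "film", "name", "genre"];

// A trimmed copy of the parsed output above.
const parsed: ParsedXml = {
  movies: [{ actor: [{ film: [{ name: "Forrest Gump" }, { genre: "Drama" }] }] }],
};

const unexpected = Array.from(collectTags(parsed)).filter(
  (tag) => !allowedTags.includes(tag)
);
console.log(unexpected.length === 0 ? "only allowed tags used" : unexpected);
```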

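Next steps

You've now learned how to prompt a model to return XML. Next, check out the broader guide on obtaining structured output for other related techniques.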
