How to pass multimodal data directly to models

Prerequisites

This guide assumes familiarity with the following concepts:

Chat models

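Here we demonstrate how to pass multimodal input directly to models. We currently expect all inputs to be passed in the same format that OpenAI expects. For other model providers that support multimodal input, we have added logic inside the chat model classes to convert to the format each provider expects.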
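To make the expected format concrete, here is a minimal sketch that models the two OpenAI-style content block shapes used throughout this guide ("text" and "image_url" with a nested `url`) as TypeScript types. The type names and the `imageUrlsOf` helper are hypothetical, written only for illustration; they are not LangChain's own type definitions.

```typescript
// Hypothetical types mirroring the content blocks used in this guide.
type TextContentPart = { type: "text"; text: string };
type ImageUrlContentPart = { type: "image_url"; image_url: { url: string } };
type ContentPart = TextContentPart | ImageUrlContentPart;

// Collects the image URLs (HTTP URLs or base64 data URLs) from a content array.
function imageUrlsOf(parts: ContentPart[]): string[] {
  return parts
    .filter((p): p is ImageUrlContentPart => p.type === "image_url")
    .map((p) => p.image_url.url);
}

// Example content array: one text block followed by one image block.
const content: ContentPart[] = [
  { type: "text", text: "what does this image contain?" },
  { type: "image_url", image_url: { url: "https://example.com/hotdog.jpg" } },
];

console.log(imageUrlsOf(content));
```

An array shaped like `content` is what the examples in this guide pass as the `content` of a `HumanMessage`.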

In this example, we will ask a model to describe an image.

import * as fs from "node:fs/promises";

import { ChatAnthropic } from "@langchain/anthropic";

const model = new ChatAnthropic({
  model: "claude-3-sonnet-20240229",
});

const imageData = await fs.readFile("../../../../examples/hotdog.jpg");

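For models that support multimodal input, the most widely supported way to pass in an image is as a base64-encoded byte string inside a message with complex content (an array of content blocks). Here's an example: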

import { HumanMessage } from "@langchain/core/messages";

const message = new HumanMessage({
  content: [
    {
      type: "text",
      text: "what does this image contain?",
    },
    {
      type: "image_url",
      image_url: {
        url: `data:image/jpeg;base64,${imageData.toString("base64")}`,
      },
    },
  ],
});
const response = await model.invoke([message]);
console.log(response.content);

This image contains a hot dog. It shows a frankfurter or sausage encased in a soft, elongated bread bun. The sausage itself appears to be reddish in color, likely a smoked or cured variety. The bun is a golden-brown color, suggesting it has been lightly toasted or grilled. The hot dog is presented against a plain white background, allowing the details of the iconic American fast food item to be clearly visible.
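The example above hard-codes the `image/jpeg` MIME type in the data URL. As a minimal sketch, assuming Node.js, the data URL could instead be built from the file extension. `mimeTypeFromPath` and `toBase64DataUrl` are hypothetical helpers, and the extension table is illustrative only: which image formats a given provider accepts is not covered by this guide.

```typescript
import { extname } from "node:path";

// Illustrative extension-to-MIME table (an assumption, not a provider's list).
const MIME_BY_EXTENSION: Record<string, string> = {
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".png": "image/png",
  ".gif": "image/gif",
  ".webp": "image/webp",
};

// Hypothetical helper: look up a MIME type from a file path's extension.
function mimeTypeFromPath(filePath: string): string {
  const mime = MIME_BY_EXTENSION[extname(filePath).toLowerCase()];
  if (mime === undefined) {
    throw new Error(`Unrecognized image extension: ${filePath}`);
  }
  return mime;
}

// Hypothetical helper: encode raw bytes as a base64 data URL.
function toBase64DataUrl(bytes: Uint8Array, mimeType: string): string {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString("base64")}`;
}

// The two bytes of "hi" encode to "aGk=" in base64.
console.log(toBase64DataUrl(new Uint8Array([104, 105]), mimeTypeFromPath("hotdog.jpg")));
// prints data:image/jpeg;base64,aGk=
```

Because a Node.js `Buffer` is a `Uint8Array`, the `imageData` returned by `fs.readFile` could be passed straight to `toBase64DataUrl`.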

Some model providers support passing an HTTP URL to the image directly in a content block of type "image_url":

import { ChatOpenAI } from "@langchain/openai";

const openAIModel = new ChatOpenAI({
  model: "gpt-4o",
});

const imageUrl =
  "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg";

// Distinct variable names let this snippet run in the same module as the previous one.
const urlMessage = new HumanMessage({
  content: [
    {
      type: "text",
      text: "describe the weather in this image",
    },
    {
      type: "image_url",
      image_url: { url: imageUrl },
    },
  ],
});
const urlResponse = await openAIModel.invoke([urlMessage]);
console.log(urlResponse.content);

The weather in the image appears to be pleasant and clear. The sky is mostly blue with a few scattered clouds, indicating good visibility and no immediate signs of rain. The lighting suggests it’s either morning or late afternoon, with sunlight creating a warm and bright atmosphere. There is no indication of strong winds, as the grass and foliage appear calm and undisturbed. Overall, it looks like a beautiful day, possibly spring or summer, ideal for outdoor activities.
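Since only some providers accept remote URLs, it can help to validate a URL before building an "image_url" block. The sketch below is a hypothetical helper that accepts `http:`, `https:`, and `data:` URLs and rejects everything else; that allow-list is an assumption for illustration, not a rule stated by any provider.

```typescript
// Hypothetical content block type matching the "image_url" shape in this guide.
type ImageUrlContentPart = { type: "image_url"; image_url: { url: string } };

// Assumed allow-list of schemes; actual provider support varies.
const ALLOWED_PROTOCOLS = new Set(["http:", "https:", "data:"]);

// Builds an "image_url" block, throwing for unsupported schemes.
function imageUrlPart(url: string): ImageUrlContentPart {
  // new URL() throws a TypeError for strings that are not absolute URLs.
  const { protocol } = new URL(url);
  if (!ALLOWED_PROTOCOLS.has(protocol)) {
    throw new Error(`Unsupported image URL scheme: ${protocol}`);
  }
  return { type: "image_url", image_url: { url } };
}

console.log(imageUrlPart("https://example.com/boardwalk.jpg").image_url.url);
```

For a provider that does not accept remote URLs, the image would need to be downloaded and passed as a base64 data URL instead, as in the first example.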

We can also pass in multiple images:

// The same URL is passed twice, so both image blocks point at one picture.
const multiImageMessage = new HumanMessage({
  content: [
    {
      type: "text",
      text: "are these two images the same?",
    },
    {
      type: "image_url",
      image_url: {
        url: imageUrl,
      },
    },
    {
      type: "image_url",
      image_url: {
        url: imageUrl,
      },
    },
  ],
});
const multiImageResponse = await openAIModel.invoke([multiImageMessage]);
console.log(multiImageResponse.content);

Yes, the two images are the same.
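When the number of images varies, the content array can be generated rather than written by hand. This is a minimal sketch with a hypothetical `buildMultiImageContent` helper that produces one text block followed by one "image_url" block per URL, the same layout as the multi-image message above.

```typescript
// Hypothetical union type for the two content block shapes used in this guide.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

// One text block, then one "image_url" block per URL, in the given order.
function buildMultiImageContent(prompt: string, imageUrls: string[]): ContentPart[] {
  return [
    { type: "text", text: prompt },
    ...imageUrls.map((url): ContentPart => ({ type: "image_url", image_url: { url } })),
  ];
}

const content = buildMultiImageContent("are these two images the same?", [
  "https://example.com/a.jpg",
  "https://example.com/b.jpg",
]);
console.log(content.length); // prints 3
```

The resulting array should be usable as the `content` of a `HumanMessage`, for example `new HumanMessage({ content: buildMultiImageContent(prompt, urls) })`.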

Next steps

You've now learned how to pass multimodal data to models.

Next, you can check out our guide on multimodal tool calls.
