
How to pass multimodal data directly to models

Prerequisites

This guide assumes familiarity with the following concepts:

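Here we demonstrate how to pass multimodal input directly to models. We currently expect all input to be passed in the same format that OpenAI expects. For other model providers that support multimodal input, we have added logic inside the corresponding chat model class to convert the input to the format that provider expects.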
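To make the idea of format conversion concrete, here is a minimal sketch of turning an OpenAI-style "image_url" block into the base64 image block accepted by Anthropic's Messages API. The helper name toAnthropicImageBlock and the type names are hypothetical and chosen for illustration; this is not the code LangChain ships, only one way such a translation could look.

```typescript
// OpenAI-style image block, the input format used throughout this guide.
type ImageUrlBlock = { type: "image_url"; image_url: { url: string } };

// Base64 image block as accepted by Anthropic's Messages API.
type AnthropicImageBlock = {
  type: "image";
  source: { type: "base64"; media_type: string; data: string };
};

// Hypothetical helper (an assumption, not LangChain's implementation):
// splits a base64 data URL into its media type and payload.
function toAnthropicImageBlock(block: ImageUrlBlock): AnthropicImageBlock {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(block.image_url.url);
  if (match === null) {
    throw new Error("Expected a base64 data URL such as data:image/jpeg;base64,...");
  }
  const [, mediaType, data] = match;
  return {
    type: "image",
    source: { type: "base64", media_type: mediaType, data },
  };
}
```

In practice you do not need to call anything like this yourself: you write the OpenAI-style block and the chat model class handles the translation.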

In the walkthrough that follows, we ask a model to describe an image.

import * as fs from "node:fs/promises";

import { ChatAnthropic } from "@langchain/anthropic";

const model = new ChatAnthropic({
  model: "claude-3-sonnet-20240229",
});

// Read the image as raw bytes; it is base64-encoded when the message is built.
const imageData = await fs.readFile("../../../../examples/hotdog.jpg");

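The most common way to pass in an image is as a base64-encoded byte string, embedded in a message with a complex (multi-part) content type, for models that support multimodal input. Here is an example: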

import { HumanMessage } from "@langchain/core/messages";

const message = new HumanMessage({
  content: [
    {
      type: "text",
      text: "what does this image contain?",
    },
    {
      type: "image_url",
      image_url: {
        // Embed the image bytes as a base64 data URL.
        url: `data:image/jpeg;base64,${imageData.toString("base64")}`,
      },
    },
  ],
});
const response = await model.invoke([message]);
console.log(response.content);
This image contains a hot dog. It shows a frankfurter or sausage encased in a soft, elongated bread bun. The sausage itself appears to be reddish in color, likely a smoked or cured variety. The bun is a golden-brown color, suggesting it has been lightly toasted or grilled. The hot dog is presented against a plain white background, allowing the details of the iconic American fast food item to be clearly visible.
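The data URL in the message above hardcodes image/jpeg. As a minimal sketch, the MIME type could instead be derived from the file extension. The helper toImageDataUrl and the MIME_TYPES table are hypothetical names introduced for illustration, and the table covers only a few common formats.

```typescript
import { Buffer } from "node:buffer";
import * as path from "node:path";

// Hypothetical lookup table (an assumption): a few common image extensions.
const MIME_TYPES: Record<string, string> = {
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".png": "image/png",
  ".gif": "image/gif",
  ".webp": "image/webp",
};

// Hypothetical helper: builds a base64 data URL from raw image bytes,
// choosing the MIME type from the file extension.
function toImageDataUrl(bytes: Uint8Array, filePath: string): string {
  const mimeType = MIME_TYPES[path.extname(filePath).toLowerCase()];
  if (mimeType === undefined) {
    throw new Error(`Unsupported image extension: ${filePath}`);
  }
  return `data:${mimeType};base64,${Buffer.from(bytes).toString("base64")}`;
}
```

The result can be used as the url value of an "image_url" content block, for example toImageDataUrl(imageData, "hotdog.jpg").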

Some model providers also accept an HTTP URL to the image directly in a content block of type "image_url":

import { ChatOpenAI } from "@langchain/openai";

const openAIModel = new ChatOpenAI({
  model: "gpt-4o",
});

const imageUrl =
  "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg";

// Named distinctly from the earlier `message` and `response` so the snippets can share one file.
const weatherMessage = new HumanMessage({
  content: [
    {
      type: "text",
      text: "describe the weather in this image",
    },
    {
      type: "image_url",
      // Pass the HTTP URL as-is; the provider fetches the image itself.
      image_url: { url: imageUrl },
    },
  ],
});
const weatherResponse = await openAIModel.invoke([weatherMessage]);
console.log(weatherResponse.content);
The weather in the image appears to be pleasant and clear. The sky is mostly blue with a few scattered clouds, indicating good visibility and no immediate signs of rain. The lighting suggests it’s either morning or late afternoon, with sunlight creating a warm and bright atmosphere. There is no indication of strong winds, as the grass and foliage appear calm and undisturbed. Overall, it looks like a beautiful day, possibly spring or summer, ideal for outdoor activities.
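For a provider that does not accept HTTP URLs, one possible fallback is to download the image yourself and inline it as a base64 data URL. The sketch below assumes this approach; fetchAsDataUrl, Fetcher, and ImageResponse are hypothetical names, and the fetch function is passed in as a parameter so the helper can be exercised without network access.

```typescript
import { Buffer } from "node:buffer";

// Minimal slice of the Fetch API's Response that this sketch relies on.
type ImageResponse = {
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
  arrayBuffer(): Promise<ArrayBuffer>;
};
type Fetcher = (url: string) => Promise<ImageResponse>;

// Hypothetical helper (an assumption, not part of LangChain):
// downloads an image and returns it as a base64 data URL.
async function fetchAsDataUrl(url: string, fetcher: Fetcher): Promise<string> {
  const res = await fetcher(url);
  if (!res.ok) {
    throw new Error(`Failed to download image: HTTP ${res.status}`);
  }
  // Drop any parameters after the media type, e.g. "; charset=...".
  const mimeType = res.headers.get("content-type")?.split(";")[0].trim() ?? "";
  if (!mimeType.startsWith("image/")) {
    throw new Error(`Expected an image response, got "${mimeType}"`);
  }
  const base64 = Buffer.from(await res.arrayBuffer()).toString("base64");
  return `data:${mimeType};base64,${base64}`;
}
```

In a runtime with a global fetch, it could be called as await fetchAsDataUrl(imageUrl, fetch), and the result used as the url of an "image_url" block.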

We can also pass in multiple images:

// Reuses `imageUrl` and `openAIModel` from the previous snippet.
const comparisonMessage = new HumanMessage({
  content: [
    {
      type: "text",
      text: "are these two images the same?",
    },
    {
      type: "image_url",
      image_url: {
        url: imageUrl,
      },
    },
    {
      type: "image_url",
      image_url: {
        url: imageUrl,
      },
    },
  ],
});
const comparisonResponse = await openAIModel.invoke([comparisonMessage]);
console.log(comparisonResponse.content);
Yes, the two images are the same.
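When the number of images varies, writing each block by hand gets repetitive. As a minimal sketch, the content array can be built from a prompt and a list of URLs. The helper buildImageContent and the type names are hypothetical and exist only for illustration.

```typescript
type TextBlock = { type: "text"; text: string };
type ImageUrlBlock = { type: "image_url"; image_url: { url: string } };
type ContentBlock = TextBlock | ImageUrlBlock;

// Hypothetical helper: one text block followed by one "image_url" block per URL.
function buildImageContent(prompt: string, imageUrls: string[]): ContentBlock[] {
  if (imageUrls.length === 0) {
    throw new Error("At least one image URL is required");
  }
  const imageBlocks = imageUrls.map(
    (url): ImageUrlBlock => ({ type: "image_url", image_url: { url } })
  );
  return [{ type: "text", text: prompt }, ...imageBlocks];
}
```

The returned array has the same shape as the content arrays used above, so it can be passed as the content field when constructing a HumanMessage.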

Next steps

You have now learned how to pass multimodal data to a model.

Next, you can check out our guide on multimodal tool calls.

