
How to pass multimodal data directly to models

Prerequisites

This guide assumes familiarity with chat models.

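Here we demonstrate how to pass multimodal input directly to models. Currently, we expect all input to be passed in the same format that OpenAI expects. For other model providers that support multimodal input, we have added logic inside the class to convert that input to the provider's expected format.

In this example, we will ask a model to describe an image.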
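To make the idea of "converting to the expected format" concrete, here is a minimal sketch of what such a conversion could look like for a base64 data URL. It is an illustration only, not the code used inside `@langchain/anthropic`: the helper name `toAnthropicImageBlock` and both type aliases are hypothetical, and the target shape is assumed from Anthropic's Messages API image blocks.

```typescript
// Illustrative sketch only: NOT the actual @langchain/anthropic implementation.

// OpenAI-style image content block, the format used throughout this guide.
type OpenAIImageBlock = {
  type: "image_url";
  image_url: { url: string };
};

// Anthropic-style base64 image block (shape assumed from Anthropic's Messages API).
type AnthropicImageBlock = {
  type: "image";
  source: { type: "base64"; media_type: string; data: string };
};

// Hypothetical helper: split a base64 data URL into its MIME type and payload.
function toAnthropicImageBlock(block: OpenAIImageBlock): AnthropicImageBlock {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(block.image_url.url);
  if (match === null) {
    throw new Error("Expected a base64 data URL such as data:image/jpeg;base64,...");
  }
  const [, mediaType, data] = match;
  return {
    type: "image",
    source: { type: "base64", media_type: mediaType, data },
  };
}

const converted = toAnthropicImageBlock({
  type: "image_url",
  image_url: { url: "data:image/png;base64,iVBORw0KGgo=" },
});
console.log(converted.source.media_type); // image/png
```

In practice you do not need to write this yourself; the point is only that the OpenAI-style block you pass in carries everything a provider-specific block needs.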

import * as fs from "node:fs/promises";

import { ChatAnthropic } from "@langchain/anthropic";

const model = new ChatAnthropic({
  model: "claude-3-sonnet-20240229",
});

const imageData = await fs.readFile("../../../../examples/hotdog.jpg");

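The most commonly supported way to pass in an image is as a base64-encoded string (a data URL) inside a message whose content is a list of typed blocks, for models that support multimodal input. Here is an example: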

import { HumanMessage } from "@langchain/core/messages";

const message = new HumanMessage({
  content: [
    {
      type: "text",
      text: "what does this image contain?",
    },
    {
      type: "image_url",
      image_url: {
        // Embed the image bytes as a base64 data URL.
        url: `data:image/jpeg;base64,${imageData.toString("base64")}`,
      },
    },
  ],
});
const response = await model.invoke([message]);
console.log(response.content);
This image contains a hot dog. It shows a frankfurter or sausage encased in a soft, elongated bread bun. The sausage itself appears to be reddish in color, likely a smoked or cured variety. The bun is a golden-brown color, suggesting it has been lightly toasted or grilled. The hot dog is presented against a plain white background, allowing the details of the iconic American fast food item to be clearly visible.
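The example above hard-codes `image/jpeg` in the data URL. If you load images of different formats, a small helper can derive the MIME type from the file extension. The following is a sketch under stated assumptions: `toDataUrl` and the `MIME_BY_EXTENSION` table are hypothetical names that are not part of LangChain, and the table covers only a few common formats.

```typescript
import { Buffer } from "node:buffer";
import * as path from "node:path";

// Hypothetical lookup table; extend it for the formats your provider accepts.
const MIME_BY_EXTENSION: Record<string, string> = {
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".png": "image/png",
  ".gif": "image/gif",
  ".webp": "image/webp",
};

// Hypothetical helper: build a base64 data URL from a file path and its bytes.
function toDataUrl(filePath: string, bytes: Uint8Array): string {
  const extension = path.extname(filePath).toLowerCase();
  const mimeType = MIME_BY_EXTENSION[extension];
  if (mimeType === undefined) {
    throw new Error(`Unsupported image extension: "${extension}"`);
  }
  return `data:${mimeType};base64,${Buffer.from(bytes).toString("base64")}`;
}

// The three bytes below are the start of a JPEG file, used here as stand-in data.
const dataUrl = toDataUrl("hotdog.jpg", new Uint8Array([0xff, 0xd8, 0xff]));
console.log(dataUrl); // data:image/jpeg;base64,/9j/
```

The returned string can be used as the `url` value of an `image_url` content block, in place of the template literal shown earlier.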

Some model providers also accept an HTTP URL that points to the image directly in a content block of type "image_url":

import { ChatOpenAI } from "@langchain/openai";

const openAIModel = new ChatOpenAI({
model: "gpt-4o",
});

const imageUrl =
"https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg";

// Distinct names avoid redeclaring `message` and `response` from the previous example.
const urlMessage = new HumanMessage({
  content: [
    {
      type: "text",
      text: "describe the weather in this image",
    },
    {
      type: "image_url",
      image_url: { url: imageUrl },
    },
  ],
});
const urlResponse = await openAIModel.invoke([urlMessage]);
console.log(urlResponse.content);
The weather in the image appears to be pleasant and clear. The sky is mostly blue with a few scattered clouds, indicating good visibility and no immediate signs of rain. The lighting suggests it’s either morning or late afternoon, with sunlight creating a warm and bright atmosphere. There is no indication of strong winds, as the grass and foliage appear calm and undisturbed. Overall, it looks like a beautiful day, possibly spring or summer, ideal for outdoor activities.
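Because only some providers accept HTTP URLs, it can help to check which kind of URL you are about to send before choosing how to pass the image. The sketch below is one possible way to do that; `classifyImageUrl` and `ImageUrlKind` are hypothetical names, and which kinds a given provider accepts is something you would need to confirm in that provider's documentation.

```typescript
type ImageUrlKind = "http" | "data" | "unsupported";

// Hypothetical helper: report whether a string is an HTTP(S) URL, a data URL,
// or something else (for example a relative file path).
function classifyImageUrl(url: string): ImageUrlKind {
  let protocol: string;
  try {
    protocol = new URL(url).protocol;
  } catch {
    // `new URL` throws on strings that are not absolute URLs.
    return "unsupported";
  }
  if (protocol === "http:" || protocol === "https:") return "http";
  if (protocol === "data:") return "data";
  return "unsupported";
}

console.log(classifyImageUrl("https://example.com/photo.jpg")); // http
console.log(classifyImageUrl("data:image/jpeg;base64,/9j/")); // data
console.log(classifyImageUrl("../examples/hotdog.jpg")); // unsupported
```

A caller could, for instance, fall back to downloading the image and sending it as a data URL when the result is "http" and the target provider does not accept remote URLs.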

We can also pass in multiple images.

// Distinct names avoid redeclaring `message` and `response` from the earlier examples.
const multiImageMessage = new HumanMessage({
  content: [
    {
      type: "text",
      text: "are these two images the same?",
    },
    {
      type: "image_url",
      image_url: {
        url: imageUrl,
      },
    },
    {
      type: "image_url",
      image_url: {
        url: imageUrl,
      },
    },
  ],
});
const multiImageResponse = await openAIModel.invoke([multiImageMessage]);
console.log(multiImageResponse.content);
Yes, the two images are the same.
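When the number of images varies, writing each block by hand becomes repetitive. As a rough sketch, the content array can be built from a prompt and a list of URLs. The function `buildImageContent` and the type aliases are hypothetical and simply mirror the block shapes used in this guide; the `example.com` URLs are placeholders.

```typescript
type TextBlock = { type: "text"; text: string };
type ImageUrlBlock = { type: "image_url"; image_url: { url: string } };
type ContentBlock = TextBlock | ImageUrlBlock;

// Hypothetical helper: one text block followed by one image_url block per URL.
function buildImageContent(prompt: string, imageUrls: string[]): ContentBlock[] {
  return [
    { type: "text", text: prompt },
    ...imageUrls.map(
      (url): ImageUrlBlock => ({ type: "image_url", image_url: { url } }),
    ),
  ];
}

const content = buildImageContent("are these two images the same?", [
  "https://example.com/first.jpg",
  "https://example.com/second.jpg",
]);
console.log(content.length); // 3
```

The resulting array has the same shape as the `content` passed to `HumanMessage` in the examples above.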

Next steps

You've now learned how to pass multimodal data to a model.

Next, you can check out our guide on multimodal tool calls.

