跳到主要内容

如何修剪消息

先决条件

本指南假定您熟悉以下概念

本指南中的方法还需要 @langchain/core>=0.2.8。请点击此处查看升级指南

所有模型都有有限的上下文窗口,这意味着它们可以接受的令牌数量是有限制的。如果您有非常长的消息或链/代理累积了很长的消息历史记录,您需要管理传递给模型的消息的长度。

trimMessages 实用程序提供了一些基本策略,用于修剪消息列表以达到特定的令牌长度。

获取最后 maxTokens 个令牌

要获取消息列表中最后 maxTokens 个令牌,我们可以设置 strategy: "last"。请注意,对于我们的 tokenCounter,我们可以传入一个函数(稍后会详细介绍)或一个语言模型(因为语言模型具有消息令牌计数方法)。当您修剪消息以适应特定模型的上下文窗口时,传入模型是有意义的

import {
AIMessage,
HumanMessage,
SystemMessage,
trimMessages,
} from "@langchain/core/messages";
import { ChatOpenAI } from "@langchain/openai";

const messages = [
new SystemMessage("you're a good assistant, you always respond with a joke."),
new HumanMessage("i wonder why it's called langchain"),
new AIMessage(
'Well, I guess they thought "WordRope" and "SentenceString" just didn\'t have the same ring to it!'
),
new HumanMessage("and who is harrison chasing anyways"),
new AIMessage(
"Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!"
),
new HumanMessage("what do you call a speechless parrot"),
];

const trimmed = await trimMessages(messages, {
maxTokens: 45,
strategy: "last",
tokenCounter: new ChatOpenAI({ modelName: "gpt-4" }),
});

console.log(
trimmed
.map((x) =>
JSON.stringify(
{
role: x._getType(),
content: x.content,
},
null,
2
)
)
.join("\n\n")
);
{
"role": "human",
"content": "and who is harrison chasing anyways"
}

{
"role": "ai",
"content": "Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!"
}

{
"role": "human",
"content": "what do you call a speechless parrot"
}

如果我们想始终保留初始系统消息,我们可以指定 includeSystem: true

await trimMessages(messages, {
maxTokens: 45,
strategy: "last",
tokenCounter: new ChatOpenAI({ modelName: "gpt-4" }),
includeSystem: true,
});
[
SystemMessage {
lc_serializable: true,
lc_kwargs: {
content: "you're a good assistant, you always respond with a joke.",
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: "you're a good assistant, you always respond with a joke.",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
},
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
tool_calls: [],
invalid_tool_calls: [],
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined,
tool_calls: [],
invalid_tool_calls: [],
usage_metadata: undefined
},
HumanMessage {
lc_serializable: true,
lc_kwargs: {
content: 'what do you call a speechless parrot',
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'what do you call a speechless parrot',
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
}
]

如果我们想允许拆分消息的内容,我们可以指定 allowPartial: true

await trimMessages(messages, {
maxTokens: 50,
strategy: "last",
tokenCounter: new ChatOpenAI({ modelName: "gpt-4" }),
includeSystem: true,
allowPartial: true,
});
[
SystemMessage {
lc_serializable: true,
lc_kwargs: {
content: "you're a good assistant, you always respond with a joke.",
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: "you're a good assistant, you always respond with a joke.",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
},
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
tool_calls: [],
invalid_tool_calls: [],
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined,
tool_calls: [],
invalid_tool_calls: [],
usage_metadata: undefined
},
HumanMessage {
lc_serializable: true,
lc_kwargs: {
content: 'what do you call a speechless parrot',
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'what do you call a speechless parrot',
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
}
]

如果我们需要确保我们的第一条消息(不包括系统消息)始终是特定类型,我们可以指定 startOn

await trimMessages(messages, {
maxTokens: 60,
strategy: "last",
tokenCounter: new ChatOpenAI({ modelName: "gpt-4" }),
includeSystem: true,
startOn: "human",
});
[
SystemMessage {
lc_serializable: true,
lc_kwargs: {
content: "you're a good assistant, you always respond with a joke.",
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: "you're a good assistant, you always respond with a joke.",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
},
HumanMessage {
lc_serializable: true,
lc_kwargs: {
content: 'and who is harrison chasing anyways',
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'and who is harrison chasing anyways',
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
},
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
tool_calls: [],
invalid_tool_calls: [],
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined,
tool_calls: [],
invalid_tool_calls: [],
usage_metadata: undefined
},
HumanMessage {
lc_serializable: true,
lc_kwargs: {
content: 'what do you call a speechless parrot',
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'what do you call a speechless parrot',
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
}
]

获取前 maxTokens 个令牌

我们可以执行翻转操作,通过指定 strategy: "first" 来获取 maxTokens 个令牌

await trimMessages(messages, {
maxTokens: 45,
strategy: "first",
tokenCounter: new ChatOpenAI({ modelName: "gpt-4" }),
});
[
SystemMessage {
lc_serializable: true,
lc_kwargs: {
content: "you're a good assistant, you always respond with a joke.",
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: "you're a good assistant, you always respond with a joke.",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
},
HumanMessage {
lc_serializable: true,
lc_kwargs: {
content: "i wonder why it's called langchain",
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: "i wonder why it's called langchain",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
}
]

编写自定义令牌计数器

我们可以编写一个自定义令牌计数器函数,该函数接受消息列表并返回一个整数。

import { encodingForModel } from "@langchain/core/utils/tiktoken";
import {
BaseMessage,
HumanMessage,
AIMessage,
ToolMessage,
SystemMessage,
MessageContent,
MessageContentText,
} from "@langchain/core/messages";

async function strTokenCounter(
messageContent: MessageContent
): Promise<number> {
if (typeof messageContent === "string") {
return (await encodingForModel("gpt-4")).encode(messageContent).length;
} else {
if (messageContent.every((x) => x.type === "text" && x.text)) {
return (await encodingForModel("gpt-4")).encode(
(messageContent as MessageContentText[])
.map(({ text }) => text)
.join("")
).length;
}
throw new Error(
`Unsupported message content ${JSON.stringify(messageContent)}`
);
}
}

async function tiktokenCounter(messages: BaseMessage[]): Promise<number> {
let numTokens = 3; // every reply is primed with <|start|>assistant<|message|>
const tokensPerMessage = 3;
const tokensPerName = 1;

for (const msg of messages) {
let role: string;
if (msg instanceof HumanMessage) {
role = "user";
} else if (msg instanceof AIMessage) {
role = "assistant";
} else if (msg instanceof ToolMessage) {
role = "tool";
} else if (msg instanceof SystemMessage) {
role = "system";
} else {
throw new Error(`Unsupported message type ${msg.constructor.name}`);
}

numTokens +=
tokensPerMessage +
(await strTokenCounter(role)) +
(await strTokenCounter(msg.content));

if (msg.name) {
numTokens += tokensPerName + (await strTokenCounter(msg.name));
}
}

return numTokens;
}

await trimMessages(messages, {
maxTokens: 45,
strategy: "last",
tokenCounter: tiktokenCounter,
});
[
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
tool_calls: [],
invalid_tool_calls: [],
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined,
tool_calls: [],
invalid_tool_calls: [],
usage_metadata: undefined
},
HumanMessage {
lc_serializable: true,
lc_kwargs: {
content: 'what do you call a speechless parrot',
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'what do you call a speechless parrot',
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
}
]

链式调用

trimMessages 可以命令式地(如上所述)或声明式地使用,使其易于与链中的其他组件组合

import { ChatOpenAI } from "@langchain/openai";
import { trimMessages } from "@langchain/core/messages";

const llm = new ChatOpenAI({ model: "gpt-4o" });

// Notice we don't pass in messages. This creates
// a RunnableLambda that takes messages as input
const trimmer = trimMessages({
maxTokens: 45,
strategy: "last",
tokenCounter: llm,
includeSystem: true,
});

const chain = trimmer.pipe(llm);
await chain.invoke(messages);
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: 'Thanks! I do try to keep things light. But for a more serious answer, "LangChain" is likely named to reflect its focus on language processing and the way it connects different components or models together—essentially forming a "chain" of linguistic operations. The "Lang" part emphasizes its focus on language, while "Chain" highlights the interconnected workflows it aims to facilitate.',
tool_calls: [],
invalid_tool_calls: [],
additional_kwargs: { function_call: undefined, tool_calls: undefined },
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'Thanks! I do try to keep things light. But for a more serious answer, "LangChain" is likely named to reflect its focus on language processing and the way it connects different components or models together—essentially forming a "chain" of linguistic operations. The "Lang" part emphasizes its focus on language, while "Chain" highlights the interconnected workflows it aims to facilitate.',
name: undefined,
additional_kwargs: { function_call: undefined, tool_calls: undefined },
response_metadata: {
tokenUsage: { completionTokens: 77, promptTokens: 59, totalTokens: 136 },
finish_reason: 'stop'
},
id: undefined,
tool_calls: [],
invalid_tool_calls: [],
usage_metadata: { input_tokens: 59, output_tokens: 77, total_tokens: 136 }
}

查看 LangSmith 跟踪,我们可以看到在消息传递到模型之前,它们首先被修剪。

仅查看修剪器,我们可以看到它是一个 Runnable 对象,可以像所有 Runnables 一样被调用

await trimmer.invoke(messages);
[
SystemMessage {
lc_serializable: true,
lc_kwargs: {
content: "you're a good assistant, you always respond with a joke.",
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: "you're a good assistant, you always respond with a joke.",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
},
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
tool_calls: [],
invalid_tool_calls: [],
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined,
tool_calls: [],
invalid_tool_calls: [],
usage_metadata: undefined
},
HumanMessage {
lc_serializable: true,
lc_kwargs: {
content: 'what do you call a speechless parrot',
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'what do you call a speechless parrot',
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
}
]

与 ChatMessageHistory 一起使用

使用聊天历史记录时,修剪消息特别有用,因为聊天记录可能会变得任意长

import { InMemoryChatMessageHistory } from "@langchain/core/chat_history";
import { RunnableWithMessageHistory } from "@langchain/core/runnables";
import { HumanMessage, trimMessages } from "@langchain/core/messages";
import { ChatOpenAI } from "@langchain/openai";

const chatHistory = new InMemoryChatMessageHistory(messages.slice(0, -1));

const dummyGetSessionHistory = async (sessionId: string) => {
if (sessionId !== "1") {
throw new Error("Session not found");
}
return chatHistory;
};

const llm = new ChatOpenAI({ model: "gpt-4o" });

const trimmer = trimMessages({
maxTokens: 45,
strategy: "last",
tokenCounter: llm,
includeSystem: true,
});

const chain = trimmer.pipe(llm);
const chainWithHistory = new RunnableWithMessageHistory({
runnable: chain,
getMessageHistory: dummyGetSessionHistory,
});
await chainWithHistory.invoke(
[new HumanMessage("what do you call a speechless parrot")],
{ configurable: { sessionId: "1" } }
);
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: 'A "polly-no-want-a-cracker"!',
tool_calls: [],
invalid_tool_calls: [],
additional_kwargs: { function_call: undefined, tool_calls: undefined },
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'A "polly-no-want-a-cracker"!',
name: undefined,
additional_kwargs: { function_call: undefined, tool_calls: undefined },
response_metadata: {
tokenUsage: { completionTokens: 11, promptTokens: 57, totalTokens: 68 },
finish_reason: 'stop'
},
id: undefined,
tool_calls: [],
invalid_tool_calls: [],
usage_metadata: { input_tokens: 57, output_tokens: 11, total_tokens: 68 }
}

查看 LangSmith 跟踪,我们可以看到我们检索了所有消息,但在消息传递到模型之前,它们被修剪为仅包含系统消息和最后一条人类消息。

API 参考

有关所有参数的完整描述,请访问 API 参考


此页面对您有帮助吗?


您也可以留下详细的反馈 在 GitHub 上.