How to trim messages
All models have finite context windows, meaning there's a limit to how many tokens they can take as input. If you have very long messages, or a chain/agent that accumulates a long message history, you'll need to manage the length of the messages you're passing in to the model.
The trimMessages utility provides some basic strategies for trimming a list of messages to be of a certain token length.
Getting the last maxTokens tokens
To get the last maxTokens tokens in the list of messages, we can set strategy: "last". Notice that for our tokenCounter we can pass in a function (more on that below) or a language model (since language models have a message token counting method). It makes sense to pass in a model when you're trimming your messages to fit into the context window of that specific model:
import {
AIMessage,
HumanMessage,
SystemMessage,
trimMessages,
} from "@langchain/core/messages";
import { ChatOpenAI } from "@langchain/openai";
const messages = [
new SystemMessage("you're a good assistant, you always respond with a joke."),
new HumanMessage("i wonder why it's called langchain"),
new AIMessage(
'Well, I guess they thought "WordRope" and "SentenceString" just didn\'t have the same ring to it!'
),
new HumanMessage("and who is harrison chasing anyways"),
new AIMessage(
"Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!"
),
new HumanMessage("what do you call a speechless parrot"),
];
const trimmed = await trimMessages(messages, {
maxTokens: 45,
strategy: "last",
tokenCounter: new ChatOpenAI({ modelName: "gpt-4" }),
});
console.log(
trimmed
.map((x) =>
JSON.stringify(
{
role: x._getType(),
content: x.content,
},
null,
2
)
)
.join("\n\n")
);
{
"role": "human",
"content": "and who is harrison chasing anyways"
}
{
"role": "ai",
"content": "Hmmm let me think.\n\nWhy, he's probably chasing after the last cup of coffee in the office!"
}
{
"role": "human",
"content": "what do you call a speechless parrot"
}
If we want to always keep the initial system message, we can specify includeSystem: true:
await trimMessages(messages, {
maxTokens: 45,
strategy: "last",
tokenCounter: new ChatOpenAI({ modelName: "gpt-4" }),
includeSystem: true,
});
[
SystemMessage {
lc_serializable: true,
lc_kwargs: {
content: "you're a good assistant, you always respond with a joke.",
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: "you're a good assistant, you always respond with a joke.",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
},
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
tool_calls: [],
invalid_tool_calls: [],
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined,
tool_calls: [],
invalid_tool_calls: [],
usage_metadata: undefined
},
HumanMessage {
lc_serializable: true,
lc_kwargs: {
content: 'what do you call a speechless parrot',
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'what do you call a speechless parrot',
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
}
]
If we want to allow splitting up the contents of a message, we can specify allowPartial: true:
await trimMessages(messages, {
maxTokens: 50,
strategy: "last",
tokenCounter: new ChatOpenAI({ modelName: "gpt-4" }),
includeSystem: true,
allowPartial: true,
});
[
SystemMessage {
lc_serializable: true,
lc_kwargs: {
content: "you're a good assistant, you always respond with a joke.",
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: "you're a good assistant, you always respond with a joke.",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
},
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
tool_calls: [],
invalid_tool_calls: [],
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined,
tool_calls: [],
invalid_tool_calls: [],
usage_metadata: undefined
},
HumanMessage {
lc_serializable: true,
lc_kwargs: {
content: 'what do you call a speechless parrot',
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'what do you call a speechless parrot',
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
}
]
If we need to make sure that our first message (excluding the system message) is always of a specific type, we can specify startOn:
await trimMessages(messages, {
maxTokens: 60,
strategy: "last",
tokenCounter: new ChatOpenAI({ modelName: "gpt-4" }),
includeSystem: true,
startOn: "human",
});
[
SystemMessage {
lc_serializable: true,
lc_kwargs: {
content: "you're a good assistant, you always respond with a joke.",
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: "you're a good assistant, you always respond with a joke.",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
},
HumanMessage {
lc_serializable: true,
lc_kwargs: {
content: 'and who is harrison chasing anyways',
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'and who is harrison chasing anyways',
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
},
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
tool_calls: [],
invalid_tool_calls: [],
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined,
tool_calls: [],
invalid_tool_calls: [],
usage_metadata: undefined
},
HumanMessage {
lc_serializable: true,
lc_kwargs: {
content: 'what do you call a speechless parrot',
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'what do you call a speechless parrot',
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
}
]
Getting the first maxTokens tokens
We can perform the flipped operation and get the first maxTokens tokens by specifying strategy: "first":
await trimMessages(messages, {
maxTokens: 45,
strategy: "first",
tokenCounter: new ChatOpenAI({ modelName: "gpt-4" }),
});
[
SystemMessage {
lc_serializable: true,
lc_kwargs: {
content: "you're a good assistant, you always respond with a joke.",
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: "you're a good assistant, you always respond with a joke.",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
},
HumanMessage {
lc_serializable: true,
lc_kwargs: {
content: "i wonder why it's called langchain",
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: "i wonder why it's called langchain",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
}
]
Writing a custom token counter
We can write a custom token counter function that takes in a list of messages and returns an integer:
import { encodingForModel } from "@langchain/core/utils/tiktoken";
import {
BaseMessage,
HumanMessage,
AIMessage,
ToolMessage,
SystemMessage,
MessageContent,
MessageContentText,
} from "@langchain/core/messages";
async function strTokenCounter(
messageContent: MessageContent
): Promise<number> {
if (typeof messageContent === "string") {
return (await encodingForModel("gpt-4")).encode(messageContent).length;
} else {
if (messageContent.every((x) => x.type === "text" && x.text)) {
return (await encodingForModel("gpt-4")).encode(
(messageContent as MessageContentText[])
.map(({ text }) => text)
.join("")
).length;
}
throw new Error(
`Unsupported message content ${JSON.stringify(messageContent)}`
);
}
}
async function tiktokenCounter(messages: BaseMessage[]): Promise<number> {
let numTokens = 3; // every reply is primed with <|start|>assistant<|message|>
const tokensPerMessage = 3;
const tokensPerName = 1;
for (const msg of messages) {
let role: string;
if (msg instanceof HumanMessage) {
role = "user";
} else if (msg instanceof AIMessage) {
role = "assistant";
} else if (msg instanceof ToolMessage) {
role = "tool";
} else if (msg instanceof SystemMessage) {
role = "system";
} else {
throw new Error(`Unsupported message type ${msg.constructor.name}`);
}
numTokens +=
tokensPerMessage +
(await strTokenCounter(role)) +
(await strTokenCounter(msg.content));
if (msg.name) {
numTokens += tokensPerName + (await strTokenCounter(msg.name));
}
}
return numTokens;
}
await trimMessages(messages, {
maxTokens: 45,
strategy: "last",
tokenCounter: tiktokenCounter,
});
[
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
tool_calls: [],
invalid_tool_calls: [],
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined,
tool_calls: [],
invalid_tool_calls: [],
usage_metadata: undefined
},
HumanMessage {
lc_serializable: true,
lc_kwargs: {
content: 'what do you call a speechless parrot',
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'what do you call a speechless parrot',
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
}
]
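If an exact tokenizer isn't required, the counter doesn't need tiktoken at all. As a minimal sketch (assuming a per-message budget is acceptable for your use case), you can trim by message count by charging one "token" per message:
// Hypothetical variant: maxTokens now effectively acts as a
// maximum message count, since each message costs exactly 1.
await trimMessages(messages, {
  maxTokens: 4,
  strategy: "last",
  tokenCounter: (msgs) => msgs.length,
});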
Chaining
trimMessages can be used imperatively (like above) or declaratively, making it easy to compose with other components in a chain:
import { ChatOpenAI } from "@langchain/openai";
import { trimMessages } from "@langchain/core/messages";
const llm = new ChatOpenAI({ model: "gpt-4o" });
// Notice we don't pass in messages. This creates
// a RunnableLambda that takes messages as input
const trimmer = trimMessages({
maxTokens: 45,
strategy: "last",
tokenCounter: llm,
includeSystem: true,
});
const chain = trimmer.pipe(llm);
await chain.invoke(messages);
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: 'Thanks! I do try to keep things light. But for a more serious answer, "LangChain" is likely named to reflect its focus on language processing and the way it connects different components or models together—essentially forming a "chain" of linguistic operations. The "Lang" part emphasizes its focus on language, while "Chain" highlights the interconnected workflows it aims to facilitate.',
tool_calls: [],
invalid_tool_calls: [],
additional_kwargs: { function_call: undefined, tool_calls: undefined },
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'Thanks! I do try to keep things light. But for a more serious answer, "LangChain" is likely named to reflect its focus on language processing and the way it connects different components or models together—essentially forming a "chain" of linguistic operations. The "Lang" part emphasizes its focus on language, while "Chain" highlights the interconnected workflows it aims to facilitate.',
name: undefined,
additional_kwargs: { function_call: undefined, tool_calls: undefined },
response_metadata: {
tokenUsage: { completionTokens: 77, promptTokens: 59, totalTokens: 136 },
finish_reason: 'stop'
},
id: undefined,
tool_calls: [],
invalid_tool_calls: [],
usage_metadata: { input_tokens: 59, output_tokens: 77, total_tokens: 136 }
}
Looking at the LangSmith trace, we can see that before the messages are passed to the model, they are first trimmed.
Looking at just the trimmer, we can see that it's a Runnable object that can be invoked like all Runnables:
await trimmer.invoke(messages);
[
SystemMessage {
lc_serializable: true,
lc_kwargs: {
content: "you're a good assistant, you always respond with a joke.",
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: "you're a good assistant, you always respond with a joke.",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
},
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
tool_calls: [],
invalid_tool_calls: [],
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'Hmmm let me think.\n' +
'\n' +
"Why, he's probably chasing after the last cup of coffee in the office!",
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined,
tool_calls: [],
invalid_tool_calls: [],
usage_metadata: undefined
},
HumanMessage {
lc_serializable: true,
lc_kwargs: {
content: 'what do you call a speechless parrot',
additional_kwargs: {},
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'what do you call a speechless parrot',
name: undefined,
additional_kwargs: {},
response_metadata: {},
id: undefined
}
]
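Because the trimmer is a Runnable, it can be composed with more than just a model. As a hedged sketch (the prompt wording and the history/input field names are illustrative, not part of the original example), you could trim stored history before formatting it into a prompt:
import {
  ChatPromptTemplate,
  MessagesPlaceholder,
} from "@langchain/core/prompts";
import { RunnableSequence } from "@langchain/core/runnables";
import { BaseMessage } from "@langchain/core/messages";

const prompt = ChatPromptTemplate.fromMessages([
  new MessagesPlaceholder("history"),
  ["human", "{input}"],
]);

// Trim the history first, then format the prompt and call the model.
const promptChain = RunnableSequence.from([
  async (input: { history: BaseMessage[]; input: string }) => ({
    history: await trimmer.invoke(input.history),
    input: input.input,
  }),
  prompt,
  llm,
]);

await promptChain.invoke({
  history: messages,
  input: "tell me another joke",
});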
Using with ChatMessageHistory
Trimming messages is especially useful when working with chat histories, which can get arbitrarily long:
import { InMemoryChatMessageHistory } from "@langchain/core/chat_history";
import { RunnableWithMessageHistory } from "@langchain/core/runnables";
import { HumanMessage, trimMessages } from "@langchain/core/messages";
import { ChatOpenAI } from "@langchain/openai";
const chatHistory = new InMemoryChatMessageHistory(messages.slice(0, -1));
const dummyGetSessionHistory = async (sessionId: string) => {
if (sessionId !== "1") {
throw new Error("Session not found");
}
return chatHistory;
};
const llm = new ChatOpenAI({ model: "gpt-4o" });
const trimmer = trimMessages({
maxTokens: 45,
strategy: "last",
tokenCounter: llm,
includeSystem: true,
});
const chain = trimmer.pipe(llm);
const chainWithHistory = new RunnableWithMessageHistory({
runnable: chain,
getMessageHistory: dummyGetSessionHistory,
});
await chainWithHistory.invoke(
[new HumanMessage("what do you call a speechless parrot")],
{ configurable: { sessionId: "1" } }
);
AIMessage {
lc_serializable: true,
lc_kwargs: {
content: 'A "polly-no-want-a-cracker"!',
tool_calls: [],
invalid_tool_calls: [],
additional_kwargs: { function_call: undefined, tool_calls: undefined },
response_metadata: {}
},
lc_namespace: [ 'langchain_core', 'messages' ],
content: 'A "polly-no-want-a-cracker"!',
name: undefined,
additional_kwargs: { function_call: undefined, tool_calls: undefined },
response_metadata: {
tokenUsage: { completionTokens: 11, promptTokens: 57, totalTokens: 68 },
finish_reason: 'stop'
},
id: undefined,
tool_calls: [],
invalid_tool_calls: [],
usage_metadata: { input_tokens: 57, output_tokens: 11, total_tokens: 68 }
}
Looking at the LangSmith trace, we can see that we retrieve all of our messages, but before the messages are passed to the model they are trimmed down to just the system message and the last human message.
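Since the stored history grows with every turn, the trimmer is re-applied on each call and the prompt stays within budget. A small follow-up invocation as a sketch (the new question is illustrative):
// The previous human message and AI reply are now part of chatHistory,
// so this call is trimmed the same way before reaching the model.
await chainWithHistory.invoke(
  [new HumanMessage("tell me a joke about llamas")],
  { configurable: { sessionId: "1" } }
);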
API reference
For a complete description of all arguments, head to the API reference.