如何从 LangSmith 数据集中选择示例
兼容性
langsmith
>= 0.1.43
LangSmith 数据集内置支持相似性搜索,使其成为构建和查询少样本示例的绝佳工具。
在本指南中,我们将了解如何使用索引的 LangSmith 数据集作为少样本示例选择器。
设置
在开始之前,请确保你已创建 LangSmith 帐户并设置你的凭据
process.env.LANGSMITH_API_KEY = "your-api-key";
process.env.LANGSMITH_TRACING = "true";
我们需要安装 langsmith
SDK。在本例中,我们还将使用 langchain
和 @langchain/anthropic
- npm
- yarn
- pnpm
npm i langsmith langchain @langchain/anthropic @langchain/core zod zod-to-json-schema
yarn add langsmith langchain @langchain/anthropic @langchain/core zod zod-to-json-schema
pnpm add langsmith langchain @langchain/anthropic @langchain/core zod zod-to-json-schema
现在,我们将克隆一个公共数据集并为该数据集打开索引。我们也可以通过LangSmith UI打开索引。
我们将克隆多元宇宙数学少样本示例数据集。
这将启用对数据集的搜索,并确保我们在更新/添加示例时,它们也会被索引。
创建克隆的第一步是读取包含示例的 JSON 文件,并将其转换为 LangSmith 用于创建示例的预期格式
import { Client as LangSmithClient } from "langsmith";
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";
import fs from "fs/promises";
// Read the example dataset and convert to the format expected by the LangSmith API
// for creating new examples
const examplesJson = JSON.parse(
await fs.readFile("../../data/ls_few_shot_example_dataset.json", "utf-8")
);
let inputs: Record<string, any>[] = [];
let outputs: Record<string, any>[] = [];
let metadata: Record<string, any>[] = [];
examplesJson.forEach((ex) => {
inputs.push(ex.inputs);
outputs.push(ex.outputs);
metadata.push(ex.metadata);
});
// Define our input schema as this is required for indexing
const inputsSchema = zodToJsonSchema(
z.object({
input: z.string(),
system: z.boolean().optional(),
})
);
const lsClient = new LangSmithClient();
await lsClient.deleteDataset({
datasetName: "multiverse-math-examples-for-few-shot-example",
});
const dataset = await lsClient.createDataset(
"multiverse-math-examples-for-few-shot-example",
{
inputsSchema,
}
);
const createdExamples = await lsClient.createExamples({
inputs,
outputs,
metadata,
datasetId: dataset.id,
});
await lsClient.indexDataset({ datasetId: dataset.id });
数据集被索引后,我们可以搜索类似的示例,如下所示
const examples = await lsClient.similarExamples(
{ input: "whats the negation of the negation of the negation of 3" },
dataset.id,
3
);
console.log(examples.length);
3
console.log(examples[0].inputs.input);
evaluate the negation of -100
对于此数据集,输出是整个聊天历史记录
console.log(examples[1].outputs.output);
[
{
id: 'cbe7ed83-86e1-4e46-89de-6646f8b55cef',
type: 'system',
content: 'You are requested to solve math questions in an alternate mathematical universe. The operations have been altered to yield different results than expected. Do not guess the answer or rely on your innate knowledge of math. Use the provided tools to answer the question. While associativity and commutativity apply, distributivity does not. Answer the question using the fewest possible tools. Only include the numeric response without any clarifications.',
additional_kwargs: {},
response_metadata: {}
},
{
id: '04946246-09a8-4465-be95-037efd7dae55',
type: 'human',
content: 'if one gazoink is 4 badoinks, each of which is 6 foos, each of wich is 3 bars - how many bars in 3 gazoinks?',
example: false,
additional_kwargs: {},
response_metadata: {}
},
{
id: 'run-d6f0954e-b21b-4ea8-ad98-0ee64cfc824e-0',
type: 'ai',
content: [ [Object] ],
example: false,
tool_calls: [ [Object] ],
usage_metadata: { input_tokens: 916, total_tokens: 984, output_tokens: 68 },
additional_kwargs: {},
response_metadata: {
id: 'msg_01MBWxgouUBzomwTvXhomGVq',
model: 'claude-3-sonnet-20240229',
usage: [Object],
stop_reason: 'tool_use',
stop_sequence: null
},
invalid_tool_calls: []
},
{
id: '3d4c72c4-f009-48ce-b739-1d3f28ee4803',
name: 'multiply',
type: 'tool',
content: '13.2',
tool_call_id: 'toolu_016RjRHSEyDZRqKhGrb8uvjJ',
additional_kwargs: {},
response_metadata: {}
},
{
id: 'run-26dd7e83-f5fb-4c70-8ba1-271300ffeb25-0',
type: 'ai',
content: [ [Object] ],
example: false,
tool_calls: [ [Object] ],
usage_metadata: { input_tokens: 999, total_tokens: 1070, output_tokens: 71 },
additional_kwargs: {},
response_metadata: {
id: 'msg_01VTFvtCxtR3rN58hCmjt2oH',
model: 'claude-3-sonnet-20240229',
usage: [Object],
stop_reason: 'tool_use',
stop_sequence: null
},
invalid_tool_calls: []
},
{
id: 'ca4e0317-7b3a-4638-933c-1efd98bc4fda',
name: 'multiply',
type: 'tool',
content: '87.12',
tool_call_id: 'toolu_01PqvszxiuXrVJ9bwgTWaH3q',
additional_kwargs: {},
response_metadata: {}
},
{
id: 'run-007794ac-3590-4b9e-b678-008f02e40042-0',
type: 'ai',
content: [ [Object] ],
example: false,
tool_calls: [ [Object] ],
usage_metadata: { input_tokens: 1084, total_tokens: 1155, output_tokens: 71 },
additional_kwargs: {},
response_metadata: {
id: 'msg_017BEkSqmTsmtJaTxAzfRMEh',
model: 'claude-3-sonnet-20240229',
usage: [Object],
stop_reason: 'tool_use',
stop_sequence: null
},
invalid_tool_calls: []
},
{
id: '7f58c121-6f21-4c7b-ba38-aa820e274ff8',
name: 'multiply',
type: 'tool',
content: '287.496',
tool_call_id: 'toolu_01LU3RqRUXZRLRoJ2AZNmPed',
additional_kwargs: {},
response_metadata: {}
},
{
id: 'run-51e35afb-7ec6-4738-93e2-92f80b5c9377-0',
type: 'ai',
content: '287.496',
example: false,
tool_calls: [],
usage_metadata: { input_tokens: 1169, total_tokens: 1176, output_tokens: 7 },
additional_kwargs: {},
response_metadata: {
id: 'msg_01Tx9kSNapSg8aUbWZXiS1NL',
model: 'claude-3-sonnet-20240229',
usage: [Object],
stop_reason: 'end_turn',
stop_sequence: null
},
invalid_tool_calls: []
}
]
搜索将返回其输入与查询输入最相似的示例。我们可以将其用于对模型进行少样本提示。第一步是创建一系列我们希望允许模型调用的数学工具
import { tool } from "@langchain/core/tools";
import { z } from "zod";
const add = tool(
(input) => {
return (input.a + input.b).toString();
},
{
name: "add",
description: "Add two numbers",
schema: z.object({
a: z.number().describe("The first number to add"),
b: z.number().describe("The second number to add"),
}),
}
);
const cos = tool(
(input) => {
return Math.cos(input.angle).toString();
},
{
name: "cos",
description: "Calculate the cosine of an angle (in radians)",
schema: z.object({
angle: z.number().describe("The angle in radians"),
}),
}
);
const divide = tool(
(input) => {
return (input.a / input.b).toString();
},
{
name: "divide",
description: "Divide two numbers",
schema: z.object({
a: z.number().describe("The dividend"),
b: z.number().describe("The divisor"),
}),
}
);
const log = tool(
(input) => {
return Math.log(input.value).toString();
},
{
name: "log",
description: "Calculate the natural logarithm of a number",
schema: z.object({
value: z.number().describe("The number to calculate the logarithm of"),
}),
}
);
const multiply = tool(
(input) => {
return (input.a * input.b).toString();
},
{
name: "multiply",
description: "Multiply two numbers",
schema: z.object({
a: z.number().describe("The first number to multiply"),
b: z.number().describe("The second number to multiply"),
}),
}
);
const negate = tool(
(input) => {
return (-input.a).toString();
},
{
name: "negate",
description: "Negate a number",
schema: z.object({
a: z.number().describe("The number to negate"),
}),
}
);
const pi = tool(
() => {
return Math.PI.toString();
},
{
name: "pi",
description: "Return the value of pi",
schema: z.object({}),
}
);
const power = tool(
(input) => {
return Math.pow(input.base, input.exponent).toString();
},
{
name: "power",
description: "Raise a number to a power",
schema: z.object({
base: z.number().describe("The base number"),
exponent: z.number().describe("The exponent"),
}),
}
);
const sin = tool(
(input) => {
return Math.sin(input.angle).toString();
},
{
name: "sin",
description: "Calculate the sine of an angle (in radians)",
schema: z.object({
angle: z.number().describe("The angle in radians"),
}),
}
);
const subtract = tool(
(input) => {
return (input.a - input.b).toString();
},
{
name: "subtract",
description: "Subtract two numbers",
schema: z.object({
a: z.number().describe("The number to subtract from"),
b: z.number().describe("The number to subtract"),
}),
}
);
import { ChatOpenAI } from "@langchain/openai";
import {
HumanMessage,
SystemMessage,
BaseMessage,
BaseMessageLike,
} from "@langchain/core/messages";
import { RunnableLambda } from "@langchain/core/runnables";
import { Client as LangSmithClient, Example } from "langsmith";
import { coerceMessageLikeToMessage } from "@langchain/core/messages";
const client = new LangSmithClient();
async function similarExamples(
input: Record<string, any>
): Promise<Record<string, any>> {
const examples = await client.similarExamples(input, dataset.id, 5);
return { ...input, examples };
}
function constructPrompt(input: {
examples: Example[];
input: string;
}): BaseMessage[] {
const instructions = "You are great at using mathematical tools.";
let messages: BaseMessage[] = [];
for (const ex of input.examples) {
// Assuming ex.outputs.output is an array of message-like objects
messages = messages.concat(
ex.outputs.output.flatMap((msg: BaseMessageLike) =>
coerceMessageLikeToMessage(msg)
)
);
}
const examples = messages.filter((msg) => msg._getType() !== "system");
examples.forEach((ex) => {
if (ex._getType() === "human") {
ex.name = "example_user";
} else {
ex.name = "example_assistant";
}
});
return [
new SystemMessage(instructions),
...examples,
new HumanMessage(input.input),
];
}
const llm = new ChatOpenAI({
model: "gpt-4o",
temperature: 0,
});
const tools = [
add,
cos,
divide,
log,
multiply,
negate,
pi,
power,
sin,
subtract,
];
const llmWithTools = llm.bindTools(tools);
const exampleSelector = new RunnableLambda({
func: similarExamples,
}).withConfig({ runName: "similarExamples" });
const chain = exampleSelector
.pipe(
new RunnableLambda({
func: constructPrompt,
}).withConfig({
runName: "constructPrompt",
})
)
.pipe(llmWithTools);
const aiMsg = await chain.invoke({
input: "whats the negation of the negation of 3",
system: false,
});
console.log(aiMsg.tool_calls);
[
{
name: 'negate',
args: { a: 3 },
type: 'tool_call',
id: 'call_SX0dmb4AbFu39KkGQDqPXQwa'
}
]
查看 LangSmith 跟踪,我们可以看到相关示例在 similarExamples
步骤中被拉取并在传递给 ChatOpenAI 作为消息:https://smith.langchain.com/public/20e09618-0746-4973-9382-5b36c3f27083/r。