跳至主要内容

如何根据相似度选择示例

先决条件

本指南假设您熟悉以下概念

此对象根据与输入的相似度选择示例。它通过找到与输入的嵌入具有最大余弦相似度的示例来实现这一点。

示例对象的字段将用作格式化传递给 FewShotPromptTemplateexamplePrompt 的参数。因此,每个示例都应包含您正在使用的示例提示所需的所有字段。

npm install @langchain/openai @langchain/community @langchain/core
import { OpenAIEmbeddings } from "@langchain/openai";
import { HNSWLib } from "@langchain/community/vectorstores/hnswlib";
import { PromptTemplate, FewShotPromptTemplate } from "@langchain/core/prompts";
import { SemanticSimilarityExampleSelector } from "@langchain/core/example_selectors";

// Create a prompt template that will be used to format the examples.
const examplePrompt = PromptTemplate.fromTemplate(
"Input: {input}\nOutput: {output}"
);

// Create a SemanticSimilarityExampleSelector that will be used to select the examples.
const exampleSelector = await SemanticSimilarityExampleSelector.fromExamples(
[
{ input: "happy", output: "sad" },
{ input: "tall", output: "short" },
{ input: "energetic", output: "lethargic" },
{ input: "sunny", output: "gloomy" },
{ input: "windy", output: "calm" },
],
new OpenAIEmbeddings(),
HNSWLib,
{ k: 1 }
);

// Create a FewShotPromptTemplate that will use the example selector.
const dynamicPrompt = new FewShotPromptTemplate({
// We provide an ExampleSelector instead of examples.
exampleSelector,
examplePrompt,
prefix: "Give the antonym of every input",
suffix: "Input: {adjective}\nOutput:",
inputVariables: ["adjective"],
});

// Input is about the weather, so should select eg. the sunny/gloomy example
console.log(await dynamicPrompt.format({ adjective: "rainy" }));
/*
Give the antonym of every input

Input: sunny
Output: gloomy

Input: rainy
Output:
*/

// Input is a measurement, so should select the tall/short example
console.log(await dynamicPrompt.format({ adjective: "large" }));
/*
Give the antonym of every input

Input: tall
Output: short

Input: large
Output:
*/

API 参考

默认情况下,示例对象中的每个字段都会串联在一起、嵌入并存储在向量存储中,以便稍后与用户查询进行相似度搜索。

如果您只想要嵌入特定键(例如,您只想搜索与用户提供的查询具有相似查询的示例),您可以在最终的 options 参数中传递一个 inputKeys 数组。

从现有向量存储加载

您也可以通过将实例直接传递给 SemanticSimilarityExampleSelector 构造函数来使用预先初始化的向量存储,如下所示。您也可以通过 addExample 方法添加更多示例

// Ephemeral, in-memory vector store for demo purposes
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { PromptTemplate, FewShotPromptTemplate } from "@langchain/core/prompts";
import { SemanticSimilarityExampleSelector } from "@langchain/core/example_selectors";

const embeddings = new OpenAIEmbeddings();

const memoryVectorStore = new MemoryVectorStore(embeddings);

const examples = [
{
query: "healthy food",
output: `galbi`,
},
{
query: "healthy food",
output: `schnitzel`,
},
{
query: "foo",
output: `bar`,
},
];

const exampleSelector = new SemanticSimilarityExampleSelector({
vectorStore: memoryVectorStore,
k: 2,
// Only embed the "query" key of each example
inputKeys: ["query"],
});

for (const example of examples) {
// Format and add an example to the underlying vector store
await exampleSelector.addExample(example);
}

// Create a prompt template that will be used to format the examples.
const examplePrompt = PromptTemplate.fromTemplate(`<example>
<user_input>
{query}
</user_input>
<output>
{output}
</output>
</example>`);

// Create a FewShotPromptTemplate that will use the example selector.
const dynamicPrompt = new FewShotPromptTemplate({
// We provide an ExampleSelector instead of examples.
exampleSelector,
examplePrompt,
prefix: `Answer the user's question, using the below examples as reference:`,
suffix: "User question: {query}",
inputVariables: ["query"],
});

const formattedValue = await dynamicPrompt.format({
query: "What is a healthy food?",
});
console.log(formattedValue);

/*
Answer the user's question, using the below examples as reference:

<example>
<user_input>
healthy
</user_input>
<output>
galbi
</output>
</example>

<example>
<user_input>
healthy
</user_input>
<output>
schnitzel
</output>
</example>

User question: What is a healthy food?
*/

const model = new ChatOpenAI({});

const chain = dynamicPrompt.pipe(model);

const result = await chain.invoke({ query: "What is a healthy food?" });
console.log(result);
/*
AIMessage {
content: 'A healthy food can be galbi or schnitzel.',
additional_kwargs: { function_call: undefined }
}
*/

API 参考

元数据过滤

添加示例时,每个字段都可以在生成的文档中作为元数据使用。如果您想进一步控制搜索空间,您可以在示例中添加额外的字段并在初始化选择器时传递 filter 参数

// Ephemeral, in-memory vector store for demo purposes
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { PromptTemplate, FewShotPromptTemplate } from "@langchain/core/prompts";
import { Document } from "@langchain/core/documents";
import { SemanticSimilarityExampleSelector } from "@langchain/core/example_selectors";

const embeddings = new OpenAIEmbeddings();

const memoryVectorStore = new MemoryVectorStore(embeddings);

const examples = [
{
query: "healthy food",
output: `lettuce`,
food_type: "vegetable",
},
{
query: "healthy food",
output: `schnitzel`,
food_type: "veal",
},
{
query: "foo",
output: `bar`,
food_type: "baz",
},
];

const exampleSelector = new SemanticSimilarityExampleSelector({
vectorStore: memoryVectorStore,
k: 2,
// Only embed the "query" key of each example
inputKeys: ["query"],
// Filter type will depend on your specific vector store.
// See the section of the docs for the specific vector store you are using.
filter: (doc: Document) => doc.metadata.food_type === "vegetable",
});

for (const example of examples) {
// Format and add an example to the underlying vector store
await exampleSelector.addExample(example);
}

// Create a prompt template that will be used to format the examples.
const examplePrompt = PromptTemplate.fromTemplate(`<example>
<user_input>
{query}
</user_input>
<output>
{output}
</output>
</example>`);

// Create a FewShotPromptTemplate that will use the example selector.
const dynamicPrompt = new FewShotPromptTemplate({
// We provide an ExampleSelector instead of examples.
exampleSelector,
examplePrompt,
prefix: `Answer the user's question, using the below examples as reference:`,
suffix: "User question:\n{query}",
inputVariables: ["query"],
});

const model = new ChatOpenAI({});

const chain = dynamicPrompt.pipe(model);

const result = await chain.invoke({
query: "What is exactly one type of healthy food?",
});
console.log(result);
/*
AIMessage {
content: 'One type of healthy food is lettuce.',
additional_kwargs: { function_call: undefined }
}
*/

API 参考

自定义向量存储检索器

您也可以传递向量存储检索器而不是向量存储。一种有用的方式是,如果您想使用除相似度搜索以外的检索,例如最大边缘相关性

/* eslint-disable @typescript-eslint/no-non-null-assertion */

// Requires a vectorstore that supports maximal marginal relevance search
import { Pinecone } from "@pinecone-database/pinecone";
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { PineconeStore } from "@langchain/pinecone";
import { PromptTemplate, FewShotPromptTemplate } from "@langchain/core/prompts";
import { SemanticSimilarityExampleSelector } from "@langchain/core/example_selectors";

const pinecone = new Pinecone();

const pineconeIndex = pinecone.Index(process.env.PINECONE_INDEX!);

/**
* Pinecone allows you to partition the records in an index into namespaces.
* Queries and other operations are then limited to one namespace,
* so different requests can search different subsets of your index.
* Read more about namespaces here: https://docs.pinecone.io/guides/indexes/use-namespaces
*
* NOTE: If you have namespace enabled in your Pinecone index, you must provide the namespace when creating the PineconeStore.
*/
const namespace = "pinecone";

const pineconeVectorstore = await PineconeStore.fromExistingIndex(
new OpenAIEmbeddings(),
{ pineconeIndex, namespace }
);

const pineconeMmrRetriever = pineconeVectorstore.asRetriever({
searchType: "mmr",
k: 2,
});

const examples = [
{
query: "healthy food",
output: `lettuce`,
food_type: "vegetable",
},
{
query: "healthy food",
output: `schnitzel`,
food_type: "veal",
},
{
query: "foo",
output: `bar`,
food_type: "baz",
},
];

const exampleSelector = new SemanticSimilarityExampleSelector({
vectorStoreRetriever: pineconeMmrRetriever,
// Only embed the "query" key of each example
inputKeys: ["query"],
});

for (const example of examples) {
// Format and add an example to the underlying vector store
await exampleSelector.addExample(example);
}

// Create a prompt template that will be used to format the examples.
const examplePrompt = PromptTemplate.fromTemplate(`<example>
<user_input>
{query}
</user_input>
<output>
{output}
</output>
</example>`);

// Create a FewShotPromptTemplate that will use the example selector.
const dynamicPrompt = new FewShotPromptTemplate({
// We provide an ExampleSelector instead of examples.
exampleSelector,
examplePrompt,
prefix: `Answer the user's question, using the below examples as reference:`,
suffix: "User question:\n{query}",
inputVariables: ["query"],
});

const model = new ChatOpenAI({});

const chain = dynamicPrompt.pipe(model);

const result = await chain.invoke({
query: "What is exactly one type of healthy food?",
});

console.log(result);

/*
AIMessage {
content: 'lettuce.',
additional_kwargs: { function_call: undefined }
}
*/

API 参考

后续步骤

您现在已经了解了如何在示例选择器中使用相似度。

接下来,请查看本指南,了解如何使用基于长度的示例选择器


本页内容对您有帮助吗?


您也可以留下详细的反馈 在 GitHub 上.