How to select examples by similarity

Prerequisites

This guide assumes familiarity with the following concepts: prompt templates, example selectors, and vector stores.

This object selects examples based on similarity to the input: it finds the examples whose embeddings have the greatest cosine similarity with the embedded input.
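
For reference, the selection criterion can be sketched as plain cosine similarity between embedding vectors. The snippet below is purely illustrative and is not part of the selector's API:

// Illustrative only: cosine similarity between two embedding vectors.
// The selector keeps the stored examples whose embeddings score highest
// against the embedding of the incoming input.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}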

The fields of the example objects are used as parameters to format the examplePrompt passed to the FewShotPromptTemplate. Each example should therefore contain all of the fields required by the example prompt you are using.

npm install @langchain/openai @langchain/community
import { OpenAIEmbeddings } from "@langchain/openai";
import { HNSWLib } from "@langchain/community/vectorstores/hnswlib";
import { PromptTemplate, FewShotPromptTemplate } from "@langchain/core/prompts";
import { SemanticSimilarityExampleSelector } from "@langchain/core/example_selectors";

// Create a prompt template that will be used to format the examples.
const examplePrompt = PromptTemplate.fromTemplate(
  "Input: {input}\nOutput: {output}"
);

// Create a SemanticSimilarityExampleSelector that will be used to select the examples.
const exampleSelector = await SemanticSimilarityExampleSelector.fromExamples(
  [
    { input: "happy", output: "sad" },
    { input: "tall", output: "short" },
    { input: "energetic", output: "lethargic" },
    { input: "sunny", output: "gloomy" },
    { input: "windy", output: "calm" },
  ],
  new OpenAIEmbeddings(),
  HNSWLib,
  { k: 1 }
);

// Create a FewShotPromptTemplate that will use the example selector.
const dynamicPrompt = new FewShotPromptTemplate({
  // We provide an ExampleSelector instead of examples.
  exampleSelector,
  examplePrompt,
  prefix: "Give the antonym of every input",
  suffix: "Input: {adjective}\nOutput:",
  inputVariables: ["adjective"],
});

// Input is about the weather, so should select eg. the sunny/gloomy example
console.log(await dynamicPrompt.format({ adjective: "rainy" }));
/*
Give the antonym of every input

Input: sunny
Output: gloomy

Input: rainy
Output:
*/

// Input is a measurement, so should select the tall/short example
console.log(await dynamicPrompt.format({ adjective: "large" }));
/*
Give the antonym of every input

Input: tall
Output: short

Input: large
Output:
*/

API Reference

By default, every field of each example object is concatenated together, embedded, and stored in the vector store for later similarity search against user queries.

If you only want to embed specific keys (e.g. you only want to search for examples whose query is similar to the one the user provides), you can pass an inputKeys array in the final options parameter.
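
As a minimal sketch (adapted from the first example above; the variable name is just for illustration), passing inputKeys in the final options argument of fromExamples could look like this:

// Sketch: embed only the "input" field of each example for similarity search.
// The "output" field is still stored and returned with each selected example.
const inputOnlySelector = await SemanticSimilarityExampleSelector.fromExamples(
  [
    { input: "happy", output: "sad" },
    { input: "tall", output: "short" },
  ],
  new OpenAIEmbeddings(),
  HNSWLib,
  { k: 1, inputKeys: ["input"] }
);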

Loading from an existing vectorstore

You can also use a pre-initialized vector store by passing an instance directly into the SemanticSimilarityExampleSelector constructor, as shown below. You can add more examples via the addExample method:

// Ephemeral, in-memory vector store for demo purposes
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { PromptTemplate, FewShotPromptTemplate } from "@langchain/core/prompts";
import { SemanticSimilarityExampleSelector } from "@langchain/core/example_selectors";

const embeddings = new OpenAIEmbeddings();

const memoryVectorStore = new MemoryVectorStore(embeddings);

const examples = [
  {
    query: "healthy food",
    output: `galbi`,
  },
  {
    query: "healthy food",
    output: `schnitzel`,
  },
  {
    query: "foo",
    output: `bar`,
  },
];

const exampleSelector = new SemanticSimilarityExampleSelector({
  vectorStore: memoryVectorStore,
  k: 2,
  // Only embed the "query" key of each example
  inputKeys: ["query"],
});

for (const example of examples) {
  // Format and add an example to the underlying vector store
  await exampleSelector.addExample(example);
}

// Create a prompt template that will be used to format the examples.
const examplePrompt = PromptTemplate.fromTemplate(`<example>
<user_input>
{query}
</user_input>
<output>
{output}
</output>
</example>`);

// Create a FewShotPromptTemplate that will use the example selector.
const dynamicPrompt = new FewShotPromptTemplate({
  // We provide an ExampleSelector instead of examples.
  exampleSelector,
  examplePrompt,
  prefix: `Answer the user's question, using the below examples as reference:`,
  suffix: "User question: {query}",
  inputVariables: ["query"],
});

const formattedValue = await dynamicPrompt.format({
  query: "What is a healthy food?",
});
console.log(formattedValue);

/*
Answer the user's question, using the below examples as reference:

<example>
<user_input>
healthy food
</user_input>
<output>
galbi
</output>
</example>

<example>
<user_input>
healthy food
</user_input>
<output>
schnitzel
</output>
</example>

User question: What is a healthy food?
*/

const model = new ChatOpenAI({});

const chain = dynamicPrompt.pipe(model);

const result = await chain.invoke({ query: "What is a healthy food?" });
console.log(result);
/*
AIMessage {
  content: 'A healthy food can be galbi or schnitzel.',
  additional_kwargs: { function_call: undefined }
}
*/

API Reference

Metadata filtering

When adding examples, each field is available as metadata in the produced document. If you want further control over your search space, you can add extra fields to your examples and pass a filter parameter when initializing your selector:

// Ephemeral, in-memory vector store for demo purposes
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { PromptTemplate, FewShotPromptTemplate } from "@langchain/core/prompts";
import { Document } from "@langchain/core/documents";
import { SemanticSimilarityExampleSelector } from "@langchain/core/example_selectors";

const embeddings = new OpenAIEmbeddings();

const memoryVectorStore = new MemoryVectorStore(embeddings);

const examples = [
  {
    query: "healthy food",
    output: `lettuce`,
    food_type: "vegetable",
  },
  {
    query: "healthy food",
    output: `schnitzel`,
    food_type: "veal",
  },
  {
    query: "foo",
    output: `bar`,
    food_type: "baz",
  },
];

const exampleSelector = new SemanticSimilarityExampleSelector({
  vectorStore: memoryVectorStore,
  k: 2,
  // Only embed the "query" key of each example
  inputKeys: ["query"],
  // Filter type will depend on your specific vector store.
  // See the section of the docs for the specific vector store you are using.
  filter: (doc: Document) => doc.metadata.food_type === "vegetable",
});

for (const example of examples) {
  // Format and add an example to the underlying vector store
  await exampleSelector.addExample(example);
}

// Create a prompt template that will be used to format the examples.
const examplePrompt = PromptTemplate.fromTemplate(`<example>
<user_input>
{query}
</user_input>
<output>
{output}
</output>
</example>`);

// Create a FewShotPromptTemplate that will use the example selector.
const dynamicPrompt = new FewShotPromptTemplate({
  // We provide an ExampleSelector instead of examples.
  exampleSelector,
  examplePrompt,
  prefix: `Answer the user's question, using the below examples as reference:`,
  suffix: "User question:\n{query}",
  inputVariables: ["query"],
});

const model = new ChatOpenAI({});

const chain = dynamicPrompt.pipe(model);

const result = await chain.invoke({
  query: "What is exactly one type of healthy food?",
});
console.log(result);
/*
AIMessage {
  content: 'One type of healthy food is lettuce.',
  additional_kwargs: { function_call: undefined }
}
*/

API Reference

Custom vectorstore retrievers

You can also pass a vector store retriever instead of a vector store. This is useful, for example, when you want to use a retrieval strategy other than plain similarity search, such as maximal marginal relevance:

/* eslint-disable @typescript-eslint/no-non-null-assertion */

// Requires a vectorstore that supports maximal marginal relevance search
import { Pinecone } from "@pinecone-database/pinecone";
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { PineconeStore } from "@langchain/pinecone";
import { PromptTemplate, FewShotPromptTemplate } from "@langchain/core/prompts";
import { SemanticSimilarityExampleSelector } from "@langchain/core/example_selectors";

const pinecone = new Pinecone();

const pineconeIndex = pinecone.Index(process.env.PINECONE_INDEX!);

/**
 * Pinecone allows you to partition the records in an index into namespaces.
 * Queries and other operations are then limited to one namespace,
 * so different requests can search different subsets of your index.
 * Read more about namespaces here: https://docs.pinecone.io/guides/indexes/use-namespaces
 *
 * NOTE: If you have namespace enabled in your Pinecone index, you must provide the namespace when creating the PineconeStore.
 */
const namespace = "pinecone";

const pineconeVectorstore = await PineconeStore.fromExistingIndex(
  new OpenAIEmbeddings(),
  { pineconeIndex, namespace }
);

const pineconeMmrRetriever = pineconeVectorstore.asRetriever({
  searchType: "mmr",
  k: 2,
});

const examples = [
  {
    query: "healthy food",
    output: `lettuce`,
    food_type: "vegetable",
  },
  {
    query: "healthy food",
    output: `schnitzel`,
    food_type: "veal",
  },
  {
    query: "foo",
    output: `bar`,
    food_type: "baz",
  },
];

const exampleSelector = new SemanticSimilarityExampleSelector({
  vectorStoreRetriever: pineconeMmrRetriever,
  // Only embed the "query" key of each example
  inputKeys: ["query"],
});

for (const example of examples) {
  // Format and add an example to the underlying vector store
  await exampleSelector.addExample(example);
}

// Create a prompt template that will be used to format the examples.
const examplePrompt = PromptTemplate.fromTemplate(`<example>
<user_input>
{query}
</user_input>
<output>
{output}
</output>
</example>`);

// Create a FewShotPromptTemplate that will use the example selector.
const dynamicPrompt = new FewShotPromptTemplate({
  // We provide an ExampleSelector instead of examples.
  exampleSelector,
  examplePrompt,
  prefix: `Answer the user's question, using the below examples as reference:`,
  suffix: "User question:\n{query}",
  inputVariables: ["query"],
});

const model = new ChatOpenAI({});

const chain = dynamicPrompt.pipe(model);

const result = await chain.invoke({
  query: "What is exactly one type of healthy food?",
});

console.log(result);

/*
AIMessage {
  content: 'lettuce.',
  additional_kwargs: { function_call: undefined }
}
*/

API Reference

Next steps

You've now learned a bit about using similarity in an example selector.

Next, check out this guide on how to use a length-based example selector.

