How to select examples by similarity

Prerequisites

This guide assumes familiarity with the following concepts: prompt templates, example selectors, and vector stores.

This object selects examples based on similarity to the input: it finds the examples whose embeddings have the greatest cosine similarity with the embedded input.
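
For reference, the selection criterion can be sketched as plain cosine similarity between embedding vectors. The snippet below is purely illustrative and is not part of the selector's API:

// Illustrative only: cosine similarity between two embedding vectors.
// The selector keeps the stored examples whose embeddings score highest
// against the embedding of the incoming input.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}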

The fields of the example objects are used as parameters to format the examplePrompt passed to the FewShotPromptTemplate. Each example should therefore contain all of the fields required by the example prompt you are using.

npm install @langchain/openai @langchain/community
import { OpenAIEmbeddings } from "@langchain/openai";
import { HNSWLib } from "@langchain/community/vectorstores/hnswlib";
import { PromptTemplate, FewShotPromptTemplate } from "@langchain/core/prompts";
import { SemanticSimilarityExampleSelector } from "@langchain/core/example_selectors";

// Create a prompt template that will be used to format the examples.
const examplePrompt = PromptTemplate.fromTemplate(
  "Input: {input}\nOutput: {output}"
);

// Create a SemanticSimilarityExampleSelector that will be used to select the examples.
const exampleSelector = await SemanticSimilarityExampleSelector.fromExamples(
  [
    { input: "happy", output: "sad" },
    { input: "tall", output: "short" },
    { input: "energetic", output: "lethargic" },
    { input: "sunny", output: "gloomy" },
    { input: "windy", output: "calm" },
  ],
  new OpenAIEmbeddings(),
  HNSWLib,
  { k: 1 }
);

// Create a FewShotPromptTemplate that will use the example selector.
const dynamicPrompt = new FewShotPromptTemplate({
  // We provide an ExampleSelector instead of examples.
  exampleSelector,
  examplePrompt,
  prefix: "Give the antonym of every input",
  suffix: "Input: {adjective}\nOutput:",
  inputVariables: ["adjective"],
});

// Input is about the weather, so should select eg. the sunny/gloomy example
console.log(await dynamicPrompt.format({ adjective: "rainy" }));
/*
Give the antonym of every input

Input: sunny
Output: gloomy

Input: rainy
Output:
*/

// Input is a measurement, so should select the tall/short example
console.log(await dynamicPrompt.format({ adjective: "large" }));
/*
Give the antonym of every input

Input: tall
Output: short

Input: large
Output:
*/

API Reference

By default, every field of each example object is concatenated together, embedded, and stored in the vector store for later similarity search against user queries.

If you only want to embed specific keys (e.g. you only want to search for examples whose query is similar to the one the user provides), you can pass an inputKeys array in the final options parameter.
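
As a minimal sketch (adapted from the first example above; the variable name is just for illustration), passing inputKeys in the final options argument of fromExamples could look like this:

// Sketch: embed only the "input" field of each example for similarity search.
// The "output" field is still stored and returned with each selected example.
const inputOnlySelector = await SemanticSimilarityExampleSelector.fromExamples(
  [
    { input: "happy", output: "sad" },
    { input: "tall", output: "short" },
  ],
  new OpenAIEmbeddings(),
  HNSWLib,
  { k: 1, inputKeys: ["input"] }
);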

Loading from an existing vectorstore

You can also use a pre-initialized vector store by passing an instance directly into the SemanticSimilarityExampleSelector constructor, as shown below. You can add more examples via the addExample method:

// Ephemeral, in-memory vector store for demo purposes
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { PromptTemplate, FewShotPromptTemplate } from "@langchain/core/prompts";
import { SemanticSimilarityExampleSelector } from "@langchain/core/example_selectors";

const embeddings = new OpenAIEmbeddings();

const memoryVectorStore = new MemoryVectorStore(embeddings);

const examples = [
  {
    query: "healthy food",
    output: `galbi`,
  },
  {
    query: "healthy food",
    output: `schnitzel`,
  },
  {
    query: "foo",
    output: `bar`,
  },
];

const exampleSelector = new SemanticSimilarityExampleSelector({
  vectorStore: memoryVectorStore,
  k: 2,
  // Only embed the "query" key of each example
  inputKeys: ["query"],
});

for (const example of examples) {
  // Format and add an example to the underlying vector store
  await exampleSelector.addExample(example);
}

// Create a prompt template that will be used to format the examples.
const examplePrompt = PromptTemplate.fromTemplate(`<example>
<user_input>
{query}
</user_input>
<output>
{output}
</output>
</example>`);

// Create a FewShotPromptTemplate that will use the example selector.
const dynamicPrompt = new FewShotPromptTemplate({
  // We provide an ExampleSelector instead of examples.
  exampleSelector,
  examplePrompt,
  prefix: `Answer the user's question, using the below examples as reference:`,
  suffix: "User question: {query}",
  inputVariables: ["query"],
});

const formattedValue = await dynamicPrompt.format({
  query: "What is a healthy food?",
});
console.log(formattedValue);

/*
Answer the user's question, using the below examples as reference:

<example>
<user_input>
healthy food
</user_input>
<output>
galbi
</output>
</example>

<example>
<user_input>
healthy food
</user_input>
<output>
schnitzel
</output>
</example>

User question: What is a healthy food?
*/

const model = new ChatOpenAI({});

const chain = dynamicPrompt.pipe(model);

const result = await chain.invoke({ query: "What is a healthy food?" });
console.log(result);
/*
AIMessage {
  content: 'A healthy food can be galbi or schnitzel.',
  additional_kwargs: { function_call: undefined }
}
*/

API Reference

Metadata filtering

When adding examples, each field is available as metadata in the produced document. If you want further control over your search space, you can add extra fields to your examples and pass a filter parameter when initializing your selector:

// Ephemeral, in-memory vector store for demo purposes
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { PromptTemplate, FewShotPromptTemplate } from "@langchain/core/prompts";
import { Document } from "@langchain/core/documents";
import { SemanticSimilarityExampleSelector } from "@langchain/core/example_selectors";

const embeddings = new OpenAIEmbeddings();

const memoryVectorStore = new MemoryVectorStore(embeddings);

const examples = [
  {
    query: "healthy food",
    output: `lettuce`,
    food_type: "vegetable",
  },
  {
    query: "healthy food",
    output: `schnitzel`,
    food_type: "veal",
  },
  {
    query: "foo",
    output: `bar`,
    food_type: "baz",
  },
];

const exampleSelector = new SemanticSimilarityExampleSelector({
  vectorStore: memoryVectorStore,
  k: 2,
  // Only embed the "query" key of each example
  inputKeys: ["query"],
  // Filter type will depend on your specific vector store.
  // See the section of the docs for the specific vector store you are using.
  filter: (doc: Document) => doc.metadata.food_type === "vegetable",
});

for (const example of examples) {
  // Format and add an example to the underlying vector store
  await exampleSelector.addExample(example);
}

// Create a prompt template that will be used to format the examples.
const examplePrompt = PromptTemplate.fromTemplate(`<example>
<user_input>
{query}
</user_input>
<output>
{output}
</output>
</example>`);

// Create a FewShotPromptTemplate that will use the example selector.
const dynamicPrompt = new FewShotPromptTemplate({
  // We provide an ExampleSelector instead of examples.
  exampleSelector,
  examplePrompt,
  prefix: `Answer the user's question, using the below examples as reference:`,
  suffix: "User question:\n{query}",
  inputVariables: ["query"],
});

const model = new ChatOpenAI({});

const chain = dynamicPrompt.pipe(model);

const result = await chain.invoke({
  query: "What is exactly one type of healthy food?",
});
console.log(result);
/*
AIMessage {
  content: 'One type of healthy food is lettuce.',
  additional_kwargs: { function_call: undefined }
}
*/

API Reference

Custom vectorstore retrievers

You can also pass a vector store retriever instead of a vector store. This is useful, for example, when you want to use a retrieval strategy other than plain similarity search, such as maximal marginal relevance:

/* eslint-disable @typescript-eslint/no-non-null-assertion */

// Requires a vectorstore that supports maximal marginal relevance search
import { Pinecone } from "@pinecone-database/pinecone";
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { PineconeStore } from "@langchain/pinecone";
import { PromptTemplate, FewShotPromptTemplate } from "@langchain/core/prompts";
import { SemanticSimilarityExampleSelector } from "@langchain/core/example_selectors";

const pinecone = new Pinecone();

const pineconeIndex = pinecone.Index(process.env.PINECONE_INDEX!);

/**
 * Pinecone allows you to partition the records in an index into namespaces.
 * Queries and other operations are then limited to one namespace,
 * so different requests can search different subsets of your index.
 * Read more about namespaces here: https://docs.pinecone.io/guides/indexes/use-namespaces
 *
 * NOTE: If you have namespace enabled in your Pinecone index, you must provide the namespace when creating the PineconeStore.
 */
const namespace = "pinecone";

const pineconeVectorstore = await PineconeStore.fromExistingIndex(
  new OpenAIEmbeddings(),
  { pineconeIndex, namespace }
);

const pineconeMmrRetriever = pineconeVectorstore.asRetriever({
  searchType: "mmr",
  k: 2,
});

const examples = [
  {
    query: "healthy food",
    output: `lettuce`,
    food_type: "vegetable",
  },
  {
    query: "healthy food",
    output: `schnitzel`,
    food_type: "veal",
  },
  {
    query: "foo",
    output: `bar`,
    food_type: "baz",
  },
];

const exampleSelector = new SemanticSimilarityExampleSelector({
  vectorStoreRetriever: pineconeMmrRetriever,
  // Only embed the "query" key of each example
  inputKeys: ["query"],
});

for (const example of examples) {
  // Format and add an example to the underlying vector store
  await exampleSelector.addExample(example);
}

// Create a prompt template that will be used to format the examples.
const examplePrompt = PromptTemplate.fromTemplate(`<example>
<user_input>
{query}
</user_input>
<output>
{output}
</output>
</example>`);

// Create a FewShotPromptTemplate that will use the example selector.
const dynamicPrompt = new FewShotPromptTemplate({
  // We provide an ExampleSelector instead of examples.
  exampleSelector,
  examplePrompt,
  prefix: `Answer the user's question, using the below examples as reference:`,
  suffix: "User question:\n{query}",
  inputVariables: ["query"],
});

const model = new ChatOpenAI({});

const chain = dynamicPrompt.pipe(model);

const result = await chain.invoke({
  query: "What is exactly one type of healthy food?",
});

console.log(result);

/*
AIMessage {
  content: 'lettuce.',
  additional_kwargs: { function_call: undefined }
}
*/

API Reference

Next steps

You've now learned a bit about using similarity in an example selector.

Next, check out this guide on how to use a length-based example selector.

