如何生成多个查询以检索数据
先决条件
本指南假定您熟悉以下概念
基于距离的向量数据库检索将查询嵌入 (表示) 到高维空间中,并根据“距离”查找相似的嵌入文档。但是,如果查询措辞略有改变,或者嵌入没有很好地捕捉数据的语义,检索可能会产生不同的结果。有时会进行提示工程/调整以手动解决这些问题,但这可能很繁琐。
MultiQueryRetriever
通过使用 LLM 从不同角度为给定的用户输入查询生成多个查询来自动化提示调整过程。对于每个查询,它检索一组相关的文档,并对所有查询的唯一并集进行操作,以获得一组可能相关的文档。通过从同一问题的不同角度生成多个观点,MultiQueryRetriever
可以帮助克服基于距离的检索的一些限制,并获得更丰富的结果集。
入门
提示
- npm
- yarn
- pnpm
npm i @langchain/anthropic @langchain/cohere
yarn add @langchain/anthropic @langchain/cohere
pnpm add @langchain/anthropic @langchain/cohere
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { CohereEmbeddings } from "@langchain/cohere";
import { MultiQueryRetriever } from "langchain/retrievers/multi_query";
import { ChatAnthropic } from "@langchain/anthropic";
const embeddings = new CohereEmbeddings();
const vectorstore = await MemoryVectorStore.fromTexts(
[
"Buildings are made out of brick",
"Buildings are made out of wood",
"Buildings are made out of stone",
"Cars are made out of metal",
"Cars are made out of plastic",
"mitochondria is the powerhouse of the cell",
"mitochondria is made of lipids",
],
[{ id: 1 }, { id: 2 }, { id: 3 }, { id: 4 }, { id: 5 }],
embeddings
);
const model = new ChatAnthropic({
model: "claude-3-sonnet-20240229",
});
const retriever = MultiQueryRetriever.fromLLM({
llm: model,
retriever: vectorstore.asRetriever(),
});
const query = "What are mitochondria made of?";
const retrievedDocs = await retriever.invoke(query);
/*
Generated queries: What are the components of mitochondria?,What substances comprise the mitochondria organelle? ,What is the molecular composition of mitochondria?
*/
console.log(retrievedDocs);
[
Document {
pageContent: "mitochondria is made of lipids",
metadata: {}
},
Document {
pageContent: "mitochondria is the powerhouse of the cell",
metadata: {}
},
Document {
pageContent: "Buildings are made out of brick",
metadata: { id: 1 }
},
Document {
pageContent: "Buildings are made out of wood",
metadata: { id: 2 }
}
]
定制
您还可以提供自定义提示以调整生成哪些类型的问题。您还可以传递自定义输出解析器以解析和拆分 LLM 调用的结果,将其分成查询列表。
import { LLMChain } from "langchain/chains";
import { pull } from "langchain/hub";
import { BaseOutputParser } from "@langchain/core/output_parsers";
import { PromptTemplate } from "@langchain/core/prompts";
type LineList = {
lines: string[];
};
class LineListOutputParser extends BaseOutputParser<LineList> {
static lc_name() {
return "LineListOutputParser";
}
lc_namespace = ["langchain", "retrievers", "multiquery"];
async parse(text: string): Promise<LineList> {
const startKeyIndex = text.indexOf("<questions>");
const endKeyIndex = text.indexOf("</questions>");
const questionsStartIndex =
startKeyIndex === -1 ? 0 : startKeyIndex + "<questions>".length;
const questionsEndIndex = endKeyIndex === -1 ? text.length : endKeyIndex;
const lines = text
.slice(questionsStartIndex, questionsEndIndex)
.trim()
.split("\n")
.filter((line) => line.trim() !== "");
return { lines };
}
getFormatInstructions(): string {
throw new Error("Not implemented.");
}
}
// Default prompt is available at: https://smith.langchain.com/hub/jacob/multi-vector-retriever-german
const prompt: PromptTemplate = await pull(
"jacob/multi-vector-retriever-german"
);
const vectorstore = await MemoryVectorStore.fromTexts(
[
"Gebäude werden aus Ziegelsteinen hergestellt",
"Gebäude werden aus Holz hergestellt",
"Gebäude werden aus Stein hergestellt",
"Autos werden aus Metall hergestellt",
"Autos werden aus Kunststoff hergestellt",
"Mitochondrien sind die Energiekraftwerke der Zelle",
"Mitochondrien bestehen aus Lipiden",
],
[{ id: 1 }, { id: 2 }, { id: 3 }, { id: 4 }, { id: 5 }],
embeddings
);
const model = new ChatAnthropic({});
const llmChain = new LLMChain({
llm: model,
prompt,
outputParser: new LineListOutputParser(),
});
const retriever = new MultiQueryRetriever({
retriever: vectorstore.asRetriever(),
llmChain,
});
const query = "What are mitochondria made of?";
const retrievedDocs = await retriever.invoke(query);
/*
Generated queries: Was besteht ein Mitochondrium?,Aus welchen Komponenten setzt sich ein Mitochondrium zusammen? ,Welche Moleküle finden sich in einem Mitochondrium?
*/
console.log(retrievedDocs);
[
Document {
pageContent: "Mitochondrien bestehen aus Lipiden",
metadata: {}
},
Document {
pageContent: "Mitochondrien sind die Energiekraftwerke der Zelle",
metadata: {}
},
Document {
pageContent: "Gebäude werden aus Stein hergestellt",
metadata: { id: 3 }
},
Document {
pageContent: "Autos werden aus Metall hergestellt",
metadata: { id: 4 }
},
Document {
pageContent: "Gebäude werden aus Holz hergestellt",
metadata: { id: 2 }
},
Document {
pageContent: "Gebäude werden aus Ziegelsteinen hergestellt",
metadata: { id: 1 }
}
]
下一步
您现在已经了解了如何使用 MultiQueryRetriever
使用自动生成的查询查询向量存储。
请参阅各个部分以深入了解特定检索器、有关RAG 的更广泛教程,或这一部分以了解如何创建您自己的任何数据源上的自定义检索器。