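Azure AI Search
Azure AI Search (formerly known as Azure Search and Azure Cognitive Search) is a distributed, RESTful search engine optimized for speed and relevance on production-scale workloads on Azure. It also supports vector search using the k-nearest neighbor (kNN) algorithm, as well as semantic search.
This vector store integration supports full-text search, vector search, and hybrid search for the best ranking performance.
See this page to learn how to use the vector search capabilities of Azure AI Search. If you don't have an Azure account, you can create a free account to get started.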
Setup
You'll first need to install the @azure/search-documents SDK and the @langchain/community package:
- npm: npm install -S @langchain/community @azure/search-documents
- Yarn: yarn add @langchain/community @azure/search-documents
- pnpm: pnpm add @langchain/community @azure/search-documents
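You'll also need a running Azure AI Search instance. You can deploy a free version on the Azure portal at no cost by following this guide.
Once your instance is running, make sure you have the endpoint and the admin key (query keys can only be used to search documents, not to index, update, or delete them). The endpoint is the URL of your instance, which you can find in the Azure portal under the "Overview" section of your instance. The admin key can be found under the "Keys" section of your instance. Then, set the following environment variables: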
# Azure AI Search connection settings
AZURE_AISEARCH_ENDPOINT=
AZURE_AISEARCH_KEY=
# If you're using Azure OpenAI API, you'll need to set these variables
AZURE_OPENAI_API_KEY=
AZURE_OPENAI_API_INSTANCE_NAME=
AZURE_OPENAI_API_DEPLOYMENT_NAME=
AZURE_OPENAI_API_EMBEDDINGS_DEPLOYMENT_NAME=
AZURE_OPENAI_API_VERSION=
# Or you can use the OpenAI API directly
OPENAI_API_KEY=
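A missing endpoint or key usually only shows up as an error once a request is made, so it can help to check the variables up front. The following is a minimal sketch, not part of the LangChain API: `missingEnvVars` is a hypothetical helper, the endpoint URL is a placeholder, and the list of required names simply mirrors the Azure AI Search variables above.

```typescript
// Hypothetical helper (not part of LangChain): returns the names of
// required variables that are unset or blank in the given environment.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: string[]
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Variable names taken from the settings listed above.
const requiredForAzureSearch = ["AZURE_AISEARCH_ENDPOINT", "AZURE_AISEARCH_KEY"];

// Example with a literal object and a placeholder URL;
// in a Node.js application you would pass process.env instead.
const exampleEnv: Record<string, string | undefined> = {
  AZURE_AISEARCH_ENDPOINT: "https://example.search.windows.net",
  AZURE_AISEARCH_KEY: "",
};

const missing = missingEnvVars(exampleEnv, requiredForAzureSearch);
if (missing.length > 0) {
  console.log(`Missing environment variables: ${missing.join(", ")}`);
}
```

Running a check like this before creating the vector store turns a late, opaque request failure into an early, readable message.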
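About hybrid search
Hybrid search combines the strengths of full-text search and vector search to provide the best ranking performance. It is enabled by default in the Azure AI Search vector store, but you can select a different search query type by setting the search.type property when creating the vector store.
You can read more about hybrid search and how it can improve the relevance of your search results in the official documentation.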
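Azure AI Search merges the full-text and vector result lists of a hybrid query with Reciprocal Rank Fusion (RRF). The sketch below only illustrates the idea and is not the service's implementation: the function name, the constant k = 60, and the document IDs are illustrative assumptions.

```typescript
// Illustrative RRF sketch: each ranking is a list of document IDs, best first.
// A document's fused score is the sum of 1 / (k + rank) over all rankings
// in which it appears, so documents ranked well by both lists rise to the top.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60 // assumed smoothing constant, commonly used with RRF
): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank));
    });
  }
  // Sort by fused score, highest first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Hypothetical result lists from a full-text query and a vector query.
const textRanking = ["doc-a", "doc-b", "doc-c"];
const vectorRanking = ["doc-b", "doc-d", "doc-a"];

const fused = reciprocalRankFusion([textRanking, vectorRanking]);
console.log(fused.map(([docId]) => docId));
```

Because RRF works on ranks rather than raw scores, it can combine keyword relevance scores and vector similarity scores even though the two are not on a comparable scale.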
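In some scenarios, such as retrieval-augmented generation (RAG), you may want to enable semantic ranking in addition to hybrid search to improve the relevance of search results. You can enable semantic ranking by setting the search.type property to AzureAISearchQueryType.SemanticHybrid when creating the vector store. Note that semantic ranking is only available in the Basic and higher pricing tiers, and is subject to regional availability.
You can read more about the performance of semantic ranking with hybrid search in this blog post.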
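Because semantic ranking depends on the pricing tier and the region, an application may want to fall back to plain hybrid search when it is unavailable. The following is a minimal sketch under stated assumptions: `ServiceInfo` and `pickQueryType` are hypothetical, the tier list is simplified, and the function returns only the name of the AzureAISearchQueryType member so that the block does not depend on the library.

```typescript
// Names of the AzureAISearchQueryType members mentioned on this page.
type QueryTypeName = "SimilarityHybrid" | "SemanticHybrid";

// Hypothetical, simplified description of a search service.
interface ServiceInfo {
  tier: "free" | "basic" | "standard";
  semanticRankerInRegion: boolean;
}

// Choose semantic hybrid search only when both the tier and the region allow it.
function pickQueryType(service: ServiceInfo): QueryTypeName {
  const tierSupportsSemantic = service.tier !== "free";
  return tierSupportsSemantic && service.semanticRankerInRegion
    ? "SemanticHybrid"
    : "SimilarityHybrid";
}

const choice = pickQueryType({ tier: "free", semanticRankerInRegion: true });
console.log(choice);

// With the real import in place, the choice would be applied when creating
// the vector store, for example:
//   search: { type: AzureAISearchQueryType[choice] }
```

Keeping the decision in one function makes the fallback explicit instead of relying on the service to reject a semantic query at run time.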
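Example: index documents, vector search, and LLM integration
The example below indexes documents from a file in Azure AI Search, runs a hybrid search query, and finally uses a chain to answer a natural-language question based on the retrieved documents.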
import {
  AzureAISearchVectorStore,
  AzureAISearchQueryType,
} from "@langchain/community/vectorstores/azure_aisearch";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";
import { createRetrievalChain } from "langchain/chains/retrieval";
import { TextLoader } from "langchain/document_loaders/fs/text";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
// Load documents from file
const loader = new TextLoader("./state_of_the_union.txt");
const rawDocuments = await loader.load();
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 0,
});
const documents = await splitter.splitDocuments(rawDocuments);
// Create Azure AI Search vector store
const store = await AzureAISearchVectorStore.fromDocuments(
  documents,
  new OpenAIEmbeddings(),
  {
    search: {
      type: AzureAISearchQueryType.SimilarityHybrid,
    },
  }
);
// The first time you run this, the index will be created.
// You may need to wait a bit for the index to be created before you can perform
// a search, or you can create the index manually beforehand.
// Performs a similarity search
const resultDocuments = await store.similaritySearch(
  "What did the president say about Ketanji Brown Jackson?"
);
console.log("Similarity search results:");
console.log(resultDocuments[0].pageContent);
/*
Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections.
Tonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service.
One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.
And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.
*/
// Use the store as part of a chain
const model = new ChatOpenAI({ model: "gpt-3.5-turbo-1106" });
const questionAnsweringPrompt = ChatPromptTemplate.fromMessages([
  [
    "system",
    "Answer the user's questions based on the below context:\n\n{context}",
  ],
  ["human", "{input}"],
]);
const combineDocsChain = await createStuffDocumentsChain({
  llm: model,
  prompt: questionAnsweringPrompt,
});
const chain = await createRetrievalChain({
  retriever: store.asRetriever(),
  combineDocsChain,
});
const response = await chain.invoke({
  input: "What is the president's top priority regarding prices?",
});
console.log("Chain response:");
console.log(response.answer);
/*
The president's top priority is getting prices under control.
*/
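The comments in the example note that a newly created index may not be ready for searching right away. One way to handle this is to retry the first search with exponential backoff. The sketch below is an assumption-laden illustration: `backoffDelays` and `retryWithBackoff` are hypothetical helpers, the delay values are arbitrary, and it assumes the search call rejects while the index is still unavailable.

```typescript
// Hypothetical helper: delays (in ms) between attempts, growing by `factor`.
function backoffDelays(waits: number, baseMs = 500, factor = 2): number[] {
  return Array.from({ length: waits }, (_, i) => baseMs * Math.pow(factor, i));
}

// Hypothetical helper: runs `operation` up to `attempts` times, sleeping
// between failed attempts, and rethrows the last error if all attempts fail.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  attempts = 5,
  baseMs = 500
): Promise<T> {
  const delays = backoffDelays(attempts - 1, baseMs);
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < delays.length) {
        // Wait before the next attempt; no wait after the final failure.
        await new Promise<void>((resolve) => setTimeout(resolve, delays[attempt]));
      }
    }
  }
  throw lastError;
}

// Delays used for five attempts with the default settings.
const defaultDelays = backoffDelays(4);
console.log(defaultDelays);
```

With the store from the example above, the first query could then be wrapped as retryWithBackoff(() => store.similaritySearch("...")). Creating the index manually beforehand, as the comments suggest, avoids the need for retries altogether.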
API Reference
- AzureAISearchVectorStore from @langchain/community/vectorstores/azure_aisearch
- AzureAISearchQueryType from @langchain/community/vectorstores/azure_aisearch
- ChatPromptTemplate from @langchain/core/prompts
- ChatOpenAI from @langchain/openai
- OpenAIEmbeddings from @langchain/openai
- createStuffDocumentsChain from langchain/chains/combine_documents
- createRetrievalChain from langchain/chains/retrieval
- TextLoader from langchain/document_loaders/fs/text
- RecursiveCharacterTextSplitter from @langchain/textsplitters