Build a Question/Answering system over SQL data
Enabling an LLM system to query structured data can be qualitatively different from unstructured text data. Whereas in the latter it is common to generate text that can be searched against a vector database, the approach for structured data is often for the LLM to write and execute queries in a DSL, such as SQL. In this guide we'll go over the basic ways to create a Q&A system over tabular data in a database. We will cover implementations using both chains and agents. These systems allow us to ask a question about the data in a database and get back a natural language answer. The main difference between the two is that our agent can query the database in a loop as many times as it needs in order to answer the question.
⚠️ Security note ⚠️
Building Q&A systems over SQL databases requires executing model-generated SQL queries. There are inherent risks in doing this. Make sure that your database connection permissions are always scoped as narrowly as possible for your chain/agent's needs. This will mitigate, though not eliminate, the risks of building a model-driven system. For more on general security best practices, see here.
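As one illustrative safeguard (a minimal sketch, not part of the library's API and not a substitute for narrowly scoped credentials), you could reject obviously destructive statements before a model-generated query ever reaches the database:

```typescript
// Hypothetical guard for model-generated SQL: accept only SELECT
// statements and reject anything containing common DML/DDL keywords.
// Scoped, read-only database credentials remain the primary defense.
const FORBIDDEN =
  /\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|TRUNCATE|ATTACH|PRAGMA)\b/i;

function assertReadOnly(query: string): string {
  const trimmed = query.trim();
  if (!/^SELECT\b/i.test(trimmed) || FORBIDDEN.test(trimmed)) {
    throw new Error(`Refusing to run non-SELECT query: ${trimmed}`);
  }
  return trimmed;
}
```

A denylist like this catches only the obvious cases (it would not stop, say, an expensive cross join), which is why permissions should be restricted at the database level regardless.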
Architecture
At a high level, the steps of these systems are:
- Convert question to SQL query: Model converts user input to a SQL query.
- Execute SQL query: Execute the query.
- Answer the question: Model responds to user input using the query results.
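These steps can be sketched as a simple pipeline over a shared state object. The snippet below only illustrates the shape; `writeQuery`, `executeQuery`, and `generateAnswer` here are hypothetical stubs standing in for the model- and database-backed steps built later in this guide:

```typescript
// Illustrative pipeline shape only; the three steps are stubs, not the
// real model/database implementations developed below.
type State = { question: string; query?: string; result?: string; answer?: string };

const writeQuery = async (s: State): Promise<State> =>
  // A model would translate s.question into SQL here.
  ({ ...s, query: "SELECT COUNT(*) AS EmployeeCount FROM Employee;" });

const executeQuery = async (s: State): Promise<State> =>
  // A database client would run s.query here.
  ({ ...s, result: '[{"EmployeeCount":8}]' });

const generateAnswer = async (s: State): Promise<State> =>
  // A model would phrase s.result as natural language here.
  ({ ...s, answer: "There are 8 employees." });

async function run(question: string): Promise<State> {
  let state: State = { question };
  for (const step of [writeQuery, executeQuery, generateAnswer]) {
    state = await step(state);
  }
  return state;
}
```

Each step reads what it needs from the state and adds its own field, which is exactly the pattern LangGraph formalizes below.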
Setup
First, install the required packages and set environment variables:
```bash npm2yarn
npm i langchain @langchain/community @langchain/langgraph
```
# Uncomment the below to use LangSmith. Not required, but recommended for debugging and observability.
# export LANGSMITH_API_KEY=<your key>
# export LANGSMITH_TRACING=true
# Reduce tracing latency if you are not in a serverless environment
# export LANGCHAIN_CALLBACKS_BACKGROUND=true
Sample data
The example below uses a SQLite connection with the Chinook database, a sample database that represents a digital media store. Follow these installation steps to create Chinook.db in the same directory as this notebook. You can also download and build the database via the command line:
curl -s https://raw.githubusercontent.com/lerocha/chinook-database/master/ChinookDatabase/DataSources/Chinook_Sqlite.sql | sqlite3 Chinook.db
Now, with Chinook.db in our directory, we can interface with it using the SqlDatabase class:
import { SqlDatabase } from "langchain/sql_db";
import { DataSource } from "typeorm";
const datasource = new DataSource({
type: "sqlite",
database: "Chinook.db",
});
const db = await SqlDatabase.fromDataSourceParams({
appDataSource: datasource,
});
await db.run("SELECT * FROM Artist LIMIT 10;");
[{"ArtistId":1,"Name":"AC/DC"},{"ArtistId":2,"Name":"Accept"},{"ArtistId":3,"Name":"Aerosmith"},{"ArtistId":4,"Name":"Alanis Morissette"},{"ArtistId":5,"Name":"Alice In Chains"},{"ArtistId":6,"Name":"Antônio Carlos Jobim"},{"ArtistId":7,"Name":"Apocalyptica"},{"ArtistId":8,"Name":"Audioslave"},{"ArtistId":9,"Name":"BackBeat"},{"ArtistId":10,"Name":"Billy Cobham"}]
Great! We've got a SQL database that we can query. Now let's try hooking it up to an LLM.
Chains
Chains are compositions of predictable steps. In LangGraph, we can represent a chain via a simple sequence of nodes. Let's create a sequence of steps that, given a question:
- converts the question into a SQL query;
- executes the query;
- uses the result to answer the original question.
There are scenarios this arrangement does not support. For example, this system will execute a SQL query for any user input, even "hello". Importantly, as we'll see below, some questions require more than one query to answer. We will address these scenarios in the Agents section.
Application state
The LangGraph state of our application controls what data is input to the application, transferred between steps, and output by the application.
For this application, we can just keep track of the input question, the generated query, the query result, and the generated answer:
import { Annotation } from "@langchain/langgraph";
const InputStateAnnotation = Annotation.Root({
question: Annotation<string>,
});
const StateAnnotation = Annotation.Root({
question: Annotation<string>,
query: Annotation<string>,
result: Annotation<string>,
answer: Annotation<string>,
});
Now we just need functions that operate on this state and populate its contents.
Convert question to SQL query
The first step is to take the user input and convert it to a SQL query. To reliably obtain SQL queries (absent markdown formatting and explanations or clarifications), we will make use of LangChain's structured output abstraction.
Let's select a chat model for our application:
Pick your chat model:
- Groq
- OpenAI
- Anthropic
- FireworksAI
- MistralAI
- VertexAI
Install dependencies
See this section for general instructions on installing integration packages.
- npm
- yarn
- pnpm
npm i @langchain/groq
yarn add @langchain/groq
pnpm add @langchain/groq
Add environment variables
GROQ_API_KEY=your-api-key
Instantiate the model
import { ChatGroq } from "@langchain/groq";
const llm = new ChatGroq({
model: "llama-3.3-70b-versatile",
temperature: 0
});
Install dependencies
See this section for general instructions on installing integration packages.
- npm
- yarn
- pnpm
npm i @langchain/openai
yarn add @langchain/openai
pnpm add @langchain/openai
Add environment variables
OPENAI_API_KEY=your-api-key
Instantiate the model
import { ChatOpenAI } from "@langchain/openai";
const llm = new ChatOpenAI({
model: "gpt-4o-mini",
temperature: 0
});
Install dependencies
See this section for general instructions on installing integration packages.
- npm
- yarn
- pnpm
npm i @langchain/anthropic
yarn add @langchain/anthropic
pnpm add @langchain/anthropic
Add environment variables
ANTHROPIC_API_KEY=your-api-key
Instantiate the model
import { ChatAnthropic } from "@langchain/anthropic";
const llm = new ChatAnthropic({
model: "claude-3-5-sonnet-20240620",
temperature: 0
});
Install dependencies
See this section for general instructions on installing integration packages.
- npm
- yarn
- pnpm
npm i @langchain/community
yarn add @langchain/community
pnpm add @langchain/community
Add environment variables
FIREWORKS_API_KEY=your-api-key
Instantiate the model
import { ChatFireworks } from "@langchain/community/chat_models/fireworks";
const llm = new ChatFireworks({
model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
temperature: 0
});
Install dependencies
See this section for general instructions on installing integration packages.
- npm
- yarn
- pnpm
npm i @langchain/mistralai
yarn add @langchain/mistralai
pnpm add @langchain/mistralai
Add environment variables
MISTRAL_API_KEY=your-api-key
Instantiate the model
import { ChatMistralAI } from "@langchain/mistralai";
const llm = new ChatMistralAI({
model: "mistral-large-latest",
temperature: 0
});
Install dependencies
See this section for general instructions on installing integration packages.
- npm
- yarn
- pnpm
npm i @langchain/google-vertexai
yarn add @langchain/google-vertexai
pnpm add @langchain/google-vertexai
Add environment variables
GOOGLE_APPLICATION_CREDENTIALS=credentials.json
Instantiate the model
import { ChatVertexAI } from "@langchain/google-vertexai";
const llm = new ChatVertexAI({
model: "gemini-1.5-flash",
temperature: 0
});
We will pull a prompt from the Prompt Hub to instruct the model:
import { pull } from "langchain/hub";
import { ChatPromptTemplate } from "@langchain/core/prompts";
const queryPromptTemplate = await pull<ChatPromptTemplate>(
"langchain-ai/sql-query-system-prompt"
);
console.log(queryPromptTemplate.promptMessages[0].lc_kwargs.prompt.template);
Given an input question, create a syntactically correct {dialect} query to run to help find the answer. Unless the user specifies in his question a specific number of examples they wish to obtain, always limit your query to at most {top_k} results. You can order the results by a relevant column to return the most interesting examples in the database.
Never query for all the columns from a specific table, only ask for a the few relevant columns given the question.
Pay attention to use only the column names that you can see in the schema description. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.
Only use the following tables:
{table_info}
Question: {input}
The prompt includes several parameters we will need to populate, such as the SQL dialect and table schemas. LangChain's SqlDatabase object includes methods to help with this. Our writeQuery step will just populate these parameters and prompt the model to generate a SQL query:
import { z } from "zod";
const queryOutput = z.object({
query: z.string().describe("Syntactically valid SQL query."),
});
const structuredLlm = llm.withStructuredOutput(queryOutput);
const writeQuery = async (state: typeof InputStateAnnotation.State) => {
const promptValue = await queryPromptTemplate.invoke({
dialect: db.appDataSourceOptions.type,
top_k: 10,
table_info: await db.getTableInfo(),
input: state.question,
});
const result = await structuredLlm.invoke(promptValue);
return { query: result.query };
};
Let's test it out:
await writeQuery({ question: "How many Employees are there?" });
{ query: 'SELECT COUNT(*) AS EmployeeCount FROM Employee;' }
Execute query
This is the most dangerous part of creating a SQL chain. Consider carefully if it is OK to run automated queries over your data. Minimize database connection permissions as much as possible. Consider adding a human approval step to your chains before query execution (see below).
To execute the query, we will load a tool from langchain-community. Our executeQuery node will just wrap this tool:
import { QuerySqlTool } from "langchain/tools/sql";
const executeQuery = async (state: typeof StateAnnotation.State) => {
const executeQueryTool = new QuerySqlTool(db);
return { result: await executeQueryTool.invoke(state.query) };
};
Testing this step:
await executeQuery({
question: "",
query: "SELECT COUNT(*) AS EmployeeCount FROM Employee;",
result: "",
answer: "",
});
{ result: '[{"EmployeeCount":8}]' }
Generate answer
Finally, our last step generates an answer to the question given the information pulled from the database:
const generateAnswer = async (state: typeof StateAnnotation.State) => {
const promptValue =
"Given the following user question, corresponding SQL query, " +
"and SQL result, answer the user question.\n\n" +
`Question: ${state.question}\n` +
`SQL Query: ${state.query}\n` +
`SQL Result: ${state.result}\n`;
const response = await llm.invoke(promptValue);
return { answer: response.content };
};
Orchestrating with LangGraph
Finally, we compile our application into a single graph object. In this case, we are just connecting the three steps into a single sequence.
import { StateGraph } from "@langchain/langgraph";
const graphBuilder = new StateGraph({
stateSchema: StateAnnotation,
})
.addNode("writeQuery", writeQuery)
.addNode("executeQuery", executeQuery)
.addNode("generateAnswer", generateAnswer)
.addEdge("__start__", "writeQuery")
.addEdge("writeQuery", "executeQuery")
.addEdge("executeQuery", "generateAnswer")
.addEdge("generateAnswer", "__end__");
const graph = graphBuilder.compile();
LangGraph also comes with built-in utilities for visualizing the control flow of your application:
// Note: tslab only works inside a jupyter notebook. Don't worry about running this code yourself!
import * as tslab from "tslab";
const image = await graph.getGraph().drawMermaidPng();
const arrayBuffer = await image.arrayBuffer();
await tslab.display.png(new Uint8Array(arrayBuffer));
Let's test our application! Note that we can stream the results of the individual steps:
let inputs = { question: "How many employees are there?" };
console.log(inputs);
console.log("\n====\n");
for await (const step of await graph.stream(inputs, {
streamMode: "updates",
})) {
console.log(step);
console.log("\n====\n");
}
{ question: 'How many employees are there?' }
====
{
writeQuery: { query: 'SELECT COUNT(*) AS EmployeeCount FROM Employee;' }
}
====
{ executeQuery: { result: '[{"EmployeeCount":8}]' } }
====
{ generateAnswer: { answer: 'There are 8 employees.' } }
====
Check out the LangSmith trace.
Human-in-the-loop
LangGraph supports a number of features that can be useful for this workflow. One of them is human-in-the-loop: we can interrupt our application before sensitive steps (such as the execution of a SQL query) for human review. This is enabled by LangGraph's persistence layer, which saves run progress to your storage of choice. Below, we specify in-memory storage:
import { MemorySaver } from "@langchain/langgraph";
const checkpointer = new MemorySaver();
const graphWithInterrupt = graphBuilder.compile({
checkpointer: checkpointer,
interruptBefore: ["executeQuery"],
});
// Now that we're using persistence, we need to specify a thread ID
// so that we can continue the run after review.
const threadConfig = {
configurable: { thread_id: "1" },
streamMode: "updates" as const,
};
const image = await graphWithInterrupt.getGraph().drawMermaidPng();
const arrayBuffer = await image.arrayBuffer();
await tslab.display.png(new Uint8Array(arrayBuffer));
Let's repeat the same run, adding in a simple yes/no approval step:
console.log(inputs);
console.log("\n====\n");
for await (const step of await graphWithInterrupt.stream(
inputs,
threadConfig
)) {
console.log(step);
console.log("\n====\n");
}
// Will log when the graph is interrupted, before `executeQuery` runs.
console.log("---GRAPH INTERRUPTED---");
{ question: 'How many employees are there?' }
====
{
writeQuery: { query: 'SELECT COUNT(*) AS EmployeeCount FROM Employee;' }
}
====
---GRAPH INTERRUPTED---
The run interrupts before the query is executed. At this point, our application can handle an interaction with a user, who can review the query.
If approved, running the same thread with a null input will continue the run from where we left off. This is enabled by LangGraph's persistence layer.
for await (const step of await graphWithInterrupt.stream(null, threadConfig)) {
console.log(step);
console.log("\n====\n");
}
{ executeQuery: { result: '[{"EmployeeCount":8}]' } }
====
{ generateAnswer: { answer: 'There are 8 employees.' } }
====
See this LangGraph guide for more detail and examples.
Next steps
For more sophisticated query generation, we may want to create few-shot prompts or add query-checking steps. For advanced techniques like this and more, check out:
Agents
Agents leverage the reasoning capabilities of LLMs to make decisions during execution. Using agents allows you to offload additional discretion over the query generation and execution process. Although their behavior is less predictable than the "chain" above, they have some advantages:
- They can query the database as many times as needed to answer the user question.
- They can recover from errors by running a generated query, catching the traceback and regenerating it correctly.
- They can answer questions based on the database's schema as well as on its content (like describing a specific table).
Below we assemble a minimal SQL agent. We will equip it with a set of tools using LangChain's SqlToolkit. Using LangGraph's pre-built ReAct agent constructor, we can do this in one line.
The SqlToolkit includes tools that can:
- Create and execute queries
- Check query syntax
- Retrieve table descriptions
- ... and more
import { SqlToolkit } from "langchain/agents/toolkits/sql";
const toolkit = new SqlToolkit(db, llm);
const tools = toolkit.getTools();
console.log(
tools.map((tool) => ({
name: tool.name,
description: tool.description,
}))
);
[
{
name: 'query-sql',
description: 'Input to this tool is a detailed and correct SQL query, output is a result from the database.\n' +
' If the query is not correct, an error message will be returned.\n' +
' If an error is returned, rewrite the query, check the query, and try again.'
},
{
name: 'info-sql',
description: 'Input to this tool is a comma-separated list of tables, output is the schema and sample rows for those tables.\n' +
' Be sure that the tables actually exist by calling list-tables-sql first!\n' +
'\n' +
' Example Input: "table1, table2, table3.'
},
{
name: 'list-tables-sql',
description: 'Input is an empty string, output is a comma-separated list of tables in the database.'
},
{
name: 'query-checker',
description: 'Use this tool to double check if your query is correct before executing it.\n' +
' Always use this tool before executing a query with query-sql!'
}
]
System prompt
We will also want to load a system prompt for our agent. It will consist of instructions for how to behave.
import { pull } from "langchain/hub";
import { ChatPromptTemplate } from "@langchain/core/prompts";
const systemPromptTemplate = await pull<ChatPromptTemplate>(
"langchain-ai/sql-agent-system-prompt"
);
console.log(systemPromptTemplate.promptMessages[0].lc_kwargs.prompt.template);
You are an agent designed to interact with a SQL database.
Given an input question, create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results.
You can order the results by a relevant column to return the most interesting examples in the database.
Never query for all the columns from a specific table, only ask for the relevant columns given the question.
You have access to tools for interacting with the database.
Only use the below tools. Only use the information returned by the below tools to construct your final answer.
You MUST double check your query before executing it. If you get an error while executing a query, rewrite the query and try again.
DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the database.
To start you should ALWAYS look at the tables in the database to see what you can query.
Do NOT skip this step.
Then you should query the schema of the most relevant tables.
Let's populate the parameters highlighted in the prompt:
const systemMessage = await systemPromptTemplate.format({
dialect: "SQLite",
top_k: 5,
});
Initializing the agent
We will use a prebuilt LangGraph agent to build our agent:
import { createReactAgent } from "@langchain/langgraph/prebuilt";
const agent = createReactAgent({
llm: llm,
tools: tools,
stateModifier: systemMessage,
});
Consider how the agent responds to the below question:
Expand to see the `prettyPrint` code.
import { AIMessage, BaseMessage, isAIMessage } from "@langchain/core/messages";
const prettyPrint = (message: BaseMessage) => {
let txt = `[${message._getType()}]: ${message.content}`;
if (isAIMessage(message) && (message.tool_calls?.length ?? 0) > 0) {
const tool_calls = (message as AIMessage)?.tool_calls
?.map((tc) => `- ${tc.name}(${JSON.stringify(tc.args)})`)
.join("\n");
txt += ` \nTools: \n${tool_calls}`;
}
console.log(txt);
};
let inputs2 = {
messages: [
{ role: "user", content: "Which country's customers spent the most?" },
],
};
for await (const step of await agent.stream(inputs2, {
streamMode: "values",
})) {
const lastMessage = step.messages[step.messages.length - 1];
prettyPrint(lastMessage);
console.log("-----\n");
}
[human]: Which country's customers spent the most?
-----
[ai]:
Tools:
- list-tables-sql({"input":""})
-----
[tool]: Album, Artist, Customer, Employee, Genre, Invoice, InvoiceLine, MediaType, Playlist, PlaylistTrack, Track
-----
[ai]:
Tools:
- info-sql({"input":"Customer, Invoice, InvoiceLine"})
- info-sql({"input":"Invoice"})
-----
[tool]: CREATE TABLE Invoice (
InvoiceId INTEGER NOT NULL, CustomerId INTEGER NOT NULL, InvoiceDate DATETIME NOT NULL, BillingAddress NVARCHAR(70) , BillingCity NVARCHAR(40) , BillingState NVARCHAR(40) , BillingCountry NVARCHAR(40) , BillingPostalCode NVARCHAR(10) , Total NUMERIC(10,2) NOT NULL)
SELECT * FROM "Invoice" LIMIT 3;
InvoiceId CustomerId InvoiceDate BillingAddress BillingCity BillingState BillingCountry BillingPostalCode Total
1 2 2021-01-01 00:00:00 Theodor-Heuss-Straße 34 Stuttgart null Germany 70174 1.98
2 4 2021-01-02 00:00:00 Ullevålsveien 14 Oslo null Norway 0171 3.96
3 8 2021-01-03 00:00:00 Grétrystraat 63 Brussels null Belgium 1000 5.94
-----
[ai]:
Tools:
- query-checker({"input":"SELECT c.Country, SUM(i.Total) AS TotalSpent \nFROM Customer c \nJOIN Invoice i ON c.CustomerId = i.CustomerId \nGROUP BY c.Country \nORDER BY TotalSpent DESC \nLIMIT 5;"})
-----
[tool]: The SQL query you provided appears to be correct and does not contain any of the common mistakes listed. It properly joins the `Customer` and `Invoice` tables, groups the results by country, and orders the total spending in descending order while limiting the results to the top 5 countries.
Here is the original query reproduced:
```sql
SELECT c.Country, SUM(i.Total) AS TotalSpent
FROM Customer c
JOIN Invoice i ON c.CustomerId = i.CustomerId
GROUP BY c.Country
ORDER BY TotalSpent DESC
LIMIT 5;
```
No changes are necessary.
-----
[ai]:
Tools:
- query-sql({"input":"SELECT c.Country, SUM(i.Total) AS TotalSpent \nFROM Customer c \nJOIN Invoice i ON c.CustomerId = i.CustomerId \nGROUP BY c.Country \nORDER BY TotalSpent DESC \nLIMIT 5;"})
-----
[tool]: [{"Country":"USA","TotalSpent":523.0600000000003},{"Country":"Canada","TotalSpent":303.9599999999999},{"Country":"France","TotalSpent":195.09999999999994},{"Country":"Brazil","TotalSpent":190.09999999999997},{"Country":"Germany","TotalSpent":156.48}]
-----
[ai]: The countries whose customers spent the most are:
1. **USA** - $523.06
2. **Canada** - $303.96
3. **France** - $195.10
4. **Brazil** - $190.10
5. **Germany** - $156.48
-----
You can also use the LangSmith trace to visualize these steps and associated metadata.
Note that the agent executes multiple queries until it has the information it needs: 1. it lists available tables; 2. it retrieves the schema for three tables; 3. it queries multiple tables via a join operation.
The agent is then able to use the result of the final query to generate an answer to the original question.
The agent can similarly handle qualitative questions:
let inputs3 = {
messages: [{ role: "user", content: "Describe the playlisttrack table" }],
};
for await (const step of await agent.stream(inputs3, {
streamMode: "values",
})) {
const lastMessage = step.messages[step.messages.length - 1];
prettyPrint(lastMessage);
console.log("-----\n");
}
[human]: Describe the playlisttrack table
-----
[ai]:
Tools:
- list-tables-sql({"input":""})
-----
[tool]: Album, Artist, Customer, Employee, Genre, Invoice, InvoiceLine, MediaType, Playlist, PlaylistTrack, Track
-----
[ai]:
Tools:
- info-sql({"input":"PlaylistTrack"})
-----
[tool]: CREATE TABLE PlaylistTrack (
PlaylistId INTEGER NOT NULL, TrackId INTEGER NOT NULL)
SELECT * FROM "PlaylistTrack" LIMIT 3;
PlaylistId TrackId
1 3402
1 3389
1 3390
-----
[ai]: The `PlaylistTrack` table has the following schema:
- **PlaylistId**: INTEGER (NOT NULL)
- **TrackId**: INTEGER (NOT NULL)
This table is used to associate tracks with playlists. Here are some sample rows from the table:
| PlaylistId | TrackId |
|------------|---------|
| 1 | 3402 |
| 1 | 3389 |
| 1 | 3390 |
-----
Dealing with high-cardinality columns
In order to filter columns that contain proper nouns such as addresses, song names, or artists, we first need to double-check the spelling so that we can filter the data correctly.
We can achieve this by creating a vector store with all the distinct proper nouns that exist in the database. We can then have the agent query that vector store each time the user includes a proper noun in their question, to find the correct spelling for that word. In this way, the agent can make sure it understands which entity the user is referring to before building the target query.
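To see why this matters: an exact-match filter on the user's spelling "Alice Chains" finds nothing, while a fuzzy lookup over the known column values recovers the intended entity. Here is a toy sketch using character-trigram overlap (purely illustrative; the actual implementation below uses an embeddings-backed vector store instead):

```typescript
// Toy fuzzy lookup: score candidates by shared lowercase trigrams.
// Illustrative only; the guide's real version embeds the values into
// a vector store and retrieves by semantic similarity.
function trigrams(s: string): Set<string> {
  const t = s.toLowerCase();
  const grams = new Set<string>();
  for (let i = 0; i + 3 <= t.length; i++) grams.add(t.slice(i, i + 3));
  return grams;
}

function bestMatch(query: string, candidates: string[]): string {
  const q = trigrams(query);
  let best = candidates[0];
  let bestScore = -1;
  for (const c of candidates) {
    const g = trigrams(c);
    let overlap = 0;
    for (const gram of q) if (g.has(gram)) overlap += 1;
    // Normalize by the larger trigram set so long strings aren't favored.
    const score = overlap / Math.max(g.size, q.size);
    if (score > bestScore) {
      bestScore = score;
      best = c;
    }
  }
  return best;
}
```

For example, `bestMatch("Alice Chains", ["Alice In Chains", "Alanis Morissette", "AC/DC"])` returns `"Alice In Chains"`, giving the agent the exact value to use in its `WHERE` clause.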
First, we need the unique values for each entity we want, so we define a function that parses the result into a list of elements:
async function queryAsList(database: any, query: string): Promise<string[]> {
const res: Array<{ [key: string]: string }> = JSON.parse(
await database.run(query)
)
.flat()
.filter((el: any) => el != null);
const justValues: Array<string> = res.map((item) =>
Object.values(item)[0]
.replace(/\b\d+\b/g, "")
.trim()
);
return justValues;
}
// Gather entities into a list
let artists: string[] = await queryAsList(db, "SELECT Name FROM Artist");
let albums: string[] = await queryAsList(db, "SELECT Title FROM Album");
let properNouns = artists.concat(albums);
console.log(`Total: ${properNouns.length}\n`);
console.log(`Sample: ${properNouns.slice(0, 5)}...`);
Total: 622
Sample: AC/DC,Accept,Aerosmith,Alanis Morissette,Alice In Chains...
Using this function, we can create a retriever tool that the agent can execute at its discretion.
Pick your embedding model:
- OpenAI
- Azure
- AWS
- VertexAI
- MistralAI
- Cohere
Install dependencies
- npm
- yarn
- pnpm
npm i @langchain/openai
yarn add @langchain/openai
pnpm add @langchain/openai
OPENAI_API_KEY=your-api-key
import { OpenAIEmbeddings } from "@langchain/openai";
const embeddings = new OpenAIEmbeddings({
model: "text-embedding-3-large"
});
Install dependencies
- npm
- yarn
- pnpm
npm i @langchain/openai
yarn add @langchain/openai
pnpm add @langchain/openai
AZURE_OPENAI_API_INSTANCE_NAME=<YOUR_INSTANCE_NAME>
AZURE_OPENAI_API_KEY=<YOUR_KEY>
AZURE_OPENAI_API_VERSION="2024-02-01"
import { AzureOpenAIEmbeddings } from "@langchain/openai";
const embeddings = new AzureOpenAIEmbeddings({
azureOpenAIApiEmbeddingsDeploymentName: "text-embedding-ada-002"
});
Install dependencies
- npm
- yarn
- pnpm
npm i @langchain/aws
yarn add @langchain/aws
pnpm add @langchain/aws
BEDROCK_AWS_REGION=your-region
import { BedrockEmbeddings } from "@langchain/aws";
const embeddings = new BedrockEmbeddings({
model: "amazon.titan-embed-text-v1"
});
Install dependencies
- npm
- yarn
- pnpm
npm i @langchain/google-vertexai
yarn add @langchain/google-vertexai
pnpm add @langchain/google-vertexai
GOOGLE_APPLICATION_CREDENTIALS=credentials.json
import { VertexAIEmbeddings } from "@langchain/google-vertexai";
const embeddings = new VertexAIEmbeddings({
model: "text-embedding-004"
});
Install dependencies
- npm
- yarn
- pnpm
npm i @langchain/mistralai
yarn add @langchain/mistralai
pnpm add @langchain/mistralai
MISTRAL_API_KEY=your-api-key
import { MistralAIEmbeddings } from "@langchain/mistralai";
const embeddings = new MistralAIEmbeddings({
model: "mistral-embed"
});
Install dependencies
- npm
- yarn
- pnpm
npm i @langchain/cohere
yarn add @langchain/cohere
pnpm add @langchain/cohere
COHERE_API_KEY=your-api-key
import { CohereEmbeddings } from "@langchain/cohere";
const embeddings = new CohereEmbeddings({
model: "embed-english-v3.0"
});
Pick your vector store:
- Memory
- Chroma
- FAISS
- MongoDB
- PGVector
- Pinecone
- Qdrant
Install dependencies
- npm
- yarn
- pnpm
npm i langchain
yarn add langchain
pnpm add langchain
import { MemoryVectorStore } from "langchain/vectorstores/memory";
const vectorStore = new MemoryVectorStore(embeddings);
Install dependencies
- npm
- yarn
- pnpm
npm i @langchain/community
yarn add @langchain/community
pnpm add @langchain/community
import { Chroma } from "@langchain/community/vectorstores/chroma";
const vectorStore = new Chroma(embeddings, {
collectionName: "a-test-collection",
});
Install dependencies
- npm
- yarn
- pnpm
npm i @langchain/community
yarn add @langchain/community
pnpm add @langchain/community
import { FaissStore } from "@langchain/community/vectorstores/faiss";
const vectorStore = new FaissStore(embeddings, {});
Install dependencies
- npm
- yarn
- pnpm
npm i @langchain/mongodb
yarn add @langchain/mongodb
pnpm add @langchain/mongodb
import { MongoDBAtlasVectorSearch } from "@langchain/mongodb"
import { MongoClient } from "mongodb";
const client = new MongoClient(process.env.MONGODB_ATLAS_URI || "");
const collection = client
.db(process.env.MONGODB_ATLAS_DB_NAME)
.collection(process.env.MONGODB_ATLAS_COLLECTION_NAME);
const vectorStore = new MongoDBAtlasVectorSearch(embeddings, {
collection: collection,
indexName: "vector_index",
textKey: "text",
embeddingKey: "embedding",
});
Install dependencies
- npm
- yarn
- pnpm
npm i @langchain/community
yarn add @langchain/community
pnpm add @langchain/community
import { PGVectorStore } from "@langchain/community/vectorstores/pgvector";
const vectorStore = await PGVectorStore.initialize(embeddings, {})
Install dependencies
- npm
- yarn
- pnpm
npm i @langchain/pinecone
yarn add @langchain/pinecone
pnpm add @langchain/pinecone
import { PineconeStore } from "@langchain/pinecone";
import { Pinecone as PineconeClient } from "@pinecone-database/pinecone";
const pinecone = new PineconeClient();
const vectorStore = new PineconeStore(embeddings, {
pineconeIndex,
maxConcurrency: 5,
});
Install dependencies
- npm
- yarn
- pnpm
npm i @langchain/qdrant
yarn add @langchain/qdrant
pnpm add @langchain/qdrant
import { QdrantVectorStore } from "@langchain/qdrant";
const vectorStore = await QdrantVectorStore.fromExistingCollection(embeddings, {
url: process.env.QDRANT_URL,
collectionName: "langchainjs-testing",
});
We can now construct a retrieval tool that can search over relevant proper nouns in the database:
import { createRetrieverTool } from "langchain/tools/retriever";
import { Document } from "@langchain/core/documents";
const documents = properNouns.map(
(text) => new Document({ pageContent: text })
);
await vectorStore.addDocuments(documents);
const retriever = vectorStore.asRetriever(5);
const retrieverTool = createRetrieverTool(retriever, {
name: "searchProperNouns",
description:
"Use to look up values to filter on. Input is an approximate spelling " +
"of the proper noun, output is valid proper nouns. Use the noun most " +
"similar to the search.",
});
Let's try it out:
console.log(await retrieverTool.invoke({ query: "Alice Chains" }));
Alice In Chains
Alanis Morissette
Jagged Little Pill
Angel Dust
Amy Winehouse
This way, if the agent determines it needs to write a filter based on an artist along the lines of "Alice Chains", it can first use the retriever tool to observe the relevant values of a column.
Putting this together:
// Add to system message
let suffix =
"If you need to filter on a proper noun like a Name, you must ALWAYS first look up " +
"the filter value using the 'searchProperNouns' tool! Do not try to " +
"guess at the proper name - use this function to find similar ones.";
const system = systemMessage + "\n\n" + suffix;
const updatedTools = tools.concat(retrieverTool);
const agent2 = createReactAgent({
llm: llm,
tools: updatedTools,
stateModifier: system,
});
let inputs4 = {
messages: [
{ role: "user", content: "How many albums does alis in chain have?" },
],
};
for await (const step of await agent2.stream(inputs4, {
streamMode: "values",
})) {
const lastMessage = step.messages[step.messages.length - 1];
prettyPrint(lastMessage);
console.log("-----\n");
}
[human]: How many albums does alis in chain have?
-----
[ai]:
Tools:
- searchProperNouns({"query":"alis in chain"})
-----
[tool]: Alice In Chains
Alanis Morissette
Up An' Atom
A-Sides
Jagged Little Pill
-----
[ai]:
Tools:
- query-sql({"input":"SELECT COUNT(*) FROM albums WHERE artist_name = 'Alice In Chains'"})
-----
[tool]: QueryFailedError: SQLITE_ERROR: no such table: albums
-----
[ai]:
Tools:
- list-tables-sql({"input":""})
-----
[tool]: Album, Artist, Customer, Employee, Genre, Invoice, InvoiceLine, MediaType, Playlist, PlaylistTrack, Track
-----
[ai]:
Tools:
- info-sql({"input":"Album"})
- info-sql({"input":"Artist"})
-----
[tool]: CREATE TABLE Artist (
ArtistId INTEGER NOT NULL, Name NVARCHAR(120) )
SELECT * FROM "Artist" LIMIT 3;
ArtistId Name
1 AC/DC
2 Accept
3 Aerosmith
-----
[ai]:
Tools:
- query-sql({"input":"SELECT COUNT(*) FROM Album WHERE ArtistId = (SELECT ArtistId FROM Artist WHERE Name = 'Alice In Chains')"})
-----
[tool]: [{"COUNT(*)":1}]
-----
[ai]: Alice In Chains has released 1 album.
-----
As we can see, both in the streamed steps and in the LangSmith trace, the agent used the searchProperNouns tool in order to check how to correctly query the database for this particular artist.