如何在数据库上添加语义层
您可以使用数据库查询从 Neo4j 等图数据库检索信息。一种选择是使用 LLM 生成 Cypher 语句。虽然该选项提供了出色的灵活性,但该解决方案可能很脆弱,并且无法始终如一地生成精确的 Cypher 语句。与其生成 Cypher 语句,不如将 Cypher 模板作为工具实现到语义层中,LLM 代理可以与之交互。
设置
安装依赖项
提示
- npm
- yarn
- pnpm
npm i langchain @langchain/community @langchain/openai neo4j-driver zod
yarn add langchain @langchain/community @langchain/openai neo4j-driver zod
pnpm add langchain @langchain/community @langchain/openai neo4j-driver zod
设置环境变量
我们将在本示例中使用 OpenAI
OPENAI_API_KEY=your-api-key
# Optional, use LangSmith for best-in-class observability
LANGSMITH_API_KEY=your-api-key
LANGCHAIN_TRACING_V2=true
# Reduce tracing latency if you are not in a serverless environment
# LANGCHAIN_CALLBACKS_BACKGROUND=true
接下来,我们需要定义 Neo4j 凭据。按照 这些安装步骤 设置 Neo4j 数据库。
NEO4J_URI="bolt://localhost:7687"
NEO4J_USERNAME="neo4j"
NEO4J_PASSWORD="password"
以下示例将创建一个与 Neo4j 数据库的连接,并使用有关电影及其演员的示例数据填充它。
import "neo4j-driver";
import { Neo4jGraph } from "@langchain/community/graphs/neo4j_graph";
const url = Deno.env.get("NEO4J_URI");
const username = Deno.env.get("NEO4J_USER");
const password = Deno.env.get("NEO4J_PASSWORD");
const graph = await Neo4jGraph.initialize({ url, username, password });
// Import movie information
const moviesQuery = `LOAD CSV WITH HEADERS FROM
'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv'
AS row
MERGE (m:Movie {id:row.movieId})
SET m.released = date(row.released),
m.title = row.title,
m.imdbRating = toFloat(row.imdbRating)
FOREACH (director in split(row.director, '|') |
MERGE (p:Person {name:trim(director)})
MERGE (p)-[:DIRECTED]->(m))
FOREACH (actor in split(row.actors, '|') |
MERGE (p:Person {name:trim(actor)})
MERGE (p)-[:ACTED_IN]->(m))
FOREACH (genre in split(row.genres, '|') |
MERGE (g:Genre {name:trim(genre)})
MERGE (m)-[:IN_GENRE]->(g))`;
await graph.query(moviesQuery);
Schema refreshed successfully.
[]
使用 Cypher 模板的自定义工具
语义层包含各种工具,LLM 可以使用这些工具与知识图谱交互。它们可以有不同的复杂程度。您可以将语义层中的每个工具视为一个函数。
我们将实现的函数是检索有关电影或演员阵容的信息。
const descriptionQuery = `MATCH (m:Movie|Person)
WHERE m.title CONTAINS $candidate OR m.name CONTAINS $candidate
MATCH (m)-[r:ACTED_IN|HAS_GENRE]-(t)
WITH m, type(r) as type, collect(coalesce(t.name, t.title)) as names
WITH m, type+": "+reduce(s="", n IN names | s + n + ", ") as types
WITH m, collect(types) as contexts
WITH m, "type:" + labels(m)[0] + "\ntitle: "+ coalesce(m.title, m.name)
+ "\nyear: "+coalesce(m.released,"") +"\n" +
reduce(s="", c in contexts | s + substring(c, 0, size(c)-2) +"\n") as context
RETURN context LIMIT 1`;
const getInformation = async (entity: string) => {
try {
const data = await graph.query(descriptionQuery, { candidate: entity });
return data[0]["context"];
} catch (error) {
return "No information was found";
}
};
您可以观察到我们已经定义了用于检索信息的 Cypher 语句。因此,我们可以避免生成 Cypher 语句,并使用 LLM 代理来仅填充输入参数。为了向 LLM 代理提供有关何时使用工具及其输入参数的更多信息,我们将该函数包装为一个工具。
import { tool } from "@langchain/core/tools";
import { z } from "zod";
const informationTool = tool(
(input) => {
return getInformation(input.entity);
},
{
name: "Information",
description:
"useful for when you need to answer questions about various actors or movies",
schema: z.object({
entity: z
.string()
.describe("movie or a person mentioned in the question"),
}),
}
);
OpenAI 代理
LangChain 表达式语言使定义一个代理来通过语义层与图数据库交互变得非常方便。
import { ChatOpenAI } from "@langchain/openai";
import { AgentExecutor } from "langchain/agents";
import { formatToOpenAIFunctionMessages } from "langchain/agents/format_scratchpad";
import { OpenAIFunctionsAgentOutputParser } from "langchain/agents/openai/output_parser";
import { convertToOpenAIFunction } from "@langchain/core/utils/function_calling";
import {
ChatPromptTemplate,
MessagesPlaceholder,
} from "@langchain/core/prompts";
import { AIMessage, BaseMessage, HumanMessage } from "@langchain/core/messages";
import { RunnableSequence } from "@langchain/core/runnables";
const llm = new ChatOpenAI({ model: "gpt-3.5-turbo", temperature: 0 });
const tools = [informationTool];
const llmWithTools = llm.bind({
functions: tools.map(convertToOpenAIFunction),
});
const prompt = ChatPromptTemplate.fromMessages([
[
"system",
"You are a helpful assistant that finds information about movies and recommends them. If tools require follow up questions, make sure to ask the user for clarification. Make sure to include any available options that need to be clarified in the follow up questions Do only the things the user specifically requested.",
],
new MessagesPlaceholder("chat_history"),
["human", "{input}"],
new MessagesPlaceholder("agent_scratchpad"),
]);
const _formatChatHistory = (chatHistory) => {
const buffer: Array<BaseMessage> = [];
for (const [human, ai] of chatHistory) {
buffer.push(new HumanMessage({ content: human }));
buffer.push(new AIMessage({ content: ai }));
}
return buffer;
};
const agent = RunnableSequence.from([
{
input: (x) => x.input,
chat_history: (x) => {
if ("chat_history" in x) {
return _formatChatHistory(x.chat_history);
}
return [];
},
agent_scratchpad: (x) => {
if ("steps" in x) {
return formatToOpenAIFunctionMessages(x.steps);
}
return [];
},
},
prompt,
llmWithTools,
new OpenAIFunctionsAgentOutputParser(),
]);
const agentExecutor = new AgentExecutor({ agent, tools });
await agentExecutor.invoke({ input: "Who played in Casino?" });
{
input: "Who played in Casino?",
output: 'The movie "Casino" starred James Woods, Joe Pesci, Robert De Niro, and Sharon Stone.'
}