
How to add a semantic layer over the database

You can use database queries to retrieve information from a graph database like Neo4j. One option is to use an LLM to generate Cypher statements. While that option offers excellent flexibility, the solution can be brittle and does not consistently generate precise Cypher statements. Instead of generating Cypher statements, we can implement Cypher templates as tools in a semantic layer that an LLM agent can interact with.

(Figure: graph_semantic.png)

Danger

The code in this guide will execute Cypher statements against the provided database. For production, make sure that the database connection uses credentials that are narrowly scoped to only include the necessary permissions.

Failing to do so may result in data corruption or loss, since the calling code may attempt commands that would result in deletion or mutation of data if appropriately prompted, or in reading sensitive data if such data is present in the database.

Setup

Install dependencies

yarn add langchain @langchain/community @langchain/openai @langchain/core neo4j-driver zod

Set environment variables

We'll use OpenAI in this example:

OPENAI_API_KEY=your-api-key

# Optional, use LangSmith for best-in-class observability
LANGSMITH_API_KEY=your-api-key
LANGSMITH_TRACING=true

# Reduce tracing latency if you are not in a serverless environment
# LANGCHAIN_CALLBACKS_BACKGROUND=true

Next, we need to define the Neo4j credentials. Follow these installation steps to set up a Neo4j database.

NEO4J_URI="bolt://127.0.0.1:7687"
NEO4J_USERNAME="neo4j"
NEO4J_PASSWORD="password"
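
If you keep these values in a local .env file (an optional setup, not required by this guide), you can load them at startup with the dotenv package before creating the connection:

// Optional: load the variables above from a .env file.
// Assumes the `dotenv` package is installed (yarn add dotenv).
import "dotenv/config";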

The following example creates a connection to the Neo4j database and populates it with example data about movies and their actors.

import "neo4j-driver";
import { Neo4jGraph } from "@langchain/community/graphs/neo4j_graph";

const url = process.env.NEO4J_URI;
const username = process.env.NEO4J_USERNAME;
const password = process.env.NEO4J_PASSWORD;
const graph = await Neo4jGraph.initialize({ url, username, password });

// Import movie information
const moviesQuery = `LOAD CSV WITH HEADERS FROM
'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv'
AS row
MERGE (m:Movie {id:row.movieId})
SET m.released = date(row.released),
    m.title = row.title,
    m.imdbRating = toFloat(row.imdbRating)
FOREACH (director in split(row.director, '|') |
    MERGE (p:Person {name:trim(director)})
    MERGE (p)-[:DIRECTED]->(m))
FOREACH (actor in split(row.actors, '|') |
    MERGE (p:Person {name:trim(actor)})
    MERGE (p)-[:ACTED_IN]->(m))
FOREACH (genre in split(row.genres, '|') |
    MERGE (g:Genre {name:trim(genre)})
    MERGE (m)-[:IN_GENRE]->(g))`;

await graph.query(moviesQuery);
Schema refreshed successfully.
[]
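
As an optional sanity check (not part of the original walkthrough), you can inspect the schema that was generated and count the imported nodes, reusing the `graph` instance from above:

// Print the node labels, relationship types and properties LangChain detected.
console.log(graph.getSchema());

// Count the imported Movie and Person nodes.
const counts = await graph.query(
  `MATCH (m:Movie) WITH count(m) AS movies
   MATCH (p:Person) RETURN movies, count(p) AS people`
);
console.log(counts);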

Custom tools with Cypher templates

A semantic layer consists of various tools exposed to the LLM, which the LLM can use to interact with the knowledge graph. They can be of various complexity. You can think of each tool in the semantic layer as a function.

The function we will implement retrieves information about movies or their cast.

const descriptionQuery = `MATCH (m:Movie|Person)
WHERE m.title CONTAINS $candidate OR m.name CONTAINS $candidate
MATCH (m)-[r:ACTED_IN|IN_GENRE]-(t)
WITH m, type(r) as type, collect(coalesce(t.name, t.title)) as names
WITH m, type+": "+reduce(s="", n IN names | s + n + ", ") as types
WITH m, collect(types) as contexts
WITH m, "type:" + labels(m)[0] + "\ntitle: "+ coalesce(m.title, m.name)
  + "\nyear: "+coalesce(m.released,"") +"\n" +
  reduce(s="", c in contexts | s + substring(c, 0, size(c)-2) +"\n") as context
RETURN context LIMIT 1`;

const getInformation = async (entity: string) => {
  try {
    const data = await graph.query(descriptionQuery, { candidate: entity });
    return data[0]["context"];
  } catch (error) {
    return "No information was found";
  }
};
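
As a quick illustration, you can call the helper directly; the exact output depends on the data loaded above:

// Example direct call - returns a formatted context string, or the fallback message.
console.log(await getInformation("Casino"));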

Notice that we have already defined the Cypher statement used to retrieve information. Therefore, we can avoid generating Cypher statements and let the LLM agent only populate the input parameters. To give the LLM agent additional information about when to use the tool and what its input parameters are, we wrap the function as a tool.

import { tool } from "@langchain/core/tools";
import { z } from "zod";

const informationTool = tool(
  (input) => {
    return getInformation(input.entity);
  },
  {
    name: "Information",
    description:
      "useful for when you need to answer questions about various actors or movies",
    schema: z.object({
      entity: z
        .string()
        .describe("movie or a person mentioned in the question"),
    }),
  }
);
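
Before wiring the tool into an agent, you can invoke it directly to verify it works (a quick check, not required by the guide):

// Structured tools created with `tool()` can be invoked with input matching their zod schema.
const info = await informationTool.invoke({ entity: "Casino" });
console.log(info);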

OpenAI Agent

LangChain Expression Language makes it very convenient to define an agent that interacts with a graph database through the semantic layer.

import { ChatOpenAI } from "@langchain/openai";
import { AgentExecutor } from "langchain/agents";
import { formatToOpenAIFunctionMessages } from "langchain/agents/format_scratchpad";
import { OpenAIFunctionsAgentOutputParser } from "langchain/agents/openai/output_parser";
import { convertToOpenAIFunction } from "@langchain/core/utils/function_calling";
import {
  ChatPromptTemplate,
  MessagesPlaceholder,
} from "@langchain/core/prompts";
import { AIMessage, BaseMessage, HumanMessage } from "@langchain/core/messages";
import { RunnableSequence } from "@langchain/core/runnables";

const llm = new ChatOpenAI({ model: "gpt-3.5-turbo", temperature: 0 });
const tools = [informationTool];

const llmWithTools = llm.bind({
  functions: tools.map(convertToOpenAIFunction),
});

const prompt = ChatPromptTemplate.fromMessages([
  [
    "system",
    "You are a helpful assistant that finds information about movies and recommends them. If tools require follow up questions, make sure to ask the user for clarification. Make sure to include any available options that need to be clarified in the follow up questions. Do only the things the user specifically requested.",
  ],
  new MessagesPlaceholder("chat_history"),
  ["human", "{input}"],
  new MessagesPlaceholder("agent_scratchpad"),
]);

const _formatChatHistory = (chatHistory: [string, string][]) => {
  const buffer: Array<BaseMessage> = [];
  for (const [human, ai] of chatHistory) {
    buffer.push(new HumanMessage({ content: human }));
    buffer.push(new AIMessage({ content: ai }));
  }
  return buffer;
};

const agent = RunnableSequence.from([
  {
    input: (x) => x.input,
    chat_history: (x) => {
      if ("chat_history" in x) {
        return _formatChatHistory(x.chat_history);
      }
      return [];
    },
    agent_scratchpad: (x) => {
      if ("steps" in x) {
        return formatToOpenAIFunctionMessages(x.steps);
      }
      return [];
    },
  },
  prompt,
  llmWithTools,
  new OpenAIFunctionsAgentOutputParser(),
]);

const agentExecutor = new AgentExecutor({ agent, tools });
await agentExecutor.invoke({ input: "Who played in Casino?" });
{
  input: "Who played in Casino?",
  output: 'The movie "Casino" starred James Woods, Joe Pesci, Robert De Niro, and Sharon Stone.'
}
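
Since the agent accepts an optional `chat_history` input as `[human, ai]` tuples, you can ask follow-up questions that rely on earlier turns. A minimal sketch (the exact answer will vary with the model and data):

// Follow-up question that depends on the previous exchange for context.
await agentExecutor.invoke({
  input: "Who directed that movie?",
  chat_history: [
    [
      "Who played in Casino?",
      'The movie "Casino" starred James Woods, Joe Pesci, Robert De Niro, and Sharon Stone.',
    ],
  ],
});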
