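How to improve results with prompting
In this guide we'll go over prompting strategies to improve graph database query generation. We'll largely focus on methods for getting relevant database-specific information into your prompt.
Setup
Install dependencies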
Install the packages with npm, yarn, or pnpm:
npm i langchain @langchain/community @langchain/openai neo4j-driver
yarn add langchain @langchain/community @langchain/openai neo4j-driver
pnpm add langchain @langchain/community @langchain/openai neo4j-driver
Set environment variables
We'll use OpenAI in this example:
OPENAI_API_KEY=your-api-key
# Optional, use LangSmith for best-in-class observability
LANGSMITH_API_KEY=your-api-key
LANGCHAIN_TRACING_V2=true
# Reduce tracing latency if you are not in a serverless environment
# LANGCHAIN_CALLBACKS_BACKGROUND=true
Next, we need to define our Neo4j credentials. Follow the Neo4j installation instructions to set up a Neo4j database.
NEO4J_URI="bolt://127.0.0.1:7687"
NEO4J_USERNAME="neo4j"
NEO4J_PASSWORD="password"
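The connection code reads these variables at runtime. If a name is missing or misspelled (for example `NEO4J_USER` instead of `NEO4J_USERNAME`), the only symptom is an opaque driver error. The sketch below shows one way to fail fast, assuming you want all three values present before connecting. `readNeo4jConfig` is a hypothetical helper, not part of LangChain.

```typescript
// Hypothetical helper: collect the Neo4j settings and fail fast if any are missing.
type EnvLike = Record<string, string | undefined>;

interface Neo4jConfig {
  url: string;
  username: string;
  password: string;
}

function readNeo4jConfig(env: EnvLike): Neo4jConfig {
  const keys = ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"] as const;
  // Gather every missing key so the error message lists them all at once.
  const missing = keys.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    url: env.NEO4J_URI as string,
    username: env.NEO4J_USERNAME as string,
    password: env.NEO4J_PASSWORD as string,
  };
}
```

In Node you could pass `process.env` to it and hand the result to `Neo4jGraph.initialize`, which takes the same `url`, `username` and `password` fields.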
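The example below creates a connection to a Neo4j database and populates it with example data about movies and their actors.
import "neo4j-driver";
import { Neo4jGraph } from "@langchain/community/graphs/neo4j_graph";
// Read the credentials defined above; the names must match the environment variables exactly
const url = Deno.env.get("NEO4J_URI");
const username = Deno.env.get("NEO4J_USERNAME");
const password = Deno.env.get("NEO4J_PASSWORD");
const graph = await Neo4jGraph.initialize({ url, username, password });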
// Import movie information
const moviesQuery = `LOAD CSV WITH HEADERS FROM
'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv'
AS row
MERGE (m:Movie {id:row.movieId})
SET m.released = date(row.released),
m.title = row.title,
m.imdbRating = toFloat(row.imdbRating)
FOREACH (director in split(row.director, '|') |
MERGE (p:Person {name:trim(director)})
MERGE (p)-[:DIRECTED]->(m))
FOREACH (actor in split(row.actors, '|') |
MERGE (p:Person {name:trim(actor)})
MERGE (p)-[:ACTED_IN]->(m))
FOREACH (genre in split(row.genres, '|') |
MERGE (g:Genre {name:trim(genre)})
MERGE (m)-[:IN_GENRE]->(g))`;
await graph.query(moviesQuery);
Schema refreshed successfully.
[]
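Each `FOREACH (... in split(row.x, '|') | MERGE ...)` clause turns a pipe-delimited CSV field into one node per trimmed name, plus a relationship to the movie. The TypeScript sketch below reproduces that logic in memory so you can see which triples a single row produces. The `MovieRow` shape and the `rowToTriples` helper are illustrative assumptions, not part of the loader.

```typescript
// Illustrative shape of one CSV row (only the fields used for relationships).
interface MovieRow {
  movieId: string;
  director: string;
  actors: string;
  genres: string;
}

type Triple = [from: string, rel: "DIRECTED" | "ACTED_IN" | "IN_GENRE", to: string];

// Mirror of the FOREACH/MERGE clauses: split on "|", trim each name,
// and drop identical triples the way MERGE avoids creating duplicates.
function rowToTriples(row: MovieRow): Triple[] {
  const names = (field: string) => field.split("|").map((s) => s.trim());
  const movie = `Movie:${row.movieId}`;
  const triples: Triple[] = [
    ...names(row.director).map((n): Triple => [`Person:${n}`, "DIRECTED", movie]),
    ...names(row.actors).map((n): Triple => [`Person:${n}`, "ACTED_IN", movie]),
    ...names(row.genres).map((g): Triple => [movie, "IN_GENRE", `Genre:${g}`]),
  ];
  const seen = new Set<string>();
  return triples.filter((t) => {
    const key = t.join("|");
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

Because directors and actors are both merged as `Person` nodes by name, someone who directs and acts in the same movie ends up as one node with two relationships. The "directors who also played a role" example query later in this guide relies on that.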
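Filtering the graph schema
Sometimes you may need to focus on a specific subset of the graph schema while generating Cypher statements. Suppose we are working with the following graph schema: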
await graph.refreshSchema();
console.log(graph.getSchema());
Node properties are the following:
Movie {imdbRating: FLOAT, id: STRING, released: DATE, title: STRING}, Person {name: STRING}, Genre {name: STRING}, Chunk {embedding: LIST, id: STRING, text: STRING}
Relationship properties are the following:
The relationships are the following:
(:Movie)-[:IN_GENRE]->(:Genre), (:Person)-[:DIRECTED]->(:Movie), (:Person)-[:ACTED_IN]->(:Movie)
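The schema above includes a `Chunk` label that has nothing to do with movie questions. One way to keep such labels out of the prompt is to filter the text returned by `getSchema()` before passing it in as `{schema}`. The sketch below does this with regular expressions. `filterSchema` is a hypothetical helper, not a LangChain API, and it assumes the exact `Label {props}` and `(:A)-[:REL]->(:B)` formats printed above.

```typescript
// Hypothetical helper: drop node labels (and any relationship touching them)
// from a schema string in the format printed by Neo4jGraph.getSchema().
function filterSchema(schema: string, excluded: string[]): string {
  const skip = new Set(excluded);
  return schema
    .split("\n")
    .map((line) => {
      // Node property entries look like "Movie {title: STRING, ...}".
      const nodes = [...line.matchAll(/(\w+) \{[^}]*\}/g)];
      if (nodes.length > 0) {
        return nodes.filter((m) => !skip.has(m[1])).map((m) => m[0]).join(", ");
      }
      // Relationship patterns look like "(:Person)-[:DIRECTED]->(:Movie)".
      const rels = [...line.matchAll(/\(:(\w+)\)-\[:(\w+)\]->\(:(\w+)\)/g)];
      if (rels.length > 0) {
        return rels
          .filter((m) => !skip.has(m[1]) && !skip.has(m[3]))
          .map((m) => m[0])
          .join(", ");
      }
      return line; // Section headers and empty lines pass through unchanged.
    })
    .join("\n");
}
```

Calling `filterSchema(graph.getSchema(), ["Chunk"])` would keep the `Movie`, `Person` and `Genre` entries and all three relationship patterns. This assumes the schema format stays as shown.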
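Few-shot examples
Including examples of natural-language questions converted into valid Cypher queries against our database in the prompt will often improve model performance, especially for complex queries.
Suppose we have the following examples: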
const examples = [
{
question: "How many artists are there?",
query: "MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)",
},
{
question: "Which actors played in the movie Casino?",
query: "MATCH (m:Movie {{title: 'Casino'}})<-[:ACTED_IN]-(a) RETURN a.name",
},
{
question: "How many movies has Tom Hanks acted in?",
query:
"MATCH (a:Person {{name: 'Tom Hanks'}})-[:ACTED_IN]->(m:Movie) RETURN count(m)",
},
{
question: "List all the genres of the movie Schindler's List",
query:
"MATCH (m:Movie {{title: 'Schindler\\'s List'}})-[:IN_GENRE]->(g:Genre) RETURN g.name",
},
{
question:
"Which actors have worked in movies from both the comedy and action genres?",
query:
"MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g2:Genre) WHERE g1.name = 'Comedy' AND g2.name = 'Action' RETURN DISTINCT a.name",
},
{
question:
"Which directors have made movies with at least three different actors named 'John'?",
query:
"MATCH (d:Person)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Person) WHERE a.name STARTS WITH 'John' WITH d, COUNT(DISTINCT a) AS JohnsCount WHERE JohnsCount >= 3 RETURN d.name",
},
{
question: "Identify movies where directors also played a role in the film.",
query:
"MATCH (p:Person)-[:DIRECTED]->(m:Movie), (p)-[:ACTED_IN]->(m) RETURN m.title, p.name",
},
{
question:
"Find the actor with the highest number of movies in the database.",
query:
"MATCH (a:Actor)-[:ACTED_IN]->(m:Movie) RETURN a.name, COUNT(m) AS movieCount ORDER BY movieCount DESC LIMIT 1",
},
];
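These examples are pasted verbatim into the prompt. A label that does not exist in the graph therefore teaches the model to write queries that silently return nothing. A quick check is to compare the node labels each example query uses with the labels in the schema. The sketch below does this with `unknownLabels`, a hypothetical helper using a deliberately simple regex that only recognizes `(var:Label` and `(:Label` patterns. Run over the examples above with `["Movie", "Person", "Genre"]`, it would flag only the last one, which uses `:Actor` while the schema defines actors as `Person` nodes.

```typescript
type Example = { question: string; query: string };

// Return the node labels used in an example query that are not in `knownLabels`.
// Simplified: only matches "(name:Label" or "(:Label" at the start of a node pattern.
function unknownLabels(example: Example, knownLabels: string[]): string[] {
  const known = new Set(knownLabels);
  const used = [...example.query.matchAll(/\(\w*:(\w+)/g)].map((m) => m[1]);
  return [...new Set(used)].filter((label) => !known.has(label));
}
```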
We can use them to create a few-shot prompt like so:
import { FewShotPromptTemplate, PromptTemplate } from "@langchain/core/prompts";
const examplePrompt = PromptTemplate.fromTemplate(
"User input: {question}\nCypher query: {query}"
);
// Static few-shot prompt that always includes the first five examples
const fewShotPrompt = new FewShotPromptTemplate({
examples: examples.slice(0, 5),
examplePrompt,
prefix:
"You are a Neo4j expert. Given an input question, create a syntactically correct Cypher query to run.\n\nHere is the schema information\n{schema}.\n\nBelow are a number of examples of questions and their corresponding Cypher queries.",
suffix: "User input: {question}\nCypher query: ",
inputVariables: ["question", "schema"],
});
console.log(
await fewShotPrompt.format({
question: "How many artists are there?",
schema: "foo",
})
);
You are a Neo4j expert. Given an input question, create a syntactically correct Cypher query to run.
Here is the schema information
foo.
Below are a number of examples of questions and their corresponding Cypher queries.
User input: How many artists are there?
Cypher query: MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)
User input: Which actors played in the movie Casino?
Cypher query: MATCH (m:Movie {title: 'Casino'})<-[:ACTED_IN]-(a) RETURN a.name
User input: How many movies has Tom Hanks acted in?
Cypher query: MATCH (a:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN count(m)
User input: List all the genres of the movie Schindler's List
Cypher query: MATCH (m:Movie {title: 'Schindler\'s List'})-[:IN_GENRE]->(g:Genre) RETURN g.name
User input: Which actors have worked in movies from both the comedy and action genres?
Cypher query: MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g2:Genre) WHERE g1.name = 'Comedy' AND g2.name = 'Action' RETURN DISTINCT a.name
User input: How many artists are there?
Cypher query:
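The formatted output shows the shape of the final prompt: the prefix, one rendered block per example, and the suffix. The doubled braces in the example queries (`{{title: 'Casino'}}`) come out as single braces. The sketch below reproduces that behaviour under two assumptions: pieces are joined with a blank line, and the combined string is rendered once more as an f-string with the user's inputs. That second pass is why literal braces in examples are written as `{{` and `}}`. `renderFString` and `assembleFewShot` are simplified, hypothetical helpers, not LangChain's implementation.

```typescript
type Example = { question: string; query: string };

// Simplified f-string renderer: "{name}" is substituted, "{{" and "}}" become literal braces.
function renderFString(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{|\}\}|\{(\w+)\}/g, (match: string, name?: string) => {
    if (match === "{{") return "{";
    if (match === "}}") return "}";
    if (name !== undefined && name in values) return values[name];
    throw new Error(`Missing value for ${match}`);
  });
}

// Render each example, join all pieces with a blank line, then render the
// combined template with the user's inputs (this pass collapses "{{" to "{").
function assembleFewShot(
  prefix: string,
  exampleTemplate: string,
  examples: Example[],
  suffix: string,
  inputs: Record<string, string>,
): string {
  const rendered = examples.map((ex) => renderFString(exampleTemplate, ex));
  const combined = [prefix, ...rendered, suffix].join("\n\n");
  return renderFString(combined, inputs);
}
```

Because `String.prototype.replace` does not rescan substituted text, the first pass leaves `{{title: 'Casino'}}` intact. Only the second pass turns it into `{title: 'Casino'}`, matching the output above.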
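Dynamic few-shot examples
If we have enough examples, we may want to include only the most relevant ones in the prompt, either because they don't fit in the model's context window or because the long tail of examples distracts the model. Specifically, given any input, we want to include the examples most relevant to that input.
We can do this with an ExampleSelector. In this case we'll use a SemanticSimilarityExampleSelector, which stores the examples in the vector database of our choosing. At runtime it performs a similarity search between the input and our examples and returns the most semantically similar ones: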
import { OpenAIEmbeddings } from "@langchain/openai";
import { SemanticSimilarityExampleSelector } from "@langchain/core/example_selectors";
import { Neo4jVectorStore } from "@langchain/community/vectorstores/neo4j_vector";
const exampleSelector = await SemanticSimilarityExampleSelector.fromExamples(
examples,
new OpenAIEmbeddings(),
Neo4jVectorStore,
{
k: 5,
inputKeys: ["question"],
preDeleteCollection: true,
url,
username,
password,
}
);
await exampleSelector.selectExamples({
question: "how many artists are there?",
});
[
{
query: "MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)",
question: "How many artists are there?"
},
{
query: "MATCH (a:Person {{name: 'Tom Hanks'}})-[:ACTED_IN]->(m:Movie) RETURN count(m)",
question: "How many movies has Tom Hanks acted in?"
},
{
query: "MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE"... 84 more characters,
question: "Which actors have worked in movies from both the comedy and action genres?"
},
{
query: "MATCH (d:Person)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Person) WHERE a.name STARTS WITH 'John' WITH"... 71 more characters,
question: "Which directors have made movies with at least three different actors named 'John'?"
},
{
query: "MATCH (a:Actor)-[:ACTED_IN]->(m:Movie) RETURN a.name, COUNT(m) AS movieCount ORDER BY movieCount DES"... 9 more characters,
question: "Find the actor with the highest number of movies in the database."
}
]
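Conceptually, the selector embeds each example's `question` (the field named in `inputKeys`) and embeds the incoming input the same way. It then returns the `k` nearest examples. The sketch below reproduces that ranking in memory. A toy bag-of-words embedding and cosine similarity stand in for `OpenAIEmbeddings` and the Neo4j vector index. The helpers are hypothetical, and the scores will differ from a real embedding model.

```typescript
type Example = { question: string; query: string };

// Toy stand-in for an embedding model: a bag-of-words count vector over lowercase tokens.
function embed(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const token of text.toLowerCase().match(/[a-z']+/g) ?? []) {
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }
  return counts;
}

// Cosine similarity between two sparse vectors (0 when either vector is empty).
function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  for (const [token, value] of a) dot += value * (b.get(token) ?? 0);
  const norm = (v: Map<string, number>) =>
    Math.sqrt([...v.values()].reduce((sum, x) => sum + x * x, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Rank examples by similarity between the input and each example's question, keep the top k.
function selectExamples(input: string, examples: Example[], k: number): Example[] {
  const query = embed(input);
  return examples
    .map((example) => ({ example, score: cosine(query, embed(example.question)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(({ example }) => example);
}
```

Word overlap is a crude proxy for meaning. A real embedding model is what lets "how many artists are there?" match questions that share intent but not vocabulary.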
To use it, we can pass the ExampleSelector directly into our FewShotPromptTemplate:
const prompt = new FewShotPromptTemplate({
exampleSelector,
examplePrompt,
prefix:
"You are a Neo4j expert. Given an input question, create a syntactically correct Cypher query to run.\n\nHere is the schema information\n{schema}.\n\nBelow are a number of examples of questions and their corresponding Cypher queries.",
suffix: "User input: {question}\nCypher query: ",
inputVariables: ["question", "schema"],
});
console.log(
await prompt.format({
question: "how many artists are there?",
schema: "foo",
})
);
You are a Neo4j expert. Given an input question, create a syntactically correct Cypher query to run.
Here is the schema information
foo.
Below are a number of examples of questions and their corresponding Cypher queries.
User input: How many artists are there?
Cypher query: MATCH (a:Person)-[:ACTED_IN]->(:Movie) RETURN count(DISTINCT a)
User input: How many movies has Tom Hanks acted in?
Cypher query: MATCH (a:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN count(m)
User input: Which actors have worked in movies from both the comedy and action genres?
Cypher query: MATCH (a:Person)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g1:Genre), (a)-[:ACTED_IN]->(:Movie)-[:IN_GENRE]->(g2:Genre) WHERE g1.name = 'Comedy' AND g2.name = 'Action' RETURN DISTINCT a.name
User input: Which directors have made movies with at least three different actors named 'John'?
Cypher query: MATCH (d:Person)-[:DIRECTED]->(m:Movie)<-[:ACTED_IN]-(a:Person) WHERE a.name STARTS WITH 'John' WITH d, COUNT(DISTINCT a) AS JohnsCount WHERE JohnsCount >= 3 RETURN d.name
User input: Find the actor with the highest number of movies in the database.
Cypher query: MATCH (a:Actor)-[:ACTED_IN]->(m:Movie) RETURN a.name, COUNT(m) AS movieCount ORDER BY movieCount DESC LIMIT 1
User input: how many artists are there?
Cypher query:
Finally, we can pass this prompt to a GraphCypherQAChain as its cypherPrompt and ask a question:
import { ChatOpenAI } from "@langchain/openai";
import { GraphCypherQAChain } from "langchain/chains/graph_qa/cypher";
const llm = new ChatOpenAI({
model: "gpt-3.5-turbo",
temperature: 0,
});
const chain = GraphCypherQAChain.fromLLM({
graph,
llm,
cypherPrompt: prompt,
});
await chain.invoke({
query: "How many actors are in the graph?",
});
{ result: "There are 967 actors in the graph." }
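`GraphCypherQAChain` wraps two LLM calls around a database query. The `cypherPrompt` turns the question into Cypher, the query runs against `graph`, and the returned rows go to a second prompt that phrases the final answer. The outline below sketches that flow with injected functions, so it runs without a database or model. `askGraph`, `extractCypher` and the `CypherQaDeps` shape are hypothetical names, not LangChain APIs. Stripping a Markdown code fence is a defensive step a hand-rolled version would likely want.

```typescript
// Hypothetical, dependency-injected outline of a Cypher question-answering flow.
type Row = Record<string, unknown>;

interface CypherQaDeps {
  generateCypher: (question: string, schema: string) => Promise<string>; // LLM + cypherPrompt
  runQuery: (cypher: string) => Promise<Row[]>; // e.g. a wrapper around graph.query
  answer: (question: string, rows: Row[]) => Promise<string>; // LLM + answer prompt
}

// Remove a surrounding Markdown code fence if the model wrapped its query in one.
function extractCypher(text: string): string {
  const fenced = text.match(/`{3}(?:cypher)?\s*([\s\S]*?)`{3}/i);
  return (fenced ? fenced[1] : text).trim();
}

async function askGraph(question: string, schema: string, deps: CypherQaDeps) {
  const cypher = extractCypher(await deps.generateCypher(question, schema));
  const rows = await deps.runQuery(cypher);
  const result = await deps.answer(question, rows);
  return { cypher, rows, result };
}
```

Keeping the three steps separate makes it easy to log the generated Cypher. When an answer looks wrong, checking the generated Cypher is usually the first debugging step.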