How to construct knowledge graphs
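In this guide we'll go over the basic ways of constructing a knowledge graph from unstructured text. The constructed graph can then be used as a knowledge base in a RAG application. At a high level, the steps for constructing a knowledge graph from text are:
- Extract structured information from text: a model is used to extract structured graph information from the text.
- Store into a graph database: the extracted structured graph information is stored in a graph database, where it can support downstream RAG applications.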
Setup
Install dependencies
Install with npm, yarn, or pnpm (run only one of the following commands):
npm i langchain @langchain/community @langchain/openai @langchain/core neo4j-driver zod
yarn add langchain @langchain/community @langchain/openai @langchain/core neo4j-driver zod
pnpm add langchain @langchain/community @langchain/openai @langchain/core neo4j-driver zod
Set environment variables
We'll use OpenAI in this example:
OPENAI_API_KEY=your-api-key
# Optional, use LangSmith for best-in-class observability
LANGSMITH_API_KEY=your-api-key
LANGCHAIN_TRACING_V2=true
# Reduce tracing latency if you are not in a serverless environment
# LANGCHAIN_CALLBACKS_BACKGROUND=true
Next, we need to define Neo4j credentials. Follow these installation steps to set up a Neo4j database.
NEO4J_URI="bolt://localhost:7687"
NEO4J_USERNAME="neo4j"
NEO4J_PASSWORD="password"
The example below creates a connection to a Neo4j database.
import "neo4j-driver";
import { Neo4jGraph } from "@langchain/community/graphs/neo4j_graph";
// Read the credentials from the environment variables defined above.
const url = process.env.NEO4J_URI;
const username = process.env.NEO4J_USERNAME;
const password = process.env.NEO4J_PASSWORD;
const graph = await Neo4jGraph.initialize({ url, username, password });
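If any of the three variables is unset, the connection attempt fails with a less obvious error, so it can help to check them up front. The following is a minimal sketch of such a check; `requireEnv` is a hypothetical helper written for this guide, not part of LangChain or `neo4j-driver`, and the environment is passed in as a plain object so the function can be tried without a real database.

```typescript
// Hypothetical helper (not a LangChain or neo4j-driver API).
type Env = Record<string, string | undefined>;

// Returns the requested variables, or throws listing every missing one.
function requireEnv<K extends string>(
  env: Env,
  names: readonly K[]
): Record<K, string> {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const out = {} as Record<K, string>;
  for (const name of names) {
    out[name] = env[name] as string;
  }
  return out;
}

// Example values matching the settings shown above.
// In Node.js you would pass process.env instead of this object.
const exampleEnv: Env = {
  NEO4J_URI: "bolt://localhost:7687",
  NEO4J_USERNAME: "neo4j",
  NEO4J_PASSWORD: "password",
};

const creds = requireEnv(exampleEnv, [
  "NEO4J_URI",
  "NEO4J_USERNAME",
  "NEO4J_PASSWORD",
] as const);
console.log(creds.NEO4J_URI);
```

The returned values are typed as `string` rather than `string | undefined`, so they could be passed on as `url`, `username`, and `password` without further checks.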
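LLM Graph Transformer
Extracting graph data from text turns unstructured information into a structured format, which makes it easier to gain deeper insights and to navigate complex relationships and patterns. The LLMGraphTransformer converts text documents into structured graph documents by using an LLM to parse and categorize entities and their relationships. The choice of LLM significantly influences the output, because it determines the accuracy and nuance of the extracted graph data.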
import { ChatOpenAI } from "@langchain/openai";
import { LLMGraphTransformer } from "@langchain/community/experimental/graph_transformers/llm";
const model = new ChatOpenAI({
  temperature: 0,
  model: "gpt-4-turbo-preview",
});
const llmGraphTransformer = new LLMGraphTransformer({
  llm: model,
});
Now we can pass in example text and examine the results.
import { Document } from "@langchain/core/documents";
let text = `
Marie Curie, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.
She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.
Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.
She was, in 1906, the first woman to become a professor at the University of Paris.
`;
const result = await llmGraphTransformer.convertToGraphDocuments([
  new Document({ pageContent: text }),
]);
console.log(`Nodes: ${result[0].nodes.length}`);
console.log(`Relationships:${result[0].relationships.length}`);
Nodes: 8
Relationships:7
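Note that the graph construction process is non-deterministic because we are using an LLM, so you may get slightly different results on each run. See the image below for a better understanding of the structure of the generated knowledge graph.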
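Because the counts alone say little about what was extracted, a plain-text listing of the nodes and relationships can be a useful complement to a visualization. The sketch below assumes that a graph document exposes `nodes` (each with an `id` and a `type`) and `relationships` (each with a `source` node, a `target` node, and a `type`). The interfaces are local stand-ins declared for illustration, and the sample data is hand-written, not actual model output.

```typescript
// Local stand-in types (assumed shape, declared here for illustration only).
interface SketchNode {
  id: string | number;
  type: string;
}
interface SketchRelationship {
  source: SketchNode;
  target: SketchNode;
  type: string;
}
interface SketchGraphDocument {
  nodes: SketchNode[];
  relationships: SketchRelationship[];
}

// Render every node and relationship as one readable line.
function describeGraph(doc: SketchGraphDocument): string[] {
  const nodeLines = doc.nodes.map((n) => `(${n.id}:${n.type})`);
  const relLines = doc.relationships.map(
    (r) => `(${r.source.id})-[${r.type}]->(${r.target.id})`
  );
  return [...nodeLines, ...relLines];
}

// Hand-written sample, NOT real output from the transformer.
const marie: SketchNode = { id: "Marie Curie", type: "Person" };
const pierre: SketchNode = { id: "Pierre Curie", type: "Person" };
const sample: SketchGraphDocument = {
  nodes: [marie, pierre],
  relationships: [{ source: marie, target: pierre, type: "SPOUSE" }],
};

console.log(describeGraph(sample).join("\n"));
```

Printing such a listing for two separate runs is a quick way to see how much the extraction varies between executions.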
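Additionally, you have the flexibility to define the specific node types and relationship types to extract, according to your requirements.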
const llmGraphTransformerFiltered = new LLMGraphTransformer({
  llm: model,
  allowedNodes: ["PERSON", "COUNTRY", "ORGANIZATION"],
  allowedRelationships: ["NATIONALITY", "LOCATED_IN", "WORKED_AT", "SPOUSE"],
  strictMode: false,
});
const result_filtered =
  await llmGraphTransformerFiltered.convertToGraphDocuments([
    new Document({ pageContent: text }),
  ]);
console.log(`Nodes: ${result_filtered[0].nodes.length}`);
console.log(`Relationships:${result_filtered[0].relationships.length}`);
Nodes: 6
Relationships:4
To get a better understanding of the generated graph, we can visualize it again.
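The example above sets `strictMode: false`. If you want to be sure that only the allowed types remain, one option is to enforce the allow-lists yourself after extraction. The sketch below does this on plain objects; it is an illustration of the idea, not the library's implementation, and the types, the `filterGraph` function, and the sample data are all hypothetical.

```typescript
// Local stand-in types (assumed shape, declared here for illustration only).
interface GNode {
  id: string;
  type: string;
}
interface GRel {
  source: GNode;
  target: GNode;
  type: string;
}
interface GDoc {
  nodes: GNode[];
  relationships: GRel[];
}

// Keep only allowed node types, and only allowed relationships
// whose two endpoints both survived the node filter.
function filterGraph(
  doc: GDoc,
  allowedNodes: string[],
  allowedRelationships: string[]
): GDoc {
  // Compare case-insensitively, since models may vary the casing of types.
  const nodeTypes = new Set(allowedNodes.map((t) => t.toUpperCase()));
  const relTypes = new Set(allowedRelationships.map((t) => t.toUpperCase()));
  const nodes = doc.nodes.filter((n) => nodeTypes.has(n.type.toUpperCase()));
  const keptIds = new Set(nodes.map((n) => n.id));
  const relationships = doc.relationships.filter(
    (r) =>
      relTypes.has(r.type.toUpperCase()) &&
      keptIds.has(r.source.id) &&
      keptIds.has(r.target.id)
  );
  return { nodes, relationships };
}

// Hand-written sample, NOT real output from the transformer.
const curie: GNode = { id: "Marie Curie", type: "Person" };
const husband: GNode = { id: "Pierre Curie", type: "Person" };
const prize: GNode = { id: "Nobel Prize", type: "Award" };
const university: GNode = { id: "University of Paris", type: "Organization" };
const unfiltered: GDoc = {
  nodes: [curie, husband, prize, university],
  relationships: [
    { source: curie, target: husband, type: "SPOUSE" },
    { source: curie, target: prize, type: "WON" },
    { source: curie, target: university, type: "WORKED_AT" },
  ],
};

const filtered = filterGraph(
  unfiltered,
  ["PERSON", "COUNTRY", "ORGANIZATION"],
  ["NATIONALITY", "LOCATED_IN", "WORKED_AT", "SPOUSE"]
);
console.log(`Nodes: ${filtered.nodes.length}`);
console.log(`Relationships: ${filtered.relationships.length}`);
```

In this sample the `Award` node and the `WON` relationship are dropped, while the two people, the organization, and the `SPOUSE` and `WORKED_AT` relationships are kept.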
Storing to graph database
The generated graph documents can be stored in a graph database using the addGraphDocuments method.
await graph.addGraphDocuments(result_filtered);
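`addGraphDocuments` performs the write for you. To give a rough idea of what storing a graph document involves, the sketch below builds parameterized Cypher `MERGE` statements for a node and a relationship. This is not how the library implements the method; the types and the `quoteName`, `nodeStatement`, and `relStatement` functions are hypothetical, and the node shape (an `id` plus a `type` used as the label) is an assumption.

```typescript
// Local stand-in types (assumed shape, declared here for illustration only).
interface StoreNode {
  id: string;
  type: string;
}
interface StoreRel {
  source: StoreNode;
  target: StoreNode;
  type: string;
}
interface CypherStatement {
  query: string;
  params: Record<string, string>;
}

// Labels and relationship types cannot be passed as Cypher parameters,
// so they are backtick-quoted; embedded backticks are doubled.
function quoteName(name: string): string {
  return "`" + name.replace(/`/g, "``") + "`";
}

// MERGE makes the write idempotent: re-running it does not duplicate the node.
function nodeStatement(node: StoreNode): CypherStatement {
  return {
    query: `MERGE (n:${quoteName(node.type)} {id: $id})`,
    params: { id: node.id },
  };
}

// Match both endpoints by label and id, then MERGE the relationship.
function relStatement(rel: StoreRel): CypherStatement {
  return {
    query:
      `MATCH (a:${quoteName(rel.source.type)} {id: $sourceId}) ` +
      `MATCH (b:${quoteName(rel.target.type)} {id: $targetId}) ` +
      `MERGE (a)-[:${quoteName(rel.type)}]->(b)`,
    params: { sourceId: rel.source.id, targetId: rel.target.id },
  };
}

// Hand-written sample data.
const person: StoreNode = { id: "Marie Curie", type: "Person" };
const employer: StoreNode = { id: "University of Paris", type: "Organization" };
const workedAt: StoreRel = { source: person, target: employer, type: "WORKED_AT" };

console.log(nodeStatement(person).query);
console.log(relStatement(workedAt).query);
```

Passing the ids as parameters, rather than interpolating them into the query string, avoids quoting problems with names that contain special characters.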