A Quick Tour of Importing a 140-Million-Record Chinese Knowledge Graph into Nebula Graph

1. The largest Chinese knowledge graph to date

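Yener has open-sourced OwnThink, billed as the largest Chinese knowledge graph to date (link: https://github.com/ownthink/KnowledgeGraphData), containing 140 million records. The data is stored as triples that mix two forms, (entity, attribute, value) and (entity, relation, entity), and the file format is CSV.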
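To make the triple format concrete, here is a minimal Python sketch that reads such rows with the standard csv module. The sample rows, the header line, and the field names (entity, attribute, value) are assumptions made for illustration; check them against the real file before relying on this.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple


class Triple(NamedTuple):
    entity: str
    attribute: str  # an attribute name or a relation name
    value: str      # a literal value or the name of another entity


def read_triples(lines: Iterable[str], has_header: bool = True) -> Iterator[Triple]:
    """Yield one Triple per CSV row that has exactly three fields."""
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the header row (assumed to exist)
    for row in reader:
        if len(row) == 3:   # silently skip malformed rows
            yield Triple(*row)


# Made-up sample in the assumed layout, not copied from the dataset.
SAMPLE = "实体,属性,值\n姚明,职业,篮球运动员\n姚明,妻子,叶莉\n"
triples = list(read_triples(io.StringIO(SAMPLE)))
```

For the real file, pass an open file object (opened with encoding="utf-8" and newline="") instead of the in-memory sample.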


2. Cleaning duplicate data

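You can download the source code of a simple cleaning tool from https://github.com/jievince/rdf-converter, then compile and run it. The tool writes the converted vertex data to vertex.csv and the edge data to edge.csv. Testing revealed a large number of duplicate vertices, so the tool also deduplicates them. After full deduplication there are roughly 46 million vertices and roughly 140 million edges.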
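The conversion step can be pictured roughly as follows. This is a sketch of the general idea only, not the rdf-converter implementation: it assumes every triple becomes two vertices plus one edge, and stand_in_vid is a hypothetical placeholder hash whose values will not match those produced by the real tool or by Nebula's hash() function.

```python
import hashlib
from typing import Iterable, List, Set, Tuple

Vertex = Tuple[int, str]     # (vid, name), the column order of vertex.csv
Edge = Tuple[int, int, str]  # (src vid, dst vid, name), the column order of edge.csv


def stand_in_vid(name: str) -> int:
    """Map a name to a signed 64-bit integer (placeholder hash, assumption)."""
    digest = hashlib.blake2b(name.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)


def convert(triples: Iterable[Tuple[str, str, str]]) -> Tuple[List[Vertex], List[Edge]]:
    """Split triples into deduplicated vertex rows and edge rows."""
    seen_vids: Set[int] = set()
    seen_edges: Set[Edge] = set()
    vertices: List[Vertex] = []
    edges: List[Edge] = []
    for subject, predicate, obj in triples:
        src, dst = stand_in_vid(subject), stand_in_vid(obj)
        for vid, name in ((src, subject), (dst, obj)):
            if vid not in seen_vids:  # keep each vertex only once
                seen_vids.add(vid)
                vertices.append((vid, name))
        edge = (src, dst, predicate)
        if edge not in seen_edges:    # keep each identical edge only once
            seen_edges.add(edge)
            edges.append(edge)
    return vertices, edges


# Made-up sample with one repeated triple.
sample = [("姚明", "妻子", "叶莉"), ("姚明", "职业", "篮球运动员"), ("姚明", "妻子", "叶莉")]
vertices, edges = convert(sample)
```

The in-memory sets are fine for a small sample; at the scale of this dataset they need a machine with plenty of RAM or an on-disk deduplication step.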

Alternatively, download the already deduplicated data directly from https://www.kaggle.com/datasets/littlewey/nebula-ownthink-property-graph


3. Preparing the schema and other metadata

CREATE SPACE is conceptually close to CREATE DATABASE in MySQL.

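# Create the test space
CREATE SPACE test(partition_num=20,replica_factor=1,vid_type=INT64);
# Switch to the test space (space creation is asynchronous; if this fails, wait a few seconds and retry)
USE test;
# Create the vertex type (entity)
CREATE TAG entity(name string);
# Create the edge type (relation)
CREATE EDGE relation(name string);
# Show the properties of the entity tag
DESCRIBE TAG entity;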


4. Importing the data with nebula-importer

Download the import tool from https://github.com/vesoft-inc/nebula-importer/releases

Use the following config.yaml as is; for the syntax, see the documentation in the GitHub repository.

client:
  version: v3
  address: "127.0.0.1:9669"
  user: root
  password: nebula
  concurrencyPerAddress: 10
  reconnectInitialInterval: 1s
  retry: 3
  retryInitialInterval: 1s

manager:
  spaceName: test
  batch: 128
  readerConcurrency: 50
  importerConcurrency: 512
  statsInterval: 10s

log:
  level: INFO
  console: true
  files:
    - logs/nebula-importer.log

sources:
  - path: ./vertex.csv
    failDataPath: ./err/vertex.csv
    csv:
      delimiter: ","
      withHeader: false
      withLabel: false
    tags:
      - name: entity
        id:
          type: "INT"
          index: 0
        props:
          - name: "name"
            type: "STRING"
            index: 1
  - path: ./edge.csv
    failDataPath: ./err/edge.csv
    batch: 256
    csv:
      delimiter: ","
      withHeader: false
      withLabel: false
    edges:
      - name: relation
        src:
          id:
            type: "INT"
            index: 0
        dst:
          id:
            type: "INT"
            index: 1
        props:
          - name: "name"
            type: "string"
            index: 2
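Because the configuration addresses CSV columns by index (vertex ID at index 0, name at index 1; edge source and destination at indexes 0 and 1, name at index 2), it can be worth sanity-checking a sample of each file before starting a long import. The following is a minimal sketch of such a check; the helper names are hypothetical, and the rule "ID columns must be plain decimal INT64 values" is an assumption that may be stricter than what the importer actually accepts.

```python
import csv
import io
import re
from typing import Iterable, List, Sequence, Tuple

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
DECIMAL_INT = re.compile(r"-?[0-9]+")


def is_int64(text: str) -> bool:
    """True if text is a plain decimal integer that fits in a signed 64-bit value."""
    return bool(DECIMAL_INT.fullmatch(text)) and INT64_MIN <= int(text) <= INT64_MAX


def bad_rows(lines: Iterable[str], id_columns: Sequence[int],
             min_columns: int) -> List[Tuple[int, List[str]]]:
    """Return (line_number, row) for rows that are too short or have a non-INT64 ID.

    min_columns must be greater than every index in id_columns.
    """
    problems = []
    for line_number, row in enumerate(csv.reader(lines), start=1):
        if len(row) < min_columns or not all(is_int64(row[i]) for i in id_columns):
            problems.append((line_number, row))
    return problems


# Made-up samples; the second and third vertex rows are deliberately broken.
vertex_sample = "123,姚明\nabc,叶莉\n456\n"
edge_sample = "1,2,妻子\n1,x,职业\n"
vertex_problems = bad_rows(io.StringIO(vertex_sample), id_columns=(0,), min_columns=2)
edge_problems = bad_rows(io.StringIO(edge_sample), id_columns=(0, 1), min_columns=3)
```

Rows that the importer itself rejects are written to the failDataPath files, so this check is only a cheap early warning, not a replacement for inspecting ./err after the run.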

nebula-importer -c config.yaml

Then wait for the import to finish.

5. First queries

5.1 Edge types and vertex properties directly connected to Yao Ming

GO FROM hash("姚明[中国篮球协会主席、中职联公司董事长]") OVER relation YIELD relation.name AS Name, $$.entity.name AS Value;
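To run the same one-hop lookup for other entities, the statement can be generated from an entity name. The helper below is a hypothetical convenience sketch, not part of any Nebula client library; it assumes that backslash-escaping backslashes and double quotes is enough to embed the name in an nGQL string literal, and it only builds the text of the statement without sending it anywhere.

```python
def one_hop_query(entity_name: str, edge_type: str = "relation", tag: str = "entity") -> str:
    """Build the one-hop GO statement used above for an arbitrary entity name."""
    # Escape backslashes first, then double quotes (assumed nGQL escaping rules).
    escaped = entity_name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f'GO FROM hash("{escaped}") OVER {edge_type} '
        f"YIELD {edge_type}.name AS Name, $$.{tag}.name AS Value;"
    )


query = one_hop_query("姚明[中国篮球协会主席、中职联公司董事长]")
```

The lookup only finds a vertex when the name matches the string that was hashed during conversion exactly, including the bracketed disambiguation suffix.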

