AgensGraph: a Multi-Model Graph Database
Based on PostgreSQL
Kisung Kim (kskim@bitnine.net)
Bitnine R&D Center
2017-1-14
Who am I
• Kisung Kim, Ph.D., Chief Technology Officer of Bitnine Global Inc.
• Researched query optimization for graph-structured data during
doctoral studies
• Developed a distributed relational database engine at TmaxSoft
• Leads the development of a new graph database, AgensGraph, at
Bitnine Global
What is Graph Database?
Images from http://www.slideshare.net/debanjanmahata/an-introduction-to-nosql-graph-databases-and-neo4j
What is Graph Database?
• Relationships are first-class citizens in a graph database
• Make your data connected in the graph database
              Relational Database   Graph Database
Entity        Row                   Node (Vertex)
Relationship  Row                   Relationship (Edge)
What is the Graph Database?
• Handles data from a different viewpoint
• Data model similar to entity-relationship model
• Gartner says it represents a radical change in how data is
organized and processed
Cypher Query Language
• Declarative query language for the property graph model
• Inspired by SQL and SPARQL
– Designed to be a human-readable query language
• Developed by Neo Technology Inc. since 2011
• Current version is 3.0
• OpenCypher.org (http://opencypher.org)
– Participate in developing the query language
Cypher Query Example
Make two nodes
CREATE (:person {id: 1, name: 'Kisung Kim', birthday: '1980-01-05'});
CREATE (:company {id: 1, name: 'Bitnine Global'});
Make a relationship between the two nodes
MATCH (p:person {id: 1}), (c:company {id: 1})
CREATE (p)-[:workFor {title: 'CTO', since: 2014}]->(c);
[Figure: node "Kisung Kim" connected to node "Bitnine Global" by a workFor edge]
Cypher Query Example
Querying
MATCH (p:person {name: 'Kisung Kim'})-[:workFor]->(c:company)
RETURN p, c;
No Table Definitions and No Joins
Query with variable length relationships
MATCH (p:person {name: 'Kisung Kim'})-[:knows*..3]->(f:person)
RETURN f;
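[Figure: "Kisung Kim" linked by a workFor edge to an unknown node (first query); "Kisung Kim" followed by a chain of up to three knows edges to unknown nodes (second query)]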
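To make the meaning of the `*..3` bound concrete, here is a minimal Python sketch of a bounded traversal. It is only an illustration of the semantics (every path of one to three `knows` edges, with no edge reused inside a single path), not how any graph engine evaluates the pattern. The sample graph and the function name `reachable_within` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical "knows" edges: person -> people they know.
KNOWS: Dict[str, List[str]] = {
    "Kisung Kim": ["Alice", "Bob"],
    "Alice": ["Carol"],
    "Bob": ["Carol"],
    "Carol": ["Dave"],
    "Dave": ["Erin"],
}


def reachable_within(start: str, max_hops: int) -> List[Tuple[str, int]]:
    """Return (end vertex, hop count) for every path of 1..max_hops edges."""
    results: List[Tuple[str, int]] = []

    def walk(vertex: str, hops: int, used: FrozenSet[Tuple[str, str]]) -> None:
        if hops == max_hops:          # upper bound of the pattern reached
            return
        for nxt in KNOWS.get(vertex, []):
            edge = (vertex, nxt)
            if edge in used:          # an edge is not repeated within one path
                continue
            results.append((nxt, hops + 1))
            walk(nxt, hops + 1, used | {edge})

    walk(start, 0, frozenset())
    return results


# One result per matching path, so a vertex reachable by two paths appears twice.
matches = reachable_within("Kisung Kim", 3)
names = {name for name, _ in matches}
```

In this sample, "Erin" is four hops away and therefore falls outside the result.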
GraphDB to PostgreSQL Case
• From Hipolabs
http://engineering.hipolabs.com/graphdb-to-postgresql/
Graph Database and Hybrid Database
Magic Quadrant for Operational Database Management Systems, Gartner, 2016
So, What We Want to Make
• Hybrid database engine with graph and relational model
• Cypher query processing on PostgreSQL
• Online transactional graph database
• Disk-based persistent graph storage
( ) -[:processes]->(Cypher)
Why Did We Choose PostgreSQL?
• Fully-featured enterprise-ready open source database
• Graph processing actually uses relational algebra
– The graph is serialized as tables on disk
– Every graph traversal step is in principle a join
(from LDBC documentation)
• It is important to optimize joins and speed up join processing
– PostgreSQL has an excellent query optimizer
• And the abundant PostgreSQL ecosystem
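The claim that a traversal step is in principle a join can be sketched with a small relational example. The snippet below uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, and the table and column names (`person`, `company`, `work_for`, `start_id`, `end_id`) are assumptions made for illustration, not AgensGraph's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE company  (id INTEGER PRIMARY KEY, name TEXT);
    -- One row per workFor edge: start vertex -> end vertex.
    CREATE TABLE work_for (start_id INTEGER, end_id INTEGER, title TEXT);
    INSERT INTO person   VALUES (1, 'Kisung Kim');
    INSERT INTO company  VALUES (1, 'Bitnine Global');
    INSERT INTO work_for VALUES (1, 1, 'CTO');
""")

# Relational counterpart of the Cypher pattern
#   (p:person {name: 'Kisung Kim'})-[:workFor]->(c:company)
rows = conn.execute("""
    SELECT p.name, c.name
    FROM person AS p
    JOIN work_for AS w ON w.start_id = p.id   -- vertex to its outgoing edges
    JOIN company  AS c ON c.id = w.end_id     -- edge to its end vertex
    WHERE p.name = 'Kisung Kim'
""").fetchall()
```

Each hop in the pattern adds one vertex-to-edge join and one edge-to-vertex join, which is why join ordering and join algorithms matter so much for graph workloads.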
Challenges
• How to store graph data
– Efficient structure for graph pattern matching
– At the same time, efficient for transaction processing
• How to process graph queries
– Processing complex graph pattern matching: variable length path,
shortest path
– Mismatches between graph data model & relational data model
– Graph query optimization
Graph Storage
• Graph data is decomposed into vertexes and edges and stored
on disk
• When processing graph pattern matching, it is essential to
find adjacent vertexes or edges efficiently
– Given a start vertex, find end vertexes
– Given an end vertex, find start vertexes
Two Graph Databases
Solution            Company          Latest Version   Features
Neo4j               Neo Technology   3.1              Most famous graph database, Cypher;
                                                      O(1) access using fixed-size array
Titan (DSE Graph)   Datastax         -                Distributed graph system based on
                                                      Cassandra
Graph Storage – Neo4j
• Fixed-size array for nodes and relationships
• Relationships for a node are organized as a doubly-linked list
• Index-free adjacency
• O(1) access for adjacent edges: follow the pointer
From Graph Databases 2nd ed. O’Reilly, 2015
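The record layout described above can be sketched as two arrays of records, where a record's ID is also its position. This is a heavily simplified Python model for illustration only: the class and field names are hypothetical, properties are omitted, self-loops are not handled, and Python lists stand in for fixed-size on-disk records.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeRecord:
    first_rel: Optional[int] = None   # head of this node's relationship chain


@dataclass
class RelRecord:
    start: int
    end: int
    # Doubly-linked chain pointers, one prev/next pair per endpoint.
    start_prev: Optional[int] = None
    start_next: Optional[int] = None
    end_prev: Optional[int] = None
    end_next: Optional[int] = None


nodes: List[NodeRecord] = []   # record ID == list position (O(1) lookup)
rels: List[RelRecord] = []


def add_node() -> int:
    nodes.append(NodeRecord())
    return len(nodes) - 1


def add_rel(start: int, end: int) -> int:
    """Append a relationship and push it onto the front of both chains."""
    rel_id = len(rels)
    rec = RelRecord(start, end)
    rels.append(rec)
    for node_id, side in ((start, "start"), (end, "end")):
        old_head = nodes[node_id].first_rel
        setattr(rec, side + "_next", old_head)
        if old_head is not None:
            head = rels[old_head]
            # The old head may see this node as its start or as its end.
            back = "start_prev" if head.start == node_id else "end_prev"
            setattr(head, back, rel_id)
        nodes[node_id].first_rel = rel_id
    return rel_id


def relationships_of(node_id: int) -> List[int]:
    """Follow chain pointers from the node record; no index is consulted."""
    out: List[int] = []
    rel_id = nodes[node_id].first_rel
    while rel_id is not None:
        out.append(rel_id)
        rec = rels[rel_id]
        rel_id = rec.start_next if rec.start == node_id else rec.end_next
    return out


a, b, c = add_node(), add_node(), add_node()
r0, r1, r2 = add_rel(a, b), add_rel(a, c), add_rel(b, c)
```

Finding the next adjacent relationship is a single pointer hop, which is the sense in which adjacency is "index-free" in this model.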
Graph Storage – Titan (DSE Graph)
• Titan stores graphs in adjacency list format
• Each edge is stored twice
• Vertex and edge lists are stored in a backend store such as HBase,
Cassandra, or BerkeleyDB
From http://s3.thinkaurelius.com/docs/titan/1.0.0/data-model.html
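A rough Python sketch of the adjacency-list idea follows: each vertex owns one "row" listing its incident edges, so every edge appears once under its start vertex and once under its end vertex. The dictionary layout, direction tags, and function names are assumptions for illustration and do not reproduce Titan's actual on-disk encoding.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# vertex ID -> list of (direction, edge label, other vertex ID)
adjacency: DefaultDict[str, List[Tuple[str, str, str]]] = defaultdict(list)


def add_edge(start: str, label: str, end: str) -> None:
    """Store the edge twice: once per endpoint."""
    adjacency[start].append(("OUT", label, end))
    adjacency[end].append(("IN", label, start))


def out_neighbors(vertex: str) -> List[str]:
    return [other for d, _, other in adjacency.get(vertex, []) if d == "OUT"]


def in_neighbors(vertex: str) -> List[str]:
    return [other for d, _, other in adjacency.get(vertex, []) if d == "IN"]


add_edge("v1", "knows", "v2")
add_edge("v1", "knows", "v3")
add_edge("v3", "knows", "v2")

stored_entries = sum(len(row) for row in adjacency.values())
```

Storing each edge twice doubles the entry count but lets both outgoing and incoming traversals read a single vertex row.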
Graph Storage – AgensGraph
• Fixed-size array is hard to implement in PostgreSQL
– Tuples are moved when updated
• Titan’s big row approach is also inadequate
• We chose B-tree index for graph traversal
[Figure: a graph is stored as two tables]
Vertex table: Vertex ID, Properties
  B-tree on (Vertex ID)
Edge table: Edge ID, Start Vertex ID, End Vertex ID, Properties
  B-tree on (Start, End)
  B-tree on (End, Start)
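The role of the two composite B-trees on the edge table can be sketched with sorted lists and binary search, which mimic the ordered leaf level of a B-tree. This is an illustration under the assumption of integer vertex IDs; the sample edges and the helper name `range_scan` are hypothetical.

```python
import bisect
from typing import List, Tuple

edges: List[Tuple[int, int]] = [(1, 2), (1, 3), (2, 3), (4, 1), (3, 1)]  # (start, end)

# Sorted key lists stand in for the two B-tree indexes.
by_start_end = sorted(edges)                      # key order (start, end)
by_end_start = sorted((e, s) for s, e in edges)   # key order (end, start)


def range_scan(index: List[Tuple[int, int]], first_key: int) -> List[int]:
    """Return the second key component of every entry whose first component
    equals first_key, using two binary searches to bound the range."""
    lo = bisect.bisect_left(index, (first_key,))
    hi = bisect.bisect_left(index, (first_key + 1,))
    return [second for _, second in index[lo:hi]]


ends_from_1 = range_scan(by_start_end, 1)     # given a start vertex, find end vertexes
starts_into_1 = range_scan(by_end_start, 1)   # given an end vertex, find start vertexes
```

Because the neighbor ID is part of the key, both lookups can be answered from the index entries alone, without visiting the edge rows.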
Index Problems
• Current B-tree has several disadvantages for our workload
– Composite index is preferable but the size increases
– There are many duplicate keys (vertex IDs) on start_ID or end_ID
– Property updates incur insertions into B-trees
• We are developing a new index with a bucket structure (like the
GIN index), indirect indexing, and support for index-only scans
for graph traversals
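The duplicate-key problem and the bucket idea can be illustrated in a few lines of Python. In the flat layout every edge repeats its start vertex ID as a key; in the bucketed layout each distinct key is stored once with a list of edge IDs, similar in spirit to GIN posting lists. The data and names below are hypothetical, and this is not the design of the index under development.

```python
from itertools import groupby
from typing import Dict, List, Tuple

# Flat entries as a plain B-tree would hold them: (start vertex ID, edge ID),
# already in key order, with the key repeated for every edge.
flat_entries: List[Tuple[int, int]] = [
    (1, 100), (1, 101), (1, 102), (2, 103), (2, 104), (3, 105),
]

# Bucketed entries: one key per distinct start vertex, plus its edge IDs.
buckets: Dict[int, List[int]] = {
    key: [edge_id for _, edge_id in group]
    for key, group in groupby(flat_entries, key=lambda entry: entry[0])
}

keys_stored_flat = len(flat_entries)
keys_stored_bucketed = len(buckets)
```

The saving grows with vertex degree, since a vertex with many edges still contributes only one key to the bucketed form.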
Graph Storage – AgensGraph
• Vertexes and edges are grouped into labels
• Labels are organized as a label hierarchy
• We use PostgreSQL’s table hierarchy feature
[Figure: label hierarchy stored as inherited tables, each with columns (Vertex ID, Properties): ag_vertex, Person, Message, Comment, Post]
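With table inheritance, a scan of a parent table also covers its child tables, so matching on a parent label returns vertexes of every descendant label. The Python sketch below models that behavior only. The specific hierarchy (Person and Message under ag_vertex, Comment and Post under Message) is an assumption, since the flattened figure does not show the parent links, and the vertex IDs are made up.

```python
from typing import Dict, List

# child label -> parent label (assumed hierarchy, see lead-in)
PARENT: Dict[str, str] = {
    "person": "ag_vertex",
    "message": "ag_vertex",
    "comment": "message",
    "post": "message",
}

# Vertex IDs stored directly under each label (hypothetical data).
ROWS: Dict[str, List[int]] = {
    "ag_vertex": [],
    "person": [1],
    "message": [],
    "comment": [2, 3],
    "post": [4],
}


def labels_to_scan(label: str) -> List[str]:
    """Return the label followed by all of its descendant labels."""
    result = [label]
    for child, parent in PARENT.items():
        if parent == label:
            result.extend(labels_to_scan(child))
    return result


def match(label: str) -> List[int]:
    """Collect vertex IDs from the label's table and every inherited table."""
    return [vid for name in labels_to_scan(label) for vid in ROWS.get(name, [])]


all_messages = match("message")
all_vertexes = match("ag_vertex")
```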
Current Status
• AgensGraph v0.9
(https://github.com/bitnine-oss/agens-graph or http://bitnine.net/downloads/)
– Graph data model and DDL on PostgreSQL 9.6
– Cypher query processing (70% of OpenCypher spec.)
– Integrated query processing (Cypher + SQL)
– Client libraries (JDBC, ODBC, Python)
– Monitoring and development using Tadpole DB-hub
Tadpole for Agens Graph
• Tadpole DB Hub is an open-source project for managing unified
infrastructure (https://github.com/hangum/TadpoleForDBTools)
• Supports various databases, including PostgreSQL and Agens Graph
• Features of Tadpole for Agens Graph
– Monitoring Agens Graph server
– Cypher query browser and graph visualization
Tadpole for AgensGraph
Future Roadmap
• Distributed graph database
– Plan to exploit Postgres-XL
• Specialized storage and index for graph traversals
• Dictionary compression for JSONB (ZSON)
• Graph query optimization using graph statistics
• Integration with big data systems
– HDFS Storage
– Graph analysis using GraphX
Join Us
• AgensGraph is an open-source project: https://github.com/bitnine-oss/agens-graph
• We also wish to contribute to the PostgreSQL community
• Graph database meetup in Silicon Valley
– http://www.meetup.com/Graph-Database-in-Silicon-Valley/
Thank You
kskim@bitnine.net
:likes
