Comparing three data ingestion approaches where Apache Kafka integrates with a distributed graph database in real time
Benyue (Emma) Liu
Product Manager, TigerGraph
April, 2021
© 2021. ALL RIGHTS RESERVED. | TIGERGRAPH.COM | CONFIDENTIAL INFORMATION |
Today's Speaker
Benyue (Emma) Liu
Senior Product Manager, TigerGraph
● BS in Engineering from Harvey Mudd College, MS in Engineering Systems from MIT
● Prior work experience at Oracle and MarkLogic
● Focus - Cloud, Containers, Enterprise Infra, Monitoring, Management, Connectors, Developer Tools, Applications
Today's Outline
1. Graph Analytics Overview
2. TigerGraph Data Ingestion System Architecture: Internal Kafka Component
3. Built-in Kafka Loader in TigerGraph
4. TigerGraph Cloud and Kafka Data Pipeline Use Cases
Graph Analytics Overview
By 2025, graph technologies will be used in 80% of data and analytics innovations, up from 10% in 2021, facilitating rapid decision making across the enterprise.1
“To Graph or Not to Graph? That Is Not
the Question — You Will Graph.”2
Mark Beyer, Distinguished VP Analyst
1Gartner, Top Trends in Data and Analytics for 2021, 16 February 2021
2Gartner, Graph Steps Onto the Main Stage of Data and Analytics: A Gartner Trend Insight Report, 14 December 2020
Why Graph Analytics?
Source: Gartner - Top 10 data and analytics Trends for 2019
Graph deployments are going deeper, wider and operational: they need to be made accessible to non-technical users
● Definition - Graph analytics is a set of analytic techniques that allow exploration of the relationships between entities of interest, such as organizations, people and transactions.
● Forecasted growth - 100% annually through 2022
● What's driving the growth
○ The need to ask complex questions across complex data, which is not always practical, or even possible, at scale with SQL queries. (An RDBMS requires time-consuming and expensive table joins!)
● What's needed for broad adoption of graph data stores
○ Graph data stores can efficiently model, explore and query data with complex interrelationships across data silos, but the need for specialized skills has limited their adoption to date.
[Figure: sample graph with Customer, Order, Product, Payment, Supplier, Location 1 and Location 2 vertices, including a PURCHASED edge]
Who is TigerGraph?
Corporate Overview Video
We provide advanced analytics on connected data
○ The only scalable graph database for the enterprise
○ HTAP graph database, foundational for AI and ML solutions
○ SQL-like query language (GSQL) accelerates time to solution
○ Cloud Neutral: Google GCP, Microsoft Azure,
Our customers include:
○ The largest companies in financial services, healthcare, telecoms, media and utilities, as well as innovative startups in cybersecurity, ecommerce and finserv
Founded in 2012, HQ in Redwood City, California
How Do Our Customers Use TigerGraph?
● Find most influential users/customers: Which users are driving higher usage or adoption of my product or service?
● Find similar users/customers: Who are the patients going through a particular type of journey that results in an adverse health outcome?
● Uncover hidden connections: Is the new credit card applicant or transaction connected to known fraudsters?
● Recommend next best action: Can I run a real-time credit score algorithm and recommend an offer based on the customer's credit profile and needs?
● Detect connected users (communities): What is the average spend over time across a community of connected users (financial services, airlines, healthcare, retail, etc.)?
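Questions such as "Is the new credit card applicant or transaction connected to known fraudsters?" are multi-hop traversals. The minimal Python sketch below shows the idea as a breadth-first search bounded at k hops over a toy in-memory edge list. The within_k_hops helper and the vertex names are hypothetical, and the sketch does not model how TigerGraph executes GSQL queries across a distributed cluster.

```python
from collections import deque


def within_k_hops(edges, start, targets, k):
    """Return (target, hops) for the nearest target reachable from start in at most k hops, else None.

    edges: iterable of (u, v) pairs, treated as undirected links
    (e.g. applicant -> phone number -> account -> known fraudster).
    """
    # Build an undirected adjacency list from the edge pairs.
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    # Breadth-first search visits vertices in order of hop distance.
    visited = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if node != start and node in targets:
            return node, depth
        if depth == k:
            continue  # do not expand beyond k hops
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return None


if __name__ == "__main__":
    links = [("applicant_42", "phone_555"), ("phone_555", "account_7"), ("account_7", "fraudster_3")]
    print(within_k_hops(links, "applicant_42", {"fraudster_3"}, 3))
```

Raising k widens the search exponentially in dense graphs, which is why deep-link (5 to 10+ hop) queries are where native graph engines are usually compared against join-based SQL.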
TigerGraph System Architecture Overview (Kafka as a Key Component)
The TigerGraph Difference
● Real-Time Deep-Link Querying (5 to 10+ hops deep)
○ Design difference: native graph design; C++ engine for high performance; storage architecture
○ Benefit: uncovers hard-to-find patterns; operational, real-time; HTAP (transactions + analytics)
● Handling Massive Scale
○ Design difference: distributed DB architecture; massively parallel processing; compressed storage reduces footprint and messaging
○ Benefit: integrates all your data; automatic partitioning; elastic scaling of resource usage
● In-Database Analytics
○ Design difference: GSQL, a high-level yet Turing-complete language; user-extensible graph algorithm library that runs in-DB; ACID (OLTP) and accumulators (OLAP)
○ Benefit: avoids transferring data; richer graph context; in-DB machine learning
TigerGraph Architecture - Kafka as a Key Component
Data Ingestion Steps Inside TigerGraph
Step 1
User source data can be ingested in the following ways:
● Bulk load of data files, or of a Kafka stream, in CSV or JSON format
● HTTP POSTs via REST services (JSON)
● GSQL INSERT commands
Step 2
The dispatcher receives data ingestion requests in the form of updates to the database, then:
1. Queries the IDS to get internal IDs
2. Converts the data to the internal format
3. Sends the data to one or more corresponding GPEs
Step 3
Each GPE consumes its partial data updates, processes them and writes them to disk.
Loading jobs and POST requests use UPSERT semantics:
● If the vertex/edge does not yet exist, create it.
● If the vertex/edge already exists, update it.
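The three steps can be sketched as a toy, single-process Python model: a dictionary stands in for the IDS, a list of dictionaries stands in for the GPE partitions, and modulo routing on the internal ID is an assumed partitioning rule. The Dispatcher class and its methods are hypothetical; they only illustrate ID translation, routing and UPSERT semantics, not TigerGraph's actual internals.

```python
class Dispatcher:
    """Toy model of ingestion: ID translation, conversion, routing to GPE partitions, UPSERT."""

    def __init__(self, num_gpes):
        self.id_map = {}                                # stands in for the IDS: external ID -> internal ID
        self.gpes = [dict() for _ in range(num_gpes)]  # each dict stands in for one GPE's vertex store

    def internal_id(self, external_id):
        # Look up the internal ID, allocating the next integer if the key is new.
        if external_id not in self.id_map:
            self.id_map[external_id] = len(self.id_map)
        return self.id_map[external_id]

    def ingest(self, external_id, attributes):
        vid = self.internal_id(external_id)     # Step 2.1: query the IDS
        update = (vid, dict(attributes))        # Step 2.2: convert to an internal format
        gpe = self.gpes[vid % len(self.gpes)]   # Step 2.3: route to the owning GPE (assumed rule)
        self._upsert(gpe, *update)              # Step 3: the GPE applies the update
        return vid

    @staticmethod
    def _upsert(store, vid, attributes):
        # UPSERT semantics: create the vertex if missing, otherwise update its attributes.
        if vid in store:
            store[vid].update(attributes)
        else:
            store[vid] = attributes


if __name__ == "__main__":
    d = Dispatcher(num_gpes=3)
    d.ingest("alice", {"age": 30})
    d.ingest("alice", {"city": "Paris"})  # same vertex: updated, not duplicated
    print(d.gpes)
```

In the real system the hand-off between Step 2 and Step 3 is asynchronous (through Kafka, per the architecture slides); the sketch applies it inline to keep the UPSERT behavior easy to see.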
Data Ingestion (Internal)
[Figure: incremental CSV/JSON data (insert/update/delete of vertices and edges) enters through Nginx and RESTPP, which uses GSE (IDS) for ID translation; updates pass through a Kafka cluster running on Servers 1, 2 and 3; each GPE listens to its corresponding topic for new messages, keeps an in-memory copy of the data, synchronizes data to disk, and an acknowledgement response is returned]
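The figure places Kafka between the dispatcher and the GPEs. The minimal Python sketch below illustrates why a durable, offset-based log is useful there, assuming a plain in-memory list in place of a real Kafka topic partition. TopicLog and GpeConsumer are hypothetical names: a GPE applies messages from its committed offset onward, and a fresh consumer that replays from offset 0 rebuilds the same state.

```python
class TopicLog:
    """Append-only message log standing in for one Kafka topic partition (in memory, hypothetical)."""

    def __init__(self):
        self.messages = []

    def append(self, message):
        self.messages.append(message)
        return len(self.messages) - 1  # offset of the appended message

    def read_from(self, offset):
        return self.messages[offset:]


class GpeConsumer:
    """Toy GPE that listens to its topic, applies updates in memory, and tracks a committed offset."""

    def __init__(self, log, committed_offset=0):
        self.log = log
        self.committed_offset = committed_offset
        self.vertices = {}  # in-memory copy of this GPE's data

    def poll_and_apply(self):
        batch = self.log.read_from(self.committed_offset)
        for op, vid, attrs in batch:
            if op == "upsert":
                self.vertices.setdefault(vid, {}).update(attrs)
            elif op == "delete":
                self.vertices.pop(vid, None)
        # Advance the offset only after the whole batch is applied
        # (a real system would also persist to disk before acknowledging).
        self.committed_offset += len(batch)
        return len(batch)  # number of messages acknowledged


if __name__ == "__main__":
    log = TopicLog()
    log.append(("upsert", 1, {"name": "a"}))
    gpe = GpeConsumer(log)
    print(gpe.poll_and_apply(), gpe.vertices)
```

Committing the offset after applying a batch gives at-least-once behavior in this toy; because upserts are idempotent for the same attribute values, replaying a message is harmless here.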
Kafka and TigerGraph: Native Kafka Loader
[Figure: TigerGraph's native Kafka Loader consuming data from an external Kafka cluster]
Kafka and TigerGraph Data Pipeline
[Figure: static data sources and streaming data sources flow through Kafka and the Kafka Loader into TigerGraph]
Kafka Loader - Speed to Value from Real-time Streaming Data
• Reduce Data Availability Gap and Accelerate Time to Value
• Native Integration with Real-time Streaming Data and Batch Data
• Enable Real-time Graph Feature Updates with Streaming Data in Machine Learning Use Cases
• Decrease Learning Curve With Familiar Syntax
• GSQL Support with Consistent Data Loading Syntax
• Maintain Separation of Control for Data Loading
• Designed with Built-in MultiGraph Support
Kafka Loader: Three Steps
Consistent with GSQL Data Loading Steps
Step 1: Define the Data Source
Step 2: Create a Loading Job
Step 3: Run the Loading Job
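As an illustration of the three steps, this Python helper assembles the corresponding GSQL statements as strings. The statement shapes (CREATE DATA_SOURCE KAFKA, a DEFINE FILENAME that references the data source with a $name: prefix, and RUN LOADING JOB) follow my reading of the TigerGraph 3.x Kafka Loader documentation and may differ in other versions. The helper name, graph, job, vertex type and file paths are all hypothetical.

```python
def kafka_loading_statements(graph, source_name, source_config_path,
                             job_name, topic_config_path, vertex_type, columns):
    """Return the three GSQL statements of a Kafka loading workflow (assumed 3.x syntax)."""
    # Step 1: define the Kafka data source, pointing at a cluster-level config file.
    step1 = f'CREATE DATA_SOURCE KAFKA {source_name} = "{source_config_path}" FOR GRAPH {graph}'

    # Step 2: create a loading job whose "file" is a topic/partition config on that data source.
    values = ", ".join(f"${i}" for i in range(columns))  # positional columns $0, $1, ...
    step2 = (
        f"CREATE LOADING JOB {job_name} FOR GRAPH {graph} {{\n"
        f'  DEFINE FILENAME f1 = "${source_name}:{topic_config_path}";\n'
        f'  LOAD f1 TO VERTEX {vertex_type} VALUES ({values}) USING SEPARATOR=",";\n'
        "}"
    )

    # Step 3: run the loading job.
    step3 = f"RUN LOADING JOB {job_name}"
    return [step1, step2, step3]


if __name__ == "__main__":
    for stmt in kafka_loading_statements(
        "social", "k1", "/home/tigergraph/kafka.json",
        "load_person", "/home/tigergraph/topic_person.json", "Person", 2,
    ):
        print(stmt)
```

The statements would be run in the GSQL shell in this order; the point of the sketch is that the Kafka source only changes Step 1 and the DEFINE FILENAME line, while LOAD and RUN keep the usual GSQL loading syntax.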
Kafka Loader High Level Architecture
● Connect to External Kafka Cluster
● User Commands Through GSQL Server
● Configuration Settings:
○ Config 1: Kafka Cluster Configuration
○ Config 2: Topic/Partition/Offset Info
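The two configuration settings are typically JSON files. The sketch below shows their general shape, assuming the field names broker, kafka_config, topic, partition_list, start_offset and default_start_offset as I recall them from the TigerGraph 3.x Kafka Loader documentation. Check the documentation for your version, including the meaning of any special negative offset values; the broker addresses, topic name and group ID are placeholders.

```python
import json


def kafka_cluster_config(brokers, extra_kafka_settings=None):
    """Config 1 (assumed shape): comma-separated broker list plus optional Kafka client properties."""
    return {"broker": ",".join(brokers), "kafka_config": dict(extra_kafka_settings or {})}


def topic_partition_config(topic, start_offsets, default_start_offset=0):
    """Config 2 (assumed shape): topic, per-partition start offsets, default for unlisted partitions."""
    return {
        "topic": topic,
        "partition_list": [
            {"partition": p, "start_offset": off} for p, off in sorted(start_offsets.items())
        ],
        "default_start_offset": default_start_offset,
    }


if __name__ == "__main__":
    # Print the JSON that would be saved to the files referenced by the loading job.
    print(json.dumps(kafka_cluster_config(["broker1:9092"], {"group.id": "tigergraph"}), indent=2))
    print(json.dumps(topic_partition_config("person", {0: 0, 1: 100}), indent=2))
```

Keeping cluster settings (Config 1) separate from topic/partition/offset settings (Config 2) lets several loading jobs reuse one data source definition while each reads its own topic from its own starting point.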
DEMO
Kafka and tgcloud.io Data Loading Use Cases
What is TigerGraph Cloud (tgcloud.io)?
● TigerGraph Cloud is a distributed graph Database-as-a-Service
● Cloud Infrastructure Included
● Out-of-box graph starter kits - industry use case library
● Complete operational support by TigerGraph
○ Upgrades
○ Patches
○ Maintenance
○ Status Monitoring
● Flexible hourly billing, paid by credit card or through prepaid bulk “cloud credits”
● Multiple versions and multiple clusters supported
● Easy provisioning for distributed clusters
● Built-in Encryption and Security
TigerGraph Cloud Architecture
[Figure: the TigerGraph Cloud Portal manages multiple clusters, each providing GraphStudio, an Admin Portal and a GSQL Web Shell on top of a TigerGraph Database plus a Graph Starter Kit]
Azure Blob + Kafka + tgcloud.io
[Figure: data in Azure Blob Storage is read by the Kafka Connect Azure Blob Storage Source Connector into Kafka, and the Kafka Loader ingests it into tgcloud.io]
https://docs.confluent.io/kafka-connect-azure-blob-storage-source/current/index.html
Google Cloud Storage + Kafka + tgcloud.io
[Figure: data in Google Cloud Storage is read by the Kafka Connect Google Cloud Storage Source Connector into Kafka, and the Kafka Loader ingests it into tgcloud.io]
https://docs.confluent.io/kafka-connect-gcs-source/current/overview.html
Amazon Kinesis + Kafka + tgcloud.io
[Figure: records from Amazon Kinesis are read by the Kafka Connect Amazon Kinesis Source Connector into Kafka, and the Kafka Loader ingests them into tgcloud.io]
https://docs.confluent.io/kafka-connect-kinesis/current/index.html
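All three pipelines share the same Kafka Connect step: a source connector is registered with a Kafka Connect worker, which writes into a Kafka topic that the Kafka Loader then reads. The minimal Python sketch below performs that registration through the Kafka Connect REST API (POST /connectors). The connector class names reflect my understanding of the Confluent documentation linked above and should be verified. Connector-specific settings (credentials, container, bucket or stream names, output topic, format) must come from each connector's reference page, so any settings passed in are placeholders.

```python
import json
import urllib.request

# Connector classes as I understand them from the Confluent docs; verify before use.
SOURCE_CONNECTORS = {
    "azure_blob": "io.confluent.connect.azure.blob.storage.AzureBlobStorageSourceConnector",
    "gcs": "io.confluent.connect.gcs.GcsSourceConnector",
    "kinesis": "io.confluent.connect.kinesis.KinesisSourceConnector",
}


def connector_payload(name, source, settings, tasks_max=1):
    """Build the JSON body for POST /connectors on a Kafka Connect worker."""
    config = {"connector.class": SOURCE_CONNECTORS[source], "tasks.max": str(tasks_max)}
    # Connector-specific keys (credentials, bucket/container/stream, topic, format) go here.
    config.update(settings)
    return {"name": name, "config": config}


def create_connector(connect_url, payload):
    """Submit the connector to a Kafka Connect worker's REST API and return its JSON response."""
    request = urllib.request.Request(
        f"{connect_url}/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

A hypothetical call would be create_connector("http://localhost:8083", connector_payload("kinesis-clicks", "kinesis", settings)), where the worker URL and the settings dictionary are assumptions; the Kafka Loader is then pointed at whichever topic the connector writes to.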
DEMO
Summary
● Graph Analytics Overview
● TigerGraph Data Ingestion System Architecture: Kafka as an Internal Component
● Built-in Kafka Loader in TigerGraph
● TigerGraph Cloud (tgcloud.io) and Kafka Data Pipeline Use Cases
Get Started for Free
● Try TigerGraph Cloud (tgcloud.io)
● Download TigerGraph’s Developer Edition
● Take a Test Drive - Online Demo
● Get TigerGraph Certified
● Join the Community
@TigerGraphDB /tigergraph /TigerGraphDB /company/TigerGraph