© 2022 Neo4j, Inc. All rights reserved.
Master Real-Time Streams With
Neo4j and Apache Kafka
Andrea Santurbano,
CTO @ LARUS Business Automation srl
Mauro Roiter,
Data Engineer @ LARUS Business Automation srl
2
Who are we?
Andrea Santurbano
@santand84
[:CTO_AT]
[:LOVES]
[:IS_PREMIER_PARTNER]
3
About Larus
4
LARUS Business Automation
We help companies to become insight-driven organizations
• Big Data Platform Design & Development (Java, Scala, Python, Javascript)
• Data Engineering
• Graph Data Visualization
• Data Science
• Machine Learning and AI
• Graph-based technology
5
2011: First Spikes in Retail for Articles’ Clustering
2014
2015
2016: Neo4j JDBC Driver
2018: Neo4j APOC, ETL, GraphQL, Zeppelin
2019: Neo4j Kafka Connector
2020: PREMIER Partner, AI Based on Neo4j, Neo4j Spark Connector
6
Agenda
Agenda
• Apache Kafka
• Kafka Connect
• Kafka Connect Neo4j Connector overview
• Source Connector
• Sink Connector
• How to deal with bad data
• Multi database support
• Pros & Cons
• Architectural considerations
• Conclusions
• Q&A
7
8
Apache Kafka
9
Apache Kafka
“Apache Kafka is an open-source distributed event
streaming platform for high-performance data pipelines,
streaming analytics, data integration, and mission-critical
applications” – Apache Kafka
10
Kafka Connect
11
Kafka Connect
• A component of Apache Kafka for connecting Kafka with external systems such as databases, key-value stores, search indexes, and others
• Connectors, tasks, workers
• It can run in two modes: standalone or distributed
• Two types of connectors: source and sink
• Designed to be scalable and fault tolerant
12
Kafka Connect - REST API
To install or uninstall a connector we use the “/connectors” REST endpoint,
which can be invoked with the following HTTP methods:
• POST: create the given connector
• PUT: update the given connector’s configuration
• GET: return the list of active connectors
• DELETE: delete the given connector
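For example, listing and then removing a connector looks like this (port 8083 is Kafka Connect’s default REST port; the connector name “my-connector” is illustrative):
curl http://localhost:8083/connectors
curl -X DELETE http://localhost:8083/connectors/my-connector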
13
Kafka Connect Neo4j Connector
Overview
14
Where to find it
1. Confluent Hub
2. GitHub repository
15
Kafka Neo4j Connector
Sink
Source
16
Neo4j Source Connector
17
Neo4j Source Connector
It works similarly to the JDBC Source connector
• It will periodically execute a Cypher query
• A timestamp field will be used to keep track of the data that has been modified since the last execution
• It does not use the Change Data Capture pattern
18
WARNING
• Timestamps aren’t necessarily unique
• This mode can’t guarantee that all updated data will be delivered
WORKAROUND
• Use a unique field together with the timestamp field in order to write a more selective query
• Use SMT functions to manage possible event duplication
19
Neo4j Source Connector - How to set it up
curl -X POST http://localhost:8083/connectors -H 'Content-Type:application/json' -H 'Accept:application/json' -d @<json-config-file-name>
JSON REST API
20
JSON configuration
• neo4j.streaming.property
• neo4j.streaming.from
• neo4j.source.query
• neo4j.streaming.poll.interval.msecs
• $lastCheck: a query parameter calculated from the “neo4j.streaming.from” value
21
Neo4j Source Connector - How it works
• neo4j.streaming.poll.interval.msecs: polling frequency
• neo4j.source.query: data extraction query
• neo4j.streaming.from:
◦ ALL, from the beginning (the entire database)
◦ LAST_COMMITTED, will try to retrieve an already committed offset. If no committed offset is found, it will use NOW as a fallback
◦ NOW, default, from the current time
22
Neo4j Source Connector - How it works
Hard deletes are not supported!!!
23
Starting up the workshop environment
Docker environment composed of the following containers:
• Confluent Kafka
• Confluent Zookeeper
• Confluent Kafka Connect + Neo4j Connector
• Neo4j + Neo4j Streams
• Apache Zeppelin
Just run:
docker-compose up -d
Then open your browser and connect to Zeppelin home page at:
http://localhost:8080
24
DEMO
25
Neo4j Sink Connector
26
Neo4j Sink Connector - How to set it up
curl -X POST http://localhost:8083/connectors -H 'Content-Type:application/json' -H 'Accept:application/json' -d @<json-config-file-name>
JSON REST API
27
JSON configuration
• neo4j.topic.cypher.<topic-name>
• neo4j.topic.cdc.sourceId
• neo4j.topic.cdc.sourceId.labelName
• neo4j.topic.cdc.sourceId.idName
• neo4j.topic.cypher.my-topic
• neo4j.topic.cdc.schema
• neo4j.topic.pattern.node.<topic-name>
• neo4j.topic.pattern.relationship.<topic-name>
28
Neo4j Sink Connector - How it works
It works in several ways:
• by providing a Cypher template
• by ingesting the events emitted by another Neo4j instance via the Change Data Capture module
• by applying an extraction pattern to JSON or AVRO events
• by managing the CUD file format
All the Sink configs have to be prefixed with “neo4j.”
29
Neo4j Sink Connector - Cypher Template
neo4j.topic.cypher.<TOPIC_NAME>=<CYPHER_QUERY>
For example, given the following query:
MERGE (n:Label {id: event.id}) ON CREATE SET n += event.properties
The Sink module injects the events object in this way:
UNWIND {events} AS event MERGE (n:Label {id: event.id}) ON CREATE SET n
+= event.properties
!!ATTENTION!! If the query is not optimized, you can run into performance issues or situations where the plugin seems to be stuck
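A hypothetical sink configuration fragment wiring the template above to a topic named “my-topic” (connection settings and topic name are illustrative; the connector class is the one shipped with the Neo4j Connector at the time):
{
  "name": "Neo4jSinkConnector",
  "config": {
    "connector.class": "streams.kafka.connect.sink.Neo4jSinkConnector",
    "topics": "my-topic",
    "neo4j.server.uri": "bolt://neo4j:7687",
    "neo4j.authentication.basic.username": "neo4j",
    "neo4j.authentication.basic.password": "password",
    "neo4j.topic.cypher.my-topic": "MERGE (n:Label {id: event.id}) ON CREATE SET n += event.properties"
  }
}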
30
DEMO
31
Neo4j Sink Connector - CDC Schema strategy
Typically used to migrate from Neo4j on-premise to Neo4j Aura
Merges the nodes/relationships by the constraints (UNIQUENESS,
NODE_KEY) defined in the graph model.
neo4j.topic.cdc.schema=<LIST_OF_TOPICS_SEPARATED_BY_SEMICOLON>
32
DEMO
33
Neo4j Sink Connector - Pattern strategy
Gets nodes and relationships from an event by providing an extraction pattern
Each property can be prefixed with:
• !: identifies an ID property (there could be more than one); at least one is mandatory
• -: excludes the property from the extraction
Labels can be chained via :
34
Neo4j Sink Connector - Pattern strategy - NODE
To configure a node pattern, the following configuration has to be defined:
neo4j.topic.pattern.node.<TOPIC_NAME>=<NODE_EXTRACTION_PATTERN>
Pattern: User:Actor{!userId} or User:Actor{!userId,*}
Description: userId will be used as the ID field and all other properties will be attached to the node with the provided labels (User and Actor)
Pattern: User{!userId, surname}
Description: userId will be used as the ID field and only the surname property will be attached to the node with the provided labels (User)
Pattern: User{!userId, surname, address.city}
Description: userId will be used as the ID field and just the surname and address.city properties will be attached to the node with the provided labels (User)
Pattern: User{!userId,-address}
Description: userId will be used as the ID field and the address property will be excluded
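As a sketch, assuming a topic named “my-topic” and the illustrative event below, the pattern
neo4j.topic.pattern.node.my-topic=User{!userId, name}
applied to the event {"userId": 1, "name": "Andrea"} behaves conceptually like:
MERGE (n:User {userId: 1}) SET n += {name: "Andrea"}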
35
DEMO
36
Neo4j Sink Connector - Pattern strategy - RELATIONSHIP
To configure a relationship pattern, the following configuration has to be
defined:
neo4j.topic.pattern.relationship.<TOPIC_NAME>=<RELATIONSHIP_EXTRACTION_PATTERN>
Events will be transformed following a pattern like (n)-[r]->(m)
37
Neo4j Sink Connector - Pattern strategy - RELATIONSHIP
Pattern: (User{!userId})-[:BOUGHT]->(Product{!productId})
Description: merges the two nodes by the provided identifiers and the BOUGHT relationship between them, then sets all the other properties on the relationship
Pattern: (User{!userId})-[:BOUGHT{price}]->(Product{!productId})
Description: merges the two nodes by the provided identifiers and the BOUGHT relationship between them, then sets only the specified properties
Pattern: (User{!userId})-[:BOUGHT{-shippingAddress}]->(Product{!productId})
Description: merges the two nodes by the provided identifiers and the BOUGHT relationship between them, then sets all the properties except the excluded shippingAddress
Pattern: (User{!userId, userName, userSurname})-[:BOUGHT]->(Product{!productId, productName})
Description: merges the two nodes by the provided identifiers (together with the other specified properties) and the BOUGHT relationship between them; by default no properties are attached to the endpoint nodes
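A sketch under the same assumptions (illustrative topic and values): the pattern
neo4j.topic.pattern.relationship.my-topic=(User{!userId})-[:BOUGHT]->(Product{!productId})
applied to the event {"userId": 1, "productId": 10, "price": 9.99} merges (:User {userId: 1}) and (:Product {productId: 10}) and a BOUGHT relationship between them carrying the remaining price property.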
38
DEMO
39
Neo4j Sink Connector - CUD File Format - NODE
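A sketch of a CUD node message, assuming the fields described in the editor’s notes for this slide (values are illustrative):
{
  "op": "merge",
  "properties": {"foo": "value", "key": 1},
  "ids": {"key": 1},
  "labels": ["Foo", "Bar"],
  "type": "node",
  "detach": true
}
Conceptually this produces a query like: MERGE (n:Foo:Bar {key: 1}) SET n += {foo: "value", key: 1}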
40
Neo4j Sink Connector - CUD File Format - RELATIONSHIP
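A sketch of a CUD relationship message under the same assumptions (values are illustrative):
{
  "op": "create",
  "properties": {"quantity": 2},
  "rel_type": "MY_REL",
  "from": {"ids": {"key": 1}, "labels": ["Foo"]},
  "to": {"ids": {"key": 2}, "labels": ["Bar"]},
  "type": "relationship"
}
Conceptually: MATCH (f:Foo {key: 1}), (b:Bar {key: 2}) CREATE (f)-[:MY_REL {quantity: 2}]->(b)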
41
DEMO
42
Neo4j Sink Connector - CDC SourceId strategy
Merges the nodes/relationships by the CDC event id field (it’s related to the
Neo4j physical ID)
neo4j.topic.cdc.sourceId=<list of topics separated by semicolon>
neo4j.topic.cdc.sourceId.labelName=<the label attached to the node,
default=SourceEvent>
neo4j.topic.cdc.sourceId.idName=<the id name given to the CDC id field,
default=sourceId>
43
Neo4j Sink Connector - CDC SourceId strategy
Person:SourceEvent {first_name: "Anne Marie", last_name: "Kretchmar", email: "annek@noanswer.org", sourceId: "1004"}
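An abridged sketch of the CDC event that yields the node above (the shape follows the Neo4j Streams CDC format; metadata is trimmed):
{
  "meta": {"timestamp": 1532597182604, "username": "neo4j", "operation": "created"},
  "payload": {
    "id": "1004",
    "type": "node",
    "after": {
      "labels": ["Person"],
      "properties": {
        "first_name": "Anne Marie",
        "last_name": "Kretchmar",
        "email": "annek@noanswer.org"
      }
    }
  },
  "schema": {"properties": {}, "constraints": []}
}
The payload.id becomes the sourceId property and the SourceEvent label is added, so later events can match the same node.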
44
DEMO
45
Neo4j Sink Connector - How to handle bad data
The Sink module provides several ways to deal with bad data:
• Fail fast (default)
• Silently ignoring bad messages while logging them (optionally adding metadata as headers)
• Configuring a Dead Letter Queue
46
Neo4j Sink Connector - How to handle bad data
• errors.tolerance
• errors.log.enable
• errors.log.include.messages
• errors.deadletterqueue.topic.name
• errors.deadletterqueue.context.headers.enable
• errors.deadletterqueue.context.headers.prefix
• errors.deadletterqueue.topic.replication.factor
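A hypothetical error-handling fragment for the sink’s JSON config (the DLQ topic name is illustrative; a replication factor of 1 suits the single-broker workshop cluster):
"errors.tolerance": "all",
"errors.log.enable": "true",
"errors.log.include.messages": "true",
"errors.deadletterqueue.topic.name": "neo4j-dlq",
"errors.deadletterqueue.context.headers.enable": "true",
"errors.deadletterqueue.topic.replication.factor": "1"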
47
Neo4j Sink Connector - How to handle bad data
kafka.bootstrap.servers
kafka.<any_other_kafka_property>
These broker settings are required because Kafka Connect covers only deserialization and transformation errors out of the box; to also redirect Neo4j-side failures to the DLQ, the connector itself needs to know the broker location.
48
DEMO
49
Neo4j Connector
Multi-database support
50
Multi-database support
Multi-tenancy support has been available since Neo4j 4 Enterprise Edition.
The Neo4j Connector supports this feature and allows you to connect to a
specific database for each instance of the connector by simply adding the
following parameter:
neo4j.database
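For example, to write to a hypothetical "orders" database instead of the default one, add to the connector’s JSON config:
"neo4j.database": "orders"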
51
Neo4j Connector
Pros & Cons
52
Pros & Cons
Pros
• processing runs outside Neo4j, so memory & CPU load doesn’t impact Neo4j
• easier for Kafka pros to manage
• zero downtime for Neo4j when restarting or upgrading the connector
• better overall security management
Cons
• If you’re using Confluent Cloud, you can’t host the connector in their
platform (yet)
• Bolt latency & overhead, plus a separate network hop
53
Neo4j Connector
Architectural considerations
54
Architectural considerations
• Pay attention to transaction locking when writing relationships
• Use Kafka! Don’t Use Neo4j for Everything
• Keep Topics Consistent
55
Conclusions
56
Conclusions
• The Neo4j connector’s main features
• How the source connector works
• How the sink connector works
• How to handle errors
• Pros & cons
• Architectural considerations
57
58
Q&A
59
Thank you!
Contact us at
sales@larus-ba.it
Editor's Notes
  • #8: This is today’s agenda: a brief introduction to Apache Kafka and the Kafka Connect framework. We’ll see what the Neo4j Connector is and its two modes of operation: the source connector and the sink connector. We’ll also discover how to deal with bad data and how to manage the multi-tenancy feature, which has been available since Neo4j 4 Enterprise Edition. We’ll talk about some pros and cons of using this connector and give you some architectural considerations in order to use it properly. If you have questions or doubts, feel free to stop me; there will also be a Q&A session at the end.
  • #10: What is Apache Kafka? By definition, Apache Kafka is an open-source distributed event streaming platform for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. What does event streaming mean? Event streaming is the practice of capturing data in real time from event sources like databases, sensors, mobile devices, cloud services, and software applications in the form of streams of events; storing these event streams durably for later retrieval; manipulating, processing, and reacting to the event streams in real time as well as retrospectively; and routing the event streams to different destination technologies as needed. Event streaming thus ensures a continuous flow and interpretation of data so that the right information is at the right place, at the right time. Apache Kafka has three key capabilities: publish (produce) and subscribe to (consume) streams of records; store streams of records in a fault-tolerant, durable way in topics; process streams of records as they occur.
  • #12: Kafka Connect is a component of Apache Kafka for connecting Kafka with external systems such as databases, key-value stores, search indexes, and others. In Kafka Connect, connectors and tasks are logical units of work and run as processes called workers. A worker can run in two modes: standalone, generally used for development and testing purposes on a local machine; and distributed, where workers run on multiple Connect nodes, so we have a Connect cluster and connectors are distributed across the cluster nodes. Kafka Connect includes two types of connectors: source connectors, which extract data from external systems, such as relational databases, and store it in a Kafka topic; and sink connectors, which read data from Kafka topics, translate it, and send it to external systems such as Elasticsearch or Neo4j. It is designed to be scalable and fault tolerant.
  • #15: The Neo4j Connector is a tool we’ve developed to integrate Neo4j and Kafka (it works with both the Confluent and Apache distributions). It is very helpful when you have streams of data you want to store in Neo4j to apply some sort of graph analysis, or when you have a graph database and want to capture the Neo4j transactions and store them in a Kafka topic to make them available to any other tool in your data pipeline. You can download it from the official GitHub repository or from the Confluent Hub. Just FYI: the Neo4j Connector also exists as a Neo4j plugin called Neo4j Streams, but it is going to be deprecated soon. However, we’re going to use it for today’s demos, and we’ll see later why and how.
  • #16: It can be installed in two modes: Source, which imports data from Neo4j into a Kafka topic; and Sink, which imports data from a Kafka topic into Neo4j.
  • #19: Please note that timestamps are not necessarily unique, so this mode can’t guarantee that all updated data will be delivered, and it could also introduce data duplication. For example, if two rows share the same timestamp and only one has been pushed to the Kafka topic before a failure, the second update will be missed when the system recovers. What we can do to avoid these issues: use a unique field together with the timestamp field in order to write a more selective query (this means modelling our Neo4j database properly); use Single Message Transformation functions to manage possible event duplication by post-processing events once they have arrived in the topic.
  • #20: To create a Source connector instance, the only thing to do is provide a piece of JSON configuration to the Kafka Connect REST endpoint; here is an example of how to invoke it.
  • #21: Here we have a JSON configuration example. What we must put into it is: a topic where we want to send events; the converters that we want to use for both the keys and the values of the events (as you probably know, Kafka events are key-value pairs, even if the key is not mandatory); a set of properties to define the connection to a Neo4j database; and finally a set of properties to define how the connector works: neo4j.streaming.property, which is the property name that we want to use as the timestamp field (this means that all the nodes we are interested in must have this property); neo4j.streaming.from; neo4j.source.query; and neo4j.streaming.poll.interval.msecs. A dedicated mention goes to the $lastCheck parameter: it is calculated based on the neo4j.streaming.from value. How do the last three properties work together? We’ll see in the next slide.
  • #22: neo4j.streaming.poll.interval.msecs tells the connector how frequently it has to poll the Neo4j database. neo4j.source.query defines which data will be extracted from the Neo4j database and reflected into a topic as events. Data starts being produced based on the neo4j.streaming.from value: ALL means “from the beginning” (the entire database); LAST_COMMITTED will try to retrieve an already committed offset, and if no committed offset is found it will use NOW as a fallback; NOW, the default, starts from the current time (this means that nodes/relationships created before this time will not be reflected into the Kafka topic).
  • #23: Please note that the Source connector supports only soft deletes!
  • #24: For today’s demos we’ve prepared a Dockerized environment composed of the following containers: a Kafka container representing a single-instance cluster; a Zookeeper container for cluster management; a Kafka Connect container that will be used to connect Kafka and Neo4j; a Neo4j container with Neo4j Streams (configured as a Source on a non-default database we’ll create later); and a Zeppelin container we’ve used to build our examples. If you’d like to play with it, you can find it at the following GitHub repository: https://github.com/larusba/gc-2022-neo4j-connector. To start up the environment, just run `docker-compose up -d`. Furthermore, we’ll use the Kafkacat tool to produce and consume events for our examples.
  • #27: Also for the sink connector, setup is always about submitting a JSON file to the /connectors REST endpoint.
  • #28: Here is a JSON example. The sink connector offers multiple strategies, and here we can see which property we have to define to enable each of them. We always have to provide the connection parameters to the Neo4j database and a comma-separated list of topics to listen from, plus the converters. There are many types of converters, such as StringConverter, JsonConverter, AvroConverter and others; for today’s demos we’re going to use JsonConverter. We can also define how to manage errors. In particular, we can set up a Dead Letter Queue topic where all the events that have caused an error will be redirected for future elaboration. This means users always have the possibility to understand what went wrong, find the event that raised the error, fix it, and reprocess that event. Let’s see how all the available strategies work.
  • #29: It works in several ways: by providing a Cypher template; by ingesting the events emitted by another Neo4j instance via the Change Data Capture module; by applying an extraction pattern to JSON or AVRO events; by managing the CUD file format. A thing to keep in mind is that all the Sink connector parameters have to be prefixed with “neo4j.”. Let’s take a look at these strategies one by one.
  • #30: With the Cypher template strategy, events in a Kafka topic will be processed via a templated Cypher query, provided with the following configuration: neo4j.topic.cypher.<TOPIC_NAME>=<CYPHER_QUERY>. The query will be applied for each event. For example, given the following query: MERGE (n:Label {id: event.id}) ON CREATE SET n += event.properties, the Sink connector injects the events object in this way: UNWIND {events} AS event MERGE (n:Label {id: event.id}) ON CREATE SET n += event.properties, where {events} is a JSON/AVRO list. !!ATTENTION!! If the query is not optimized, you can run into performance issues or situations where the plugin seems to be stuck. Run the query with EXPLAIN in order to analyze it and avoid this kind of situation; if Neo4j seems to be stuck, execute CALL dbms.listQueries() to view all running queries within the instance and make sure there are no locked queries.
  • #32: This strategy is typically used to migrate from a Neo4j on-premise instance to a Neo4j Aura instance. To do so, we have to: set up Neo4j Streams as a Source on our Neo4j on-premise instance, which will redirect all Neo4j transactions, in the form of CDC events, to a Kafka topic; then set up a Neo4j Sink Connector, enabling the schema strategy, which consumes the same topic and recreates the Neo4j entities on the Aura instance. It works by merging the nodes/relationships by the constraints (UNIQUENESS, NODE_KEY) defined in the graph model. To enable this Sink strategy, the following configuration has to be defined: neo4j.topic.cdc.schema=<LIST_OF_TOPICS_SEPARATED_BY_SEMICOLON>. So we just need to provide a list of topics to listen from.
  • #34: Gets nodes and relationships from an event by providing an extraction pattern. Each property can be prefixed with: !, which identifies an ID property (there could be more than one) and is mandatory; or -, which excludes the property from the extraction. Labels can be chained via :. If no prefix is specified, the property will be included. !!ATTENTION!! Inclusions and exclusions cannot be mixed: the pattern must contain all exclusion or all inclusion properties.
  • #35: To configure a node pattern, the following configuration has to be defined: streams.sink.topic.pattern.node.<TOPIC_NAME>=<NODE_EXTRACTION_PATTERN>. The table shows some examples of how to use node patterns.
  • #37: To configure a relationship pattern, the following configuration has to be defined: streams.sink.topic.pattern.relationship.<TOPIC_NAME>=<RELATIONSHIP_EXTRACTION_PATTERN>. Events will be transformed following a pattern like (n)-[r]->(m).
  • #38: Here are some examples of how to define relationship patterns.
  • #40: On the left side we have a JSON that represents, via the CUD file format, a MERGE operation of a node with the foo and key properties using the Foo and Bar labels. On the right side we have the resulting query created by the connector. In particular: op is a mandatory field and can have one of the following values: create/merge/update/delete (delete messages are for individual nodes). properties is a key-value map of the properties attached to the node; it is mandatory except for delete operations. ids: in case the operation is merge/update/delete, this field is mandatory and contains the primary/unique keys of the node that will be used to look up the entity. If you use _id as the key name, the CUD format will use Neo4j’s internal node id for the lookup; if you use the _id reference with the merge op, it works as a simple update, meaning that if the node with the provided internal id does not exist, it will not be created. labels: the labels attached to the node. Neo4j allows creating nodes without labels, so this is not a mandatory field, but from a performance perspective it’s a bad idea not to provide them. type: node or relationship; it is mandatory. detach: in case the operation is delete, you can specify whether to perform a “detach” delete, which deletes any incident relationships when you delete the node. If no value is provided, the default is true (so it is not mandatory).
  • #41: On the left side we have a JSON that represents, via the CUD file format, a CREATE operation of a relationship with type MY_REL, connecting a start node, represented by the “from” field, to an end node, represented by the “to” field. On the right side we have the resulting query created by the connector. In particular: op is a mandatory field and can have one of the following values: create/merge/update/delete. properties is not mandatory; it is a key-value map of the properties attached to the relationship. rel_type: the type of the relationship; it is mandatory. from: contains information about the source node of the relationship; it is mandatory (if you use the _id field reference in ids, you can leave labels blank). to: contains information about the target node of the relationship; it is mandatory (if you use the _id field reference in ids, you can leave labels blank). type: node or relationship; it is mandatory.
  • #43: This strategy is based on the CDC event structure. It merges the nodes/relationships by the CDC event id field (which is related to the Neo4j physical ID). To enable this Sink strategy, the following configurations have to be defined: streams.sink.topic.cdc.sourceId=<list of topics separated by semicolon>; streams.sink.topic.cdc.sourceId.labelName=<the label attached to the node, default=SourceEvent>; streams.sink.topic.cdc.sourceId.idName=<the id name given to the CDC id field, default=sourceId>
  • #44: Given the JSON event on the left, the SourceId strategy will transform it into the node on the right. As you can see, the id field has been transformed into sourceId, and the node has an additional label, SourceEvent. This helps identify which nodes have been created by the connector; in particular, these two elements will be used to match the node/relationship for future operations.
  • #47: With the following parameters you can manage how to handle bad data: errors.tolerance controls whether to make the connector fail in case of errors (the default approach) or to silently ignore bad messages; possible values are none/all. errors.log.enable: true or false (default); if true, the connector will log errors caused by bad messages. errors.log.include.messages: true or false (default); when error logging is enabled, the connector will also log the message that caused the error. errors.deadletterqueue.topic.name: the topic name where you want to redirect bad messages for future reprocessing. errors.deadletterqueue.context.headers.enable: true or false (default); enriches messages with metadata headers like exception, timestamp, topic, partition, offset. errors.deadletterqueue.context.headers.prefix: a prefix added to headers to identify them more easily. errors.deadletterqueue.topic.replication.factor: the number of partition replicas for the DLQ topic. By default it is 3, so in the case of a single-node cluster it has to be set to 1 (because the number of topic partition replicas can’t be greater than the number of brokers).
  • #48: In addition to the properties described in the previous slides, you need to define Kafka broker connection properties, because the Kafka Connect framework provides out-of-the-box support only for deserialization errors and message transformations. We want to extend this feature to transient errors in order to cover 100% of failures, so to do that, at the moment, as suggested by Confluent, we need to provide the broker location again. Given that, these properties have to be added only if you also want to redirect Neo4j errors to the DLQ. Furthermore, you can specify any other Kafka producer or consumer setting just by adding the “kafka.” prefix.
  • #53: Pros: Processing happens outside Neo4j, so memory & CPU usage doesn’t impact Neo4j; you don’t need to size the database with Kafka utilization in mind. Much easier for Kafka pros to manage; they benefit from the Confluent ecosystem, such as the REST API to manipulate connectors and the Control Center to administer & monitor them. By restarting the worker, you can restart your sink/source strategy without having downtime for Neo4j; to upgrade the connector, there’s no need to restart the cluster. Strictly an external Bolt client, so better overall security management of plugin actions. Cons: If you’re using Confluent Cloud, you can’t host the connector on their platform (yet), so this requires a third piece of architecture: Confluent Cloud, Neo4j, and the Connect worker (usually a separate VM). Possibly worse throughput due to Bolt latency & overhead, and a separate network hop.
  • #55: With the Neo4j connector there is the possibility to parallelize the load process! We need to be careful with parallelization, because writing relationships requires taking a lock on both incident nodes; in general, when we are doing high-performance batch inserts into Neo4j, we want a single-threaded batch, and we do not want parallelized load processes. If you parallelize the loads without careful design, you will mostly get some combination of thread waiting and lock exceptions/errors. The general approach should be to serialize batches of records into a stream of “one-after-the-other” transactions to maximize throughput. If you decide to parallelize the load process, take care to configure your topics to talk about disjoint parts of the graph, so that you don’t run into concurrent locking issues with Neo4j writes (if possible). Use Kafka! Don’t use Neo4j for everything: in general, it is better to use Kafka and things like KSQL to re-shape, transform, and manipulate a stream before it gets to Neo4j; at this stage you have more flexible options and overall better performance. Keep topics consistent: topics that contain a variety of differently formatted messages should be avoided whenever possible; for example, events that create relationships might be consumed before the events that should have created the nodes they connect.
  • #57: We’ve seen: the main features of the Neo4j connector and how to set it up as a source (to ingest data from Neo4j to Kafka) and as a sink (to ingest data from Kafka to Neo4j); how the Neo4j source connector works, with some tips to avoid possible issues; how the Neo4j sink connector works and all the available strategies; and finally, but no less important, how to handle errors, some pros and cons of using the Neo4j connector, and some architectural considerations in order to use the connector properly and avoid issues.
  • #58: Thanks everybody for your attention! Please visit our stand if you want. We also started an Explainable AI project called Galileo.XAI together with our partner Fujitsu! If you are interested in learning more about it, we’ll be happy to meet you at our stand!