© 2022 Neo4j, Inc. All rights reserved.
Handling Neo4j Data with Apache Hop
Matt Casters
Neo4j Chief Solutions Architect
Apache Hop PMC
Topics
• Apache Hop introduction
• Setting up Hop
• The basics
• Neo4j functionality
• Advanced topics
• Q&A / Wrap-up
Introduction
Apache Hop
“Facilitates all aspects of data and metadata
orchestration”
Typical use-cases:
● Data integration / Data orchestration / ETL
● Data migration
● Message processing
● Data synchronization
● IoT, Big Data, …
Bridging the gap…
Organizations Tech / Devs
Requirements
Concerns of organizations
• Setup costs
• Maintenance costs
• Running costs
• Time to market
• Resource availability & the bus factor
• DevOps
• Solution stability
• Investment protection
Concerns of developers
• Ability to succeed
• Have a fun development environment
• Ability to learn new things
• Work with new technology
• Use best development practices
• Life / Work balance
• …
Why Apache Hop?
● Lower development time and cost
● Lower maintenance time and cost
● Increase transparency
● Improve stability
● Make the learning curve steeper
● Protect against brain-drain
● …
https://xkcd.com/1319/
Why Apache Hop?
• A quickly diversifying technological data landscape
• → Makes it hard to manage complexity
• → Drives the need for rapid innovation
• → Makes it harder to support best dev practices
• Development is independent of any single large corporation
• By and for data orchestration professionals
Apache Hop
• Recursive acronym: Hop Orchestration Platform
• Orchestration:
◦ Data: pipelines and workflows
◦ Metadata: editing, handling, management,...
◦ Insights: data/execution lineage, logging, …
◦ Configurations: handling ecosystem complexity
• Platform:
◦ GUI, commands, server, scripts, docker, API, documentation,
community, ...
Background
• Community-led initiative
• Starting point was Kettle 8.2 + WebSpoon + patches + plugins + …
→ Representing 21 years of software development!
• New scalable GUI
• New architecture, metadata back-end
• Simplified toolset
• Code refactored, renamed, trimmed down, ...
• Extra plugins: Projects, Testing, Apache Beam, Debugging, …
• Years of work!
Apache Software Foundation
• Hop is a Top Level Project at the Apache Software Foundation
• Homepage: http://hop.apache.org
• Source: github.com/apache/hop/
• Builds and integration tests (IT) run on Apache Jenkins CI
• Released 2.1.0: http://hop.apache.org/download
• Working on 2.2.0
• Fast growing and active community
• Check the website for regular updates & our Hot Hop Hangouts (3Hx)
⇒ 3H17 @ Thu 10 Nov 2022 7pm - 9pm CET
Key architecture features
• License: Apache License, Version 2.0
• Metadata driven: no code generation
• Modular pluggable architecture: scale back to <30MB
• Fast startup, minimal overhead
• Apache Beam with support for Apache Spark, Apache Flink and GCP
Dataflow runners
• Version controlled documentation
• Ease of use: transparent naming and easy to use tools
• Integration test: critical components are tested daily with integration tests
• → runtime compatibility, stability, ...
Key GUI features
• Pluggable GUI features
• Scalable interface for high-DPI displays or visually impaired users
• Perspectives for easy, fast context switching
• Designed for web browsers and mobile users
• → Single click mode for faster navigation
• Full support for 4 platforms: Windows, OSX, Linux & Web
• Support for “dark mode” themes on Linux and OSX
Key configuration features
• All GUI configuration options have a command line variant
• Single central system configuration JSON file
• Easy project and lifecycle environment configuration
• Configuration and metadata inheritance from other projects
• Standard docker containers
• Stateless server supporting multi-tenancy
• Version control friendly setup
Setting up Hop
With some live action
Download and install
• Visit hop.apache.org
• Download the latest version
• Use an up-to-date Java 11 Runtime Environment:
  ◦ Windows
    • Eclipse Adoptium OpenJDK build
    • Microsoft build of OpenJDK
  ◦ Linux
    • Just install OpenJDK 11
  ◦ OSX
    • Azul build for x86 or ARM 64-bit
Configuration
Available configuration variables:
HOP_CONFIG_FOLDER: Hop keeps a central system configuration file called hop-config.json in this folder.
HOP_AUDIT_FOLDER: Usage history, logging and other auditing information is stored here.
HOP_JAVA_HOME: Points to the JRE you want to use.
HOP_OPTIONS: The Java options. For example, you can change the maximum memory consumption (2GB by default). To give the Hop tools 8GB of memory: -Xmx8g
HOP_SHARED_JDBC_FOLDER: A folder where you can put your extra JDBC drivers.
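As a rough illustration of how these variables fit together, the sketch below prepares an environment for a Hop tool from Python. The helper name `build_hop_env` and the folder paths are hypothetical; only the variable names and the -Xmx8g option come from the list above. Note that it replaces HOP_OPTIONS outright, which may not be what you want if other Java options are already set.

```python
import os


def build_hop_env(config_folder, audit_folder, max_memory="8g", base_env=None):
    """Return a copy of an environment with the Hop variables set.

    Hypothetical helper: the folder arguments are placeholders. Only the
    variable names (HOP_CONFIG_FOLDER, HOP_AUDIT_FOLDER, HOP_OPTIONS) are
    taken from the configuration list.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HOP_CONFIG_FOLDER"] = config_folder
    env["HOP_AUDIT_FOLDER"] = audit_folder
    # HOP_OPTIONS carries the Java options, here only the maximum heap size.
    env["HOP_OPTIONS"] = f"-Xmx{max_memory}"
    return env


# Start from an empty base environment so the printed result is easy to read.
hop_env = build_hop_env("/opt/hop/config", "/opt/hop/audit", base_env={})
for name in sorted(hop_env):
    print(f"{name}={hop_env[name]}")
```

The resulting dictionary could be passed as the `env` argument of `subprocess.run` when launching one of the Hop scripts.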
Tools
We’ll go over the tools in the Hop download:
● hop-conf : Hop Configuration
● hop-run : Run a pipeline or workflow
● hop-gui : Run the Hop graphical user interface.
● hop-search: Look for something in a project
● hop-server : Start a server that allows you to do remote execution
○ This is NOT a GUI, it’s just a server
Version control
Hop has built-in (basic) support for git integrated with the explorer perspective.
Check out a git repository
The samples from this workshop are stored here:
https://github.com/Neo4jSolutions/hop-nodes-2022
Projects
• Projects allow you to encapsulate all your work in a single folder
• Makes it easy to check in all your work into git
• Allows you to specify files in a relative way
◦ ${PROJECT_HOME}/files/input-file.csv
• Consider the following project options:
Create project
We’ll create a project using the git repository folder we checked out earlier:
hop-nodes-2022
Environments
Environments let you keep environment-specific variables separate from your project.
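To make the idea concrete, here is a small conceptual sketch of layering environment variables over project variables and resolving `${...}` expressions such as the `${PROJECT_HOME}` path shown on the Projects slide. This is not Hop's implementation; the `NEO4J_HOST` variable, the host names and the folder path are made up for illustration.

```python
import re

# Matches ${NAME} style variable references.
_VARIABLE = re.compile(r"\$\{([^}]+)\}")


def resolve(expression, variables):
    """Replace ${NAME} references; unknown names are left untouched."""
    return _VARIABLE.sub(
        lambda match: variables.get(match.group(1), match.group(0)), expression
    )


# Hypothetical values, for illustration only.
project_variables = {"PROJECT_HOME": "/home/user/git/hop-nodes-2022"}
environments = {
    "dev-local": {"NEO4J_HOST": "localhost"},
    "aura": {"NEO4J_HOST": "example.databases.neo4j.io"},
}

for name, environment_variables in environments.items():
    # Environment variables are layered on top of the project variables.
    variables = {**project_variables, **environment_variables}
    print(name, resolve("${PROJECT_HOME}/files/input-file.csv", variables))
    print(name, resolve("neo4j://${NEO4J_HOST}:7687", variables))
```

The same pipeline definition stays unchanged; only the set of variables differs per environment.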
Create environment
We’ll create a local development environment for our project:
hop-nodes-2022-dev-local
The basics
Pipelines, Workflows, Metadata, …
Concepts
Live explanation of Hop concepts and terminology using Hop GUI.
More information can be found online on the Hop website:
https://hop.apache.org/manual/latest/concepts.html
● Workflows and pipelines
● Actions and transforms
● Run configurations
Neo4j functionality
Actions, transforms & more
Live workshop topics
• Neo4j Connection
◦ How to set up for multiple environments
• Documentation: working with Neo4j data
• Northwind example
◦ Check Neo4j, Neo4j Constraint, Neo4j Output, Cypher,
◦ Graph Output with graph model
• Configure new Aura environment
◦ Run Northwind again
• Running on GCP Dataflow:
◦ BigQuery to Aura
◦ Snowflake to Aura
• Bulk loading basics
Beam vs Neo4j best practices
● Set transform copies to BEAM_BATCH
○ Set the flush interval and buffer size in the run configuration
● Match the parallelism to the number of CPUs that the Neo4j server has
○ Dataflow → pick the right type and number of worker nodes
○ Apache Spark / Flink: set the parallelism in the run configuration
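The two practices above can be sketched outside of Hop as follows. This is only a conceptual illustration: the `batches` and `writer_parallelism` helpers, the `Customer` label and the use of a Cypher `UNWIND` statement per batch are assumptions made for the example, not a description of what the Hop transforms do internally.

```python
from itertools import islice


def batches(rows, buffer_size):
    """Yield lists of at most buffer_size rows from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, buffer_size))
        if not batch:
            return
        yield batch


def writer_parallelism(requested, neo4j_cpus):
    """Cap the number of parallel writers at the CPU count of the Neo4j server."""
    return max(1, min(requested, neo4j_cpus))


# Hypothetical statement: one UNWIND per batch instead of one statement per row.
MERGE_CUSTOMERS = (
    "UNWIND $rows AS row "
    "MERGE (c:Customer {id: row.id}) "
    "SET c.name = row.name"
)

rows = [{"id": i, "name": f"customer-{i}"} for i in range(10)]
for batch in batches(rows, buffer_size=4):
    # A real loader would send MERGE_CUSTOMERS with {"rows": batch} to Neo4j here.
    print(f"{len(batch)} rows in one statement")

print("writers:", writer_parallelism(requested=32, neo4j_cpus=8))
```

Fewer, larger statements and a writer count that does not exceed the server's CPUs are the point; the right buffer size depends on the data and the server.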
Advanced topics
Execution information, docker, web services, hop-web, Apache Airflow, k8s, …
Execution information
Write information about the execution of workflows and pipelines to a location:
● A folder where execution information is written to JSON files
● A Hop server
● A Neo4j database where execution is written to a graph
⇒ live demo on the local dev environment and Northwind.
Best practices : Unit testing
Hop has built-in support for unit testing. It allows you to provide test data as
input to transforms and validates output of transforms.
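In Hop this is configured through metadata rather than written as code, but the underlying idea can be sketched in a few lines: feed fixed input rows to a transform and compare the result with the rows you expect. The `to_upper_name` function and `run_unit_test` helper below are toy stand-ins invented for this illustration.

```python
def to_upper_name(row):
    """Toy stand-in for a transform: upper-case the 'name' field."""
    return {**row, "name": row["name"].upper()}


def run_unit_test(transform, input_rows, expected_rows):
    """Run the transform on the input rows and list every mismatch found."""
    actual = [transform(row) for row in input_rows]
    mismatches = [
        (index, got, want)
        for index, (got, want) in enumerate(zip(actual, expected_rows))
        if got != want
    ]
    if len(actual) != len(expected_rows):
        mismatches.append(("row count", len(actual), len(expected_rows)))
    return mismatches


input_rows = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
expected_rows = [{"id": 1, "name": "ALICE"}, {"id": 2, "name": "BOB"}]

problems = run_unit_test(to_upper_name, input_rows, expected_rows)
print(problems or "test passed")
```

An empty mismatch list means the output matched the expected rows exactly.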
Best practices : Integration testing
Integration testing is made easy with Hop. The project itself has daily
integration tests that are run:
https://ci-builds.apache.org/job/Hop/job/Hop-integration-tests/
It uses our standard Apache Hop docker container to execute. It uses other
containers for Postgres, Neo4j, Cassandra, ... to test functionality of those
plugins.
Logging & Reflection
• Capture logging and metrics from pipelines and workflows
◦ Every time you run a pipeline or workflow, another one fires:
  • At the start
  • At intervals during execution
  • At the end of the execution
• Reflection: stream data to another pipeline in normalized format
◦ Only supported on “local” pipeline engines for now
• The Neo4j Logging perspective
◦ Install Neo4j or docker
◦ Configure a connection
◦ Set the NEO4J_LOGGING_CONNECTION variable
Web services
• Execute a pipeline
• Output the values of a field as the result
◦ Supported only on a local pipeline engine
◦ But … for a Beam engine you can run a workflow
• Execute Beam pipeline
• Pick up results and output the results
Docker
• Great way to run your pipelines and workflows
• Starting point for customization
• Also capable of running hop-server
https://hub.docker.com/r/apache/hop
• Documentation:
https://hop.apache.org/tech-manual/latest/docker-container.html
Hop Web
• Run the Hop GUI in your browser!
• The exact same codebase is used
• Update the docker image:
docker pull apache/hop-web
• Run Hop Web Development in dark mode:
docker run \
  -p 8080:8080 \
  -e HOP_WEB_THEME=dark \
  apache/hop-web:Development
• See also: the Hop Web developers guide
Apache Airflow
• The Hop community is working on specific Airflow Operators
• For now: use a BashOperator to execute
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Add a space after the shell filename, otherwise this does not work:
# Airflow treats a command ending in ".sh" as a Jinja template file.
#
create_command = "/home/matt/airflow/scripts/run-hop-pipeline.sh "

with DAG(
    dag_id="run_hop_pipeline",
    schedule_interval="@daily",
    start_date=datetime(2020, 1, 1),
    catchup=False,
) as dag:
    bash_task = BashOperator(task_id="bash_task", bash_command=create_command)
Apache Airflow
• This script runs a Beam sample pipeline using the Direct runner
#!/bin/bash
# Change to the Hop install folder
#
cd /opt/hop
# Set the configuration folder
#
export HOP_CONFIG_FOLDER=/opt/hop/config
# And the audit folder
#
export HOP_AUDIT_FOLDER=/opt/hop/audit
# Run a Beam pipeline from the samples with the Direct runner
#
sh hop-run.sh -j samples -f beam/pipelines/input-process-output.hpl -r Direct
Kubernetes
Run a pipeline on AWS EKS using the Apache Flink k8s operator.
→Automatically starts an Apache Flink cluster
→When the cluster is online, it starts an Apache Hop Pipeline
Apache Hop Roadmap
What’s next?
What’s next?
● Improvements & bug fixes
● Expand integration test suite
● A new welcome dialog with direct links and guided actions
● A new configuration perspective in the Hop GUI
● Give plugins access to the execution information
⇒ Build a Neo4j tab with paths to errors etc.
● Neo4j Cypher Builder transform
● ETA for 2.2.0 is mid-November (3-4 weeks from now)
Thank you! Any questions?
Join the Hop community: http://hop.apache.org
Join the Hop Community
• Website: hop.apache.org
• File a bug or feature request in JIRA
• Mattermost chat server
• The source code on GitHub
• Jenkins Hop builds
• Twitter: https://twitter.com/apachehop
• YouTube: https://youtube.com/apachehop
• LinkedIn: https://www.linkedin.com/company/apachehop/
Welcome!

Road to NODES - Handling Neo4j Data with Apache Hop

  • 1. © 2022 Neo4j, Inc. All rights reserved. 1 Handling Neo4j Data with Apache Hop Matt Casters Neo4j Chief Solutions Architect Apache Hop PMC
  • 2. © 2022 Neo4j, Inc. All rights reserved. • Apache Hop introduction • Setting up Hop • The basics • Neo4j functionality • Advanced topics • Q&A / Wrap-up Topics
  • 4. © 2022 Neo4j, Inc. All rights reserved. Apache Hop “Facilitates all aspects of data and metadata orchestration” Typical use-cases: ● Data integration / Data orchestration / ETL ● Data migration ● Message processing ● Data synchronization ● IoT, Big Data, …
  • 5. © 2022 Neo4j, Inc. All rights reserved. 5 Bridging the gap… Organizations Tech / Devs Requirements
  • 6. © 2022 Neo4j, Inc. All rights reserved. Concerns of organizations 6 • Setup costs • Maintenance costs • Running costs • Time to market • Resource availability & the bus factor • DevOps • Solution stability • Investment protection
  • 7. © 2022 Neo4j, Inc. All rights reserved. Concerns of developers 7 • Ability to succeed • Have a fun development environment • Ability to learn new things • Work with new technology • Use best development practices • Life / Work balance • …
  • 8. © 2022 Neo4j, Inc. All rights reserved. Why Apache Hop? ● Lower development time and cost ● Lower maintenance time and cost ● Increase transparency ● Improve stability ● Make the learning curve steeper ● Protect against brain-drain ● … https://xkcd.com/1319/
  • 9. © 2022 Neo4j, Inc. All rights reserved. 9 Why Apache Hop? • A quickly diversifying technological data landscape • → Makes it hard to manage complexity • → Drives the need for rapid innovation • → Makes it harder to support best dev practices • Development done independent from a single large corporation • By and for data orchestration professionals
  • 10. © 2022 Neo4j, Inc. All rights reserved. 10 Apache Hop • Recursive acronym: Hop Orchestration Platform • Orchestration: ◦ Data: pipelines and workflows ◦ Metadata: editing, handling, management,... ◦ Insights: data/execution lineage, logging, … ◦ Configurations: handling ecosystem complexity • Platform: ◦ GUI, commands, server, scripts, docker, API, documentation, community, ...
  • 11. © 2022 Neo4j, Inc. All rights reserved. 11 Background • Community lead initiative • Starting point was Kettle 8.2 + WebSpoon + patches + plugins + … → Representing 21 years of software development! • New scalable GUI • New architecture, metadata back-end • Simplified toolset • Code refactored, renamed, trimmed down, ... • Extra plugins: Projects, Testing, Apache Beam, Debugging, • … • Years of work!
  • 12. © 2022 Neo4j, Inc. All rights reserved. 12 Apache Software Foundation • Hop is a Top Level Project at the Apache Software Foundation • Homepage: http://hop.apache.org • Source: github.com/apache/hop/ • Building and IT on Apache Jenkins CI • Released 2.1.0 : http://hop.apache.org/download • Working on 2.2.0 • Fast growing and active community • Check the website for regular updates & our Hot Hop Hangouts (3Hx) ⇒ 3H17 @ Thu 10 Nov 2022 7pm - 9pm CET
  • 13. © 2022 Neo4j, Inc. All rights reserved. 13 Key architecture features • License: Apache Public License v2.0 • Metadata driven: no code generation • Modular pluggable architecture: scale back to <30MB • Fast startup, minimal overhead • Apache Beam with support for Apache Spark, Apache Flink and GCP Dataflow runners • Version controlled documentation • Ease of use: transparent naming and easy to use tools • Integration test: critical components are tested daily with integration tests • → runtime compatibility, stability, ...
  • 14. © 2022 Neo4j, Inc. All rights reserved. 14 Key GUI features • Pluggable GUI features • Scalable interface for high DPI displays or visually impaired • Perspectives for easy fast context switching • Designed for web browsers and mobile users • → Single click mode for faster navigation • Full support for 4 platforms: Windows, OSX, Linux & Web • Support for “dark mode” themes on Linux and OSX
  • 15. © 2022 Neo4j, Inc. All rights reserved. 15 Key configuration features • All GUI configuration options have a command line variant • Single central system configuration JSON file • Easy project and lifecycle environment configuration • Configuration and metadata inheritance from other projects • Standard docker containers • Stateless server supporting multi-tenancy • Version control friendly setup
  • 16. 16 Setting up Hop With some live action
  • 17. © 2022 Neo4j, Inc. All rights reserved. 17 Download and install • Visit hop.apache.org • Download the latest version • Use an up-to-date Java 11 Runtime Environment version: ◦ Windows • Eclipse Adoptium OpenJDK build • Microsoft build of OpenJDK ◦ Linux • Just install OpenJDK 11 ◦ OSX • Azul build for x86 or ARM 64-bit
  • 18. © 2022 Neo4j, Inc. All rights reserved. 18 Configuration Available configuration variables: HOP_CONFIG_FOLDER Hop keeps a central system configuration file called hop-config.json in this folder HOP_AUDIT_FOLDER Usage history, logging and other auditing information is stored here. HOP_JAVA_HOME Points to the JRE you want to use HOP_OPTIONS The Java options. For example you can change the maximum memory consumption (2GB by default) Give Hop tools 8GB or memory: -Xmx8g HOP_SHARED_JDBC_FOLDER A folder where you can put your extra JDBC drivers
  • 19. © 2022 Neo4j, Inc. All rights reserved. Tools We’ll go over the tools in the Hop download: ● hop-conf : Hop Configuration ● hop-run : Run a pipeline or workflow ● hop-gui : Run the Hop graphical user interface. ● hop-search: Look for something in a project ● hop-server : Start a server to allow you to do remove execution ○ This is NOT a GUI, it’s just a server
  • 20. © 2022 Neo4j, Inc. All rights reserved. 20 Version control Hop has built-in (basic) support for git integrated with the explorer perspective.
  • 21. © 2022 Neo4j, Inc. All rights reserved. Check out a git repository The samples from this workshop are stored here: https://github.com/Neo4jSolutions/hop-nodes-2022
  • 22. © 2022 Neo4j, Inc. All rights reserved. 22 Projects • Projects allow you to encapsulate all your work in a single folder • Makes it easy to check in all your work into git • Allows you to specify files in a relative way ◦ ${PROJECT_HOME}/files/input-file.csv • Consider the following project options:
  • 23. © 2022 Neo4j, Inc. All rights reserved. Create project We’ll create a project using the git repository folder we checked out earlier: hop-nodes-2022
  • 24. © 2022 Neo4j, Inc. All rights reserved. 24 Environments Allow you to put environment specific variables separate from your project
  • 25. © 2022 Neo4j, Inc. All rights reserved. Create environment We’ll create a local development environment for our project: hop-nodes-2022-dev-local
  • 27. © 2022 Neo4j, Inc. All rights reserved. 27 Concepts Live explanation of Hop concepts and terminology using Hop GUI. More information can be found online on the Hop website: https://hop.apache.org/manual/latest/concepts.html ● Workflows and pipelines ● Actions and transforms ● Run configurations
  • 29. © 2022 Neo4j, Inc. All rights reserved. 29 Live workshop topics • Neo4j Connection ◦ How to set up for multiple environments • Documentation: working with Neo4j data • Northwind example ◦ Check Neo4j, Neo4j Constraint, Neo4j Output, Cypher, ◦ Graph Output with graph model • Configure new Aura environment ◦ Run Northwind again • Running on GCP Dataflow: ◦ BigQuery to Aura ◦ Snowflake to Aura • Bulk loading basics
  • 30. © 2022 Neo4j, Inc. All rights reserved. Beam vs Neo4j best practices ● Set transform copies to BEAM_BATCH ○ Set the flush interval and buffer size in the run configuration ● Match the parallelism to the number of CPUs that the Neo4j server has ○ Dataflow →Pick the right type and amount of worker node ○ Apache Spark / Flink: set the parallelism in the run configuration
  • 31. Advanced topics
    Execution information, Docker, web services, Hop Web, Apache Airflow, Kubernetes (k8s)
  • 32. Execution information
    Write information about the execution of workflows and pipelines to a location:
    ● A folder where execution information is written to JSON files
    ● A Hop server
    ● A Neo4j database where execution information is written to a graph
    ⇒ Live demo on the local dev environment and Northwind.
  • 34. Best practices: Unit testing
    Hop has built-in support for unit testing. It allows you to provide test data as input to transforms and to validate the output of transforms.
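The idea can be illustrated outside of Hop: fix the input rows, run the logic under test, and compare its output with an expected ("golden") data set. The sketch below is plain Python, not Hop's unit-test metadata; `uppercase_country`, `compare_to_golden`, and the sample rows are made up for illustration.

```python
def uppercase_country(rows):
    """Stand-in for a transform: normalise the 'country' field of every row."""
    return [{**row, "country": row["country"].strip().upper()} for row in rows]


def compare_to_golden(actual, golden):
    """Return a list of human-readable differences; empty means the test passed."""
    differences = []
    if len(actual) != len(golden):
        differences.append(f"row count: expected {len(golden)}, got {len(actual)}")
    for index, (actual_row, golden_row) in enumerate(zip(actual, golden)):
        if actual_row != golden_row:
            differences.append(f"row {index}: expected {golden_row}, got {actual_row}")
    return differences


# Test data provided as input to the transform.
input_rows = [{"id": 1, "country": " be "}, {"id": 2, "country": "us"}]
# Golden data the output is validated against.
golden_rows = [{"id": 1, "country": "BE"}, {"id": 2, "country": "US"}]

differences = compare_to_golden(uppercase_country(input_rows), golden_rows)
```

Returning a list of differences rather than a bare boolean makes a failing test explain itself, which matters when the test runs unattended.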
  • 35. Best practices: Integration testing
    Integration testing is made easy with Hop. The project itself has daily integration tests that are run: https://ci-builds.apache.org/job/Hop/job/Hop-integration-tests/
    The tests execute in our standard Apache Hop docker container and use other containers for Postgres, Neo4j, Cassandra, ... to test the functionality of those plugins.
  • 36. Logging & Reflection
    • Capture logging and metrics from pipelines and workflows
      ◦ Every time you run a pipeline or workflow another one fires
        • At the start
        • At intervals during execution
        • At the end of the execution
    • Reflection: stream data to another pipeline in normalized format
      ◦ Only supported on “local” pipeline engines for now
    • The Neo4j Logging perspective
      ◦ Install Neo4j or docker
      ◦ Configure a connection
      ◦ Set the NEO4J_LOGGING_CONNECTION variable
  • 37. Web services
    • Execute a pipeline
    • Output the values of a field as the result
      ◦ Supported only on a local pipeline engine
      ◦ But … for a Beam engine you can run a workflow
        • Execute Beam pipeline
        • Pick up results and output the results
  • 38. Docker
    • Great way to run your pipelines and workflows
    • Starting point for customization
    • Also capable of running hop-server
      https://hub.docker.com/r/apache/hop
    • Documentation: https://hop.apache.org/tech-manual/latest/docker-container.html
  • 39. Hop Web
    • Run the Hop GUI in your browser!
    • The exact same codebase is used
    • Update the docker image:
        docker pull apache/hop-web
    • Run Hop Web Development in dark mode:
        docker run -p 8080:8080 -e HOP_WEB_THEME=dark apache/hop-web:Development
    • See also: the Hop Web developers guide
  • 40. Apache Airflow
    • The Hop community is working on specific Airflow Operators
    • For now: use a BashOperator to execute

      from datetime import datetime
      from airflow import DAG
      from airflow.operators.bash import BashOperator

      # Add a space after the shell filename, otherwise this does not work:
      # Airflow treats a bash_command ending in ".sh" as a Jinja template file.
      #
      create_command = "/home/matt/airflow/scripts/run-hop-pipeline.sh "

      with DAG(dag_id='run_hop_pipeline',
               schedule_interval="@daily",
               start_date=datetime(2020, 1, 1),
               catchup=False) as dag:
          bash_task = BashOperator(task_id='bash_task', bash_command=create_command)
  • 41. Apache Airflow
    • This script runs a Beam sample pipeline using the Direct runner

      #!/bin/bash

      # Change to the Hop install folder
      #
      cd /opt/hop

      # Set the configuration folder
      #
      export HOP_CONFIG_FOLDER=/opt/hop/config

      # And the audit folder
      #
      export HOP_AUDIT_FOLDER=/opt/hop/audit

      # Run a Beam pipeline from the samples with the Direct runner
      #
      sh hop-run.sh -j samples -f beam/pipelines/input-process-output.hpl -r Direct
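A sketch of the same invocation assembled in Python, reusing only the paths and options that appear in the script above (`-j`, `-f`, `-r`). The helper names `build_hop_run_command`, `hop_environment`, and `run_pipeline` are hypothetical, and nothing is executed unless `run_pipeline` is called on a machine with Hop installed in `/opt/hop`.

```python
import os
import subprocess

HOP_HOME = "/opt/hop"  # install folder used in the script above


def build_hop_run_command(project: str, pipeline_file: str, run_config: str) -> list:
    """Build the argument list for hop-run.sh (-j project, -f file, -r run configuration)."""
    return ["sh", "hop-run.sh", "-j", project, "-f", pipeline_file, "-r", run_config]


def hop_environment() -> dict:
    """Copy the current environment and add the Hop config and audit folders."""
    env = dict(os.environ)
    env["HOP_CONFIG_FOLDER"] = f"{HOP_HOME}/config"
    env["HOP_AUDIT_FOLDER"] = f"{HOP_HOME}/audit"
    return env


def run_pipeline(project: str, pipeline_file: str, run_config: str) -> int:
    """Run the pipeline from the Hop install folder and return the exit code."""
    command = build_hop_run_command(project, pipeline_file, run_config)
    completed = subprocess.run(command, cwd=HOP_HOME, env=hop_environment())
    return completed.returncode


command = build_hop_run_command(
    "samples", "beam/pipelines/input-process-output.hpl", "Direct"
)
```

Passing the arguments as a list avoids shell quoting issues, and returning the exit code lets a scheduler mark the task as failed when the pipeline fails.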
  • 42. Kubernetes
    Run a pipeline on AWS EKS using the Apache Flink k8s operator.
    → Automatically starts an Apache Flink cluster
    → When the cluster is online, it starts an Apache Hop Pipeline
  • 44. What’s next?
    ● Improvements & bug fixes
    ● Expand integration test suite
    ● A new welcome dialog with direct links and guided actions
    ● A new configuration perspective in the Hop GUI
    ● Give plugins access to the execution information
      ⇒ Build a Neo4j tab with paths to errors etc.
    ● Neo4j Cypher Builder transform
    ● ETA for 2.2.0 is mid-November (3-4 weeks from now)
  • 45. Thank you! Any questions?
    Join the Hop community: http://hop.apache.org
  • 46. Join the Hop Community
    • Website: hop.apache.org
    • File a bug or feature request in JIRA
    • Mattermost chat server
    • The source code on GitHub
    • Jenkins Hop builds
    • Twitter: https://twitter.com/apachehop
    • YouTube: https://youtube.com/apachehop
    • LinkedIn: https://www.linkedin.com/company/apachehop/
    Welcome!