1
Drag and Drop: Open Source GeoTools ETL with Apache NiFi
Andrew Hulbert – CCRI
Constantin Stanca – Hortonworks
5/16/2018
2
Andrew Hulbert – Principal Software Engineer @CCRI, GeoMesa Contributor: andrew.hulbert@ccri.com
Constantin Stanca – Solutions Engineer @Hortonworks, Data Transformation and Analytics SME: cstanca@hortonworks.com
3
Executive Summary
⬢ Loading your GeoTools SimpleFeatures into your cloud or database never seems as easy as it should be. There are Twitter streams, FTP servers, inboxes, dropboxes, and all sorts of other data that you need to parse, convert to SimpleFeatures, and then ingest into your GeoTools datastore.
⬢ GeoMesa and NiFi can provide a fully open source solution to ease the pain of ingesting data into ANY GeoTools data store. It's as easy as drag and drop! Literally!
⬢ We'll show how real-time streaming data such as satellite AIS can be ingested and managed in real time through ingest pipelines built in your web browser, without having to compile any code.
4
Challenges
⬢ Ancillary:
– Lots of data stores, tools and frameworks
– Moving data between systems (data tracking and data loss) – convoluted data flow
– Configuration and Monitoring
⬢ Big Data:
– Volume, Variety, Velocity
– More data stores, tools and frameworks
⬢ Spatial Data Types
5
Spatial Data Types
Points: Locations, Events, Instantaneous Positions
Lines: Road networks, Voyages, Trips, Trajectories
Polygons: Administrative Regions, Airspaces
6
Spatial Data Relationships
equals
disjoint
intersects
touches
crosses
within
contains
overlaps
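The relationships listed above can be illustrated with a minimal sketch. It assumes the simplest possible geometry, axis-aligned rectangles with positive area, and the `Rect` class and function names are hypothetical (this is not the GeoTools or JTS API). `crosses` is left out because it only applies to geometries of mixed or lower dimension, such as a line crossing a polygon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Hypothetical axis-aligned rectangle; assumes minx < maxx and miny < maxy."""
    minx: float
    miny: float
    maxx: float
    maxy: float


def equals(a: Rect, b: Rect) -> bool:
    return a == b


def disjoint(a: Rect, b: Rect) -> bool:
    # No point in common, not even on the boundary.
    return a.maxx < b.minx or b.maxx < a.minx or a.maxy < b.miny or b.maxy < a.miny


def intersects(a: Rect, b: Rect) -> bool:
    return not disjoint(a, b)


def interiors_intersect(a: Rect, b: Rect) -> bool:
    # Strict inequalities exclude shapes that only share boundary points.
    return a.minx < b.maxx and b.minx < a.maxx and a.miny < b.maxy and b.miny < a.maxy


def touches(a: Rect, b: Rect) -> bool:
    # Boundaries meet but interiors do not.
    return intersects(a, b) and not interiors_intersect(a, b)


def within(a: Rect, b: Rect) -> bool:
    return a.minx >= b.minx and a.maxx <= b.maxx and a.miny >= b.miny and a.maxy <= b.maxy


def contains(a: Rect, b: Rect) -> bool:
    return within(b, a)


def overlaps(a: Rect, b: Rect) -> bool:
    # Interiors share area, yet neither rectangle lies inside the other.
    return interiors_intersect(a, b) and not within(a, b) and not within(b, a)


region = Rect(0, 0, 2, 2)
neighbour = Rect(2, 0, 4, 2)      # shares only the edge x = 2
airspace = Rect(1, 1, 3, 3)       # partially covers the region
district = Rect(0.5, 0.5, 1, 1)   # lies fully inside the region
```

Real geometries need a proper library for these predicates; the rectangle case only shows how the definitions differ from one another.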
7
"Traditional" Geo-Spatial ETL
OGC
Simple Features
8
"Cloud" Geo-Spatial ETL
OGC
Simple Features
?
9
Solution: Apache NiFi and GeoMesa
OGC
Simple Features
10
What is Apache NiFi?
⬢ Powerful and scalable directed graphs of data routing, transformation, and system mediation logic.
⬢ Web-based user interface
⬢ Highly configurable
⬢ Data Provenance
⬢ Designed for extension
⬢ Secure
An open source project dedicated to making dataflow easy. It's as easy as drag and drop! Literally!
11
Convoluted Data Flow
[Diagram labels: Acquire Data, Store Data, Process and Analyze Data, Data Flow]
12
Data Flow Transformation with NiFi
[Diagram labels: Edge Data, Edge Analytics, Data in Motion (on-premises and Cloud), Data at Rest (on-premises and Cloud), Closed Loop Analytics, Machine Learning, Deep Historical Analysis]
13
200+ Processors
Hash, Extract, Merge, Duplicate, Scan, GeoEnrich, Replace, Convert, Split, Translate, Route Content, Route Context, Route Text, Control Rate, Distribute Load, Generate Table Fetch, Jolt Transform JSON, Prioritized Delivery, Encrypt, Tail, Evaluate, Execute, Fetch
HL7, FTP, UDP, XML, SFTP, HTTP, Syslog, Email, HTML, Image, AMQP, MQTT
All Apache project logos are trademarks of the ASF and the respective projects.
14
NiFi – UI and Terms
NiFi Terminology
⬢ FlowFile
– Unit of data moving through the system
– Content + Attributes (key/value pairs)
⬢ Connection
– Links between processors
– Queues that can be dynamically prioritized
⬢ Processor
– Performs the work, can access FlowFiles
⬢ Process Group
– Set of processors and their connections
– Receive data via input ports, send data via output ports
• Drag and drop processors to build a flow
• Start, stop, and configure components in real time
• View errors and corresponding error messages
• View statistics and health of data flow
• Create templates of common processors & connections
15
NiFi – Queue Prioritization
• Configure a prioritizer per connection
• Determine what is important for your data – time based, arrival order, importance of a data set
• Funnel many connections down to a single connection to prioritize across data sets
• Develop your own prioritizer, if needed
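The idea of a per-connection prioritizer can be sketched in a few lines. This is a plain model, not NiFi's Java API: the `FlowFile` and `Connection` classes, the `by_priority_attribute` function, and the `priority` attribute convention (lower number dequeues first) are all assumptions made for illustration.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class FlowFile:
    """Content plus key/value attributes, as in the terminology slide."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


class Connection:
    """A queue between processors, ordered by a pluggable prioritizer."""

    def __init__(self, prioritizer: Callable[[FlowFile], float]) -> None:
        self._prioritizer = prioritizer
        self._heap: List[Tuple[float, int, FlowFile]] = []
        self._arrival = itertools.count()  # tie-breaker: arrival order

    def enqueue(self, flowfile: FlowFile) -> None:
        key = (self._prioritizer(flowfile), next(self._arrival), flowfile)
        heapq.heappush(self._heap, key)

    def dequeue(self) -> FlowFile:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


def by_priority_attribute(flowfile: FlowFile) -> float:
    # FlowFiles without a "priority" attribute sort last.
    return float(flowfile.attributes.get("priority", "inf"))


# Several sources funnelled into one connection, prioritized across data sets.
funnel = Connection(by_priority_attribute)
funnel.enqueue(FlowFile(b"weather", {"source": "ftp", "priority": "5"}))
funnel.enqueue(FlowFile(b"ais-1", {"source": "ais", "priority": "1"}))
funnel.enqueue(FlowFile(b"tweet", {"source": "twitter"}))
funnel.enqueue(FlowFile(b"ais-2", {"source": "ais", "priority": "1"}))

order = [funnel.dequeue().content for _ in range(len(funnel))]
```

Swapping `by_priority_attribute` for a function that returns a timestamp would give time-based ordering instead; the arrival counter keeps equal-priority items in first-in, first-out order.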
16
NiFi – Provenance
• Tracks data at each point as it flows through the system
• Records, indexes, and makes events available for display
• Handles fan-in/fan-out, i.e. merging and splitting data
• View attributes and content at given points in time
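How fan-out and fan-in can be tracked is easier to see in a small model. The sketch below is an assumption-laden illustration, not NiFi's provenance repository: `ProvenanceLog`, `ProvenanceEvent`, and the FlowFile ids are hypothetical, and the event type strings are used only as labels.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ProvenanceEvent:
    event_type: str
    component: str
    parents: Tuple[str, ...]
    children: Tuple[str, ...]


class ProvenanceLog:
    """Records events and indexes which FlowFiles each FlowFile came from."""

    def __init__(self) -> None:
        self.events: List[ProvenanceEvent] = []
        self._parents_of: Dict[str, Tuple[str, ...]] = {}

    def record(self, event_type: str, component: str,
               parents: Iterable[str] = (),
               children: Iterable[str] = ()) -> ProvenanceEvent:
        event = ProvenanceEvent(event_type, component, tuple(parents), tuple(children))
        self.events.append(event)
        for child in event.children:
            self._parents_of[child] = event.parents
        return event

    def lineage(self, flowfile_id: str) -> List[str]:
        """Return all ancestor ids of a FlowFile, nearest first."""
        seen: List[str] = []
        frontier = list(self._parents_of.get(flowfile_id, ()))
        while frontier:
            current = frontier.pop(0)
            if current not in seen:
                seen.append(current)
                frontier.extend(self._parents_of.get(current, ()))
        return seen


log = ProvenanceLog()
log.record("CREATE", "GetFile", children=["batch"])
# Fan-out: one FlowFile is split into two records.
log.record("FORK", "SplitText", parents=["batch"], children=["rec-1", "rec-2"])
# Fan-in: the records are merged back into a single FlowFile.
log.record("JOIN", "MergeContent", parents=["rec-1", "rec-2"], children=["merged"])

ancestors = log.lineage("merged")
```

Walking the parent index backwards is what lets a merged FlowFile be traced to every input that contributed to it.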
17
NiFi Demo
18
SQL with NiFi Record Processors
19
Reverse GeoLookup
20
What is GeoMesa?
A suite of tools for persisting, querying, analyzing, and streaming spatio-temporal data at scale
25
GeoMesa NiFi
⬢ GeoMesa-NiFi allows you to ingest data into GeoMesa straight from NiFi by leveraging custom processors.
⬢ NiFi allows you to ingest data into GeoMesa from every source GeoMesa supports and more.
[Diagram labels: Data; SimpleFeatureType Schema; GeoMesa NiFi Processors; enabled datastores]
26
GeoMesa NiFi Processors
⬢ PutGeoMesaAccumulo: Ingest data into a GeoMesa Accumulo datastore with a GeoMesa converter or from GeoAvro
⬢ PutGeoMesaHBase: Ingest data into a GeoMesa HBase datastore with a GeoMesa converter or from GeoAvro
⬢ PutGeoMesaFileSystem: Ingest data into a GeoMesa File System datastore with a GeoMesa converter or from GeoAvro
⬢ PutGeoMesaKafka: Ingest data into a GeoMesa Kafka datastore with a GeoMesa converter or from GeoAvro
⬢ PutGeoTools: Ingest data into an arbitrary GeoTools datastore using a GeoMesa converter or Avro
⬢ ConvertToGeoAvro: Use a GeoMesa converter to create GeoAvro
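Each processor above relies on a converter to turn raw input into features. The sketch below shows that step conceptually in plain Python. It is not the GeoMesa converter syntax or API: the column layout (`mmsi`, timestamp, longitude, latitude), the `to_feature` and `convert` helpers, and the sample record are all made up for illustration.

```python
import csv
import io
from datetime import datetime, timezone
from typing import Dict, List, Tuple


def to_feature(row: List[str]) -> Dict[str, object]:
    """Map one delimited record to a feature-like dict with a WKT point."""
    mmsi, timestamp, lon, lat = row  # raises ValueError on a wrong column count
    return {
        "id": f"{mmsi}-{timestamp}",
        "mmsi": mmsi,
        "dtg": datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc),
        "geom": f"POINT ({float(lon)} {float(lat)})",
    }


def convert(text: str) -> Tuple[List[Dict[str, object]], List[List[str]]]:
    """Convert CSV text, keeping unparseable rows aside instead of failing the batch."""
    features: List[Dict[str, object]] = []
    failures: List[List[str]] = []
    for row in csv.reader(io.StringIO(text)):
        try:
            features.append(to_feature(row))
        except ValueError:
            failures.append(row)
    return features, failures


# Hypothetical AIS-style input: one valid position report and one broken row.
sample = "367000001,2018-05-16T12:00:00Z,-76.5,37.9\nbad,row\n"
features, failures = convert(sample)
```

Separating successes from failures mirrors how a flow would typically route converted and unconverted data down different connections.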
27
GeoMesa-NiFi Demo
28
NiFi-GeoMesa Data Flow
29
Map in Real-Time
30
Resources:
⬢ GeoMesa Project: http://www.geomesa.org/
⬢ GeoMesa-NiFi: https://github.com/geomesa/geomesa-nifi
⬢ NiFi Project: https://nifi.apache.org/
⬢ NiFi Overview and Tutorials: https://hortonworks.com/apache/nifi/
31
Thank you!
