Streaming Transformations Using Oracle Data Integration
Michael Rainey | BIWA Summit 2017
Introduction
• Michael Rainey - Technical Advisor
• Spreading the good word about Gluent products with the world
• Oracle Data Integration expertise
• Oracle ACE Director
• mRainey.co
we liberate enterprise data
What is “Streaming”
• The processing and analysis of structured or “unstructured” data in real-time
• Why Streaming?
  • When speed (velocity) of data is key
• Streaming data is processed in “time windows”, in memory, across a cluster of servers
• Examples:
  • Calculating a retail buying opportunity
  • Real-time cost calculations
  • IoT data analysis
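
To make the idea of “time windows” concrete, here is a minimal sketch in plain Python (not Spark or ODI code). It assumes hypothetical events of the form (timestamp in seconds, key, amount) and sums the amounts per key within fixed 10-second windows. The event layout, the window size, and the function name are illustrative assumptions, and a real streaming engine would do this in memory across a cluster rather than in one process.

```python
from collections import defaultdict

def tumbling_window_totals(events, window_seconds=10):
    """Sum amounts per (window start, key) over fixed, non-overlapping windows.

    `events` is an iterable of (timestamp_seconds, key, amount) tuples.
    """
    totals = defaultdict(float)
    for timestamp, key, amount in events:
        # Align each event to the start of the window it falls into.
        window_start = (timestamp // window_seconds) * window_seconds
        totals[(window_start, key)] += amount
    return dict(totals)

# Hypothetical retail events: (seconds since start, store, sale amount)
sample_events = [
    (1, "store-a", 20.0),
    (4, "store-b", 5.0),
    (9, "store-a", 10.0),
    (12, "store-a", 7.5),
]

window_totals = tumbling_window_totals(sample_events)
```

Each result key is a (window start, store) pair, so the sale at second 12 lands in a different window from the sales at seconds 1 and 9.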
Streaming data - Apache Kafka
“Publish-subscribe messaging rethought as a distributed commit log”
Image source: kafka.apache.org/
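
The “commit log” idea in the quote above can be sketched with a toy, single-process log in plain Python. This is not Kafka's API: the class and method names are hypothetical, and Kafka additionally partitions, replicates, and distributes the log across brokers. The sketch only shows the core behaviour, which is that records are appended once and each subscriber tracks its own read position.

```python
class CommitLog:
    """A toy, single-process append-only log with per-consumer offsets."""

    def __init__(self):
        self._records = []   # records are only ever appended, never modified
        self._offsets = {}   # consumer name -> next offset to read

    def publish(self, record):
        """Append a record and return its offset in the log."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer, max_records=10):
        """Return unread records for `consumer` and advance its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch

log = CommitLog()
for order in ["order-1", "order-2", "order-3"]:
    log.publish(order)

first_read = log.poll("billing", max_records=2)   # billing reads two records
second_read = log.poll("billing")                 # then the remainder
audit_read = log.poll("audit")                    # audit reads independently
```

Because reading does not remove anything from the log, the hypothetical "audit" consumer still sees all three records after "billing" has consumed them.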
Enterprise Data Bus
Stream processing - Apache Spark
• Scalable, fault-tolerant, high-throughput stream processing
• Spark Streaming receives live input data streams from various sources
• Continuous stream of data is known as a discretized stream or DStream
• Data is divided into mini-batches and processed by the Spark engine
• Operations such as join, filter, map, count, windowed computations, etc. are used to transform data in-flight
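
As a rough illustration of the operations listed above, the following sketch applies map, filter, and a windowed aggregation to a DStream. It is an assumption-laden example rather than code from the talk: the "customer,amount" line format, the threshold, the socket source, and the function names are all hypothetical. The only Spark calls used are the DStream methods `socketTextStream`, `map`, `filter`, `reduceByKeyAndWindow`, and `pprint`, and the `StreamingContext` is passed in by the caller.

```python
def parse_order(line):
    """Parse a 'customer,amount' text line into a (customer, amount) pair."""
    customer, amount = line.split(",")
    return customer.strip(), float(amount)

def is_large(order, threshold=100.0):
    """Keep only orders at or above the (hypothetical) threshold."""
    return order[1] >= threshold

def build_pipeline(ssc, host="localhost", port=9999):
    """Wire the transformations onto a DStream.

    `ssc` is expected to be a pyspark.streaming.StreamingContext created by
    the caller, e.g. StreamingContext(sc, 5) for 5-second mini-batches.
    """
    lines = ssc.socketTextStream(host, port)
    large_orders = lines.map(parse_order).filter(is_large)
    # Total per customer over the last 30 seconds, recomputed every 10 seconds.
    totals = large_orders.reduceByKeyAndWindow(lambda a, b: a + b, None, 30, 10)
    totals.pprint()
    return totals
```

Running it would mean creating a `StreamingContext` (for example with a 5-second batch interval, so that the 30- and 10-second window settings are multiples of it), calling `build_pipeline(ssc)`, and then `ssc.start()` and `ssc.awaitTermination()`. The two helper functions are plain Python and can be tried without a cluster.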
Why Oracle Data Integration?
• Enterprise has invested heavily in ODI and/or GoldenGate
• Getting started with development languages (Python/pySpark, Java, etc.)
• Centralized metadata management
• Integrate with other data sources using a single interface
• Realized cost savings
  • According to Gartner, custom coding leads to a 200% increase in maintenance costs
    (https://www.gartner.com/doc/3432617/does-customcoded-data-integration-stack)
Streaming with Oracle Data Integration
• Real-time data replication
• Streaming integration: OGG -> Kafka
• Streaming integration: Kafka -> Spark Streaming
Relational database transactions to Kafka
Why GoldenGate with Kafka?
• GoldenGate
  • …is non-invasive
  • …has checkpoints for recovery
  • …moves data quickly
  • …is easy to set up
Why Oracle Data Integrator with Spark Streaming?
• Heterogeneous sources and targets
• Built to integrate all data
• Flexibility
• Reusable code templates (Knowledge Modules)
• Reusable Mappings
• ODI can adapt to your data warehouse - and not the other way around
• Flow based mappings
Getting started with streaming using Oracle Data Integration
Oracle GoldenGate for Big Data - Kafka Handler Setup
• Standard GoldenGate Extract / Pump processes to capture RDBMS data
• Replicat for Java parameter file & process group created and set up
• Kafka Producer properties and Kafka Handler configuration set up
GoldenGate for Kafka setup
• Kafka handler properties
  • Set properties for how GoldenGate interacts with Kafka
  • Format, transaction vs operation mode, etc.
• Kafka producer configuration
http://mrainey.co/ogg-kafka-oow
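
The two kinds of configuration named above can be pictured as two small properties files. The sketch below renders them from Python dictionaries. The handler property names are written as I recall them from the GoldenGate for Big Data Kafka Handler documentation and should be treated as assumptions to verify against your release. The handler name `kafkahandler`, the file name, the format value, and the broker address are placeholders, and a real handler configuration also needs a topic setting, which is omitted here because its property name varies by release.

```python
def render_properties(settings):
    """Render a dict as Java-style .properties text, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"

# Handler side: how GoldenGate talks to Kafka. Property names are assumptions
# to check against the documentation for your GoldenGate for Big Data release.
handler_settings = {
    "gg.handlerlist": "kafkahandler",
    "gg.handler.kafkahandler.type": "kafka",
    "gg.handler.kafkahandler.KafkaProducerConfigFile": "custom_kafka_producer.properties",
    "gg.handler.kafkahandler.format": "avro_op",
    "gg.handler.kafkahandler.mode": "op",  # "tx" would group operations per transaction
}

# Producer side: standard Kafka producer settings; the broker address is a placeholder.
producer_settings = {
    "bootstrap.servers": "localhost:9092",
    "acks": "1",
    "key.serializer": "org.apache.kafka.common.serialization.ByteArraySerializer",
    "value.serializer": "org.apache.kafka.common.serialization.ByteArraySerializer",
}

handler_text = render_properties(handler_settings)
producer_text = render_properties(producer_settings)
```

Keeping the producer settings in their own file mirrors the split on the slide: the handler properties describe GoldenGate's behaviour (format, transaction vs operation mode), while the producer file holds ordinary Kafka client settings.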
Kafka and Oracle Data Integrator setup
Kafka and Oracle Data Integrator
• Create Model using Kafka Logical Schema
• Create Datastore
  • Similar to standard “File” datastore, define file format and set up columns
  • Only CSV is supported
  • Future formats may include JSON, Avro, etc.
• Add Datastore to mapping
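
Since the Kafka datastore is defined like a delimited “File” datastore, each message value is effectively one CSV row mapped onto named columns. The plain-Python sketch below shows that mapping. It is not code generated by ODI, and the column names, types, and sample message are hypothetical.

```python
import csv
import io

# Hypothetical column layout for the datastore: name -> converter.
COLUMNS = [("order_id", int), ("customer", str), ("amount", float)]

def parse_csv_message(message, columns=COLUMNS, delimiter=","):
    """Turn one delimited Kafka message value into a dict keyed by column name."""
    row = next(csv.reader(io.StringIO(message), delimiter=delimiter))
    if len(row) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(row)}")
    return {name: convert(value) for (name, convert), value in zip(columns, row)}

# A quoted field may contain the delimiter without breaking the column mapping.
record = parse_csv_message('1001,"Smith, J",250.75')
```

Using the `csv` module rather than `str.split` keeps quoted fields intact, which matters as soon as a text column can contain the delimiter.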
Spark Streaming and Oracle Data Integrator
• Create Spark Data Server, Physical / Logical Schema
  • Set Hadoop Data Server
  • Add properties, such as checkpointing, asynchronous execution mode, etc.
  • Additional properties can be added:
    http://spark.apache.org/docs/latest/configuration.html
• Spark Server is set up as Staging location
  • Source Datastore from Kafka, Oracle DB, etc.
  • Target Datastore is Cassandra, Oracle DB, etc.
• Code generated by KM is pySpark
  • pySpark code can be added to filters, joins, other components for transformations
  • Additional languages (Scala, Java) may be coming soon
• Enable the Streaming flag in the Physical design of a mapping.
• To generate Spark code, set the Execute On Hint option to use the Spark data server as the staging location for your mapping.
• Target IKM should not be set. Spark generated code will handle integration and load into target.
Tracking the process
• When executing, the process will run continuously in the ODI Operator.
• If the connection between the ODI Agent and Spark Agent is lost, it will reestablish itself after recovery.
Recap
• Streaming is the “velocity” in data, also known as “Fast Data”
• Oracle Data Integrator and Oracle GoldenGate provide a framework for development and management of data streaming processes
• Big Data add-ons continue to support new technologies
• Build a streaming architecture using GoldenGate and ODI:
  • Metadata management
  • Integration of RDBMS data with “schema on read” data
  • Build upon the skills in-house
thank you!
