Fast Data Overview for Data Science Maryland Meetup
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Fast Data Open Source
An Overview
Chuck Scyphers
Big Data Lead
East Coast
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.
Agenda
Fast Data Definition
Popular Open Source Platforms
Refreshments
What Is Fast Data?
“Fast data is the application of big data analytics to smaller data sets in
near-real or real-time in order to solve a problem or create business value.
The goal of fast data is to quickly gather and mine structured and
unstructured data so that action can be taken.”
New Concepts for a Modern Data Platform Architecture
• Polyglot: Fit for Purpose Data
• Lambda: Data Sources → Speed Layer and Batch Layer → Data Services
• Kappa: Data Sources → Data Pipeline → Data Services
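A minimal sketch of how the Lambda and Kappa patterns named on this slide differ, in plain Python. The event data and the function names (`aggregate`, `lambda_query`, `kappa_query`) are made up for illustration; a real deployment would use a batch engine and a stream processor rather than in-memory lists.

```python
from collections import Counter

# Hypothetical event log of (user, clicks) pairs. In Lambda, older events have
# already been folded into a batch view; recent ones live only in the speed layer.
historical_events = [("alice", 3), ("bob", 1), ("alice", 2)]
recent_events = [("bob", 4), ("carol", 1)]


def aggregate(events):
    """Sum clicks per user; the same logic serves every layer in this sketch."""
    totals = Counter()
    for user, clicks in events:
        totals[user] += clicks
    return totals


def lambda_query(batch_view, speed_view):
    """Lambda: merge a precomputed batch view with a real-time speed view."""
    return batch_view + speed_view


def kappa_query(log):
    """Kappa: one pipeline; views are rebuilt by replaying the whole log."""
    return aggregate(log)


batch_view = aggregate(historical_events)   # recomputed periodically
speed_view = aggregate(recent_events)       # updated as events arrive

lambda_result = lambda_query(batch_view, speed_view)
kappa_result = kappa_query(historical_events + recent_events)
```

Both queries return the same totals here. The practical difference is operational: Lambda maintains two code paths (batch and speed), while Kappa keeps one path and reprocesses by replaying the log.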
How Data Impacts The Organization
67% of executives say drawing intelligence from data is a top priority
Source: Oracle Research Study - From Overload to Impact: An Industry Scorecard on Big Data Business Challenges, July 2012
89% of executives would grade themselves C or lower in preparedness
Source: Oracle Research Study - From Overload to Impact: An Industry Scorecard on Big Data Business Challenges, July 2012
% believe their organization is losing
revenue as a result of not being able
to fully leverage information
Source: Oracle Research Study - From Overload to Impact: An Industry Scorecard on Big Data Business Challenges, July 2012
Velocity Matters
of executives say
too much critical
information is
delivered too late
Source: Aberdeen Group – January 2012, survey of 247 executives - Data Management for BI – Big Data, Bigger Insight, Superior Performance
Why Do We Care?
• It’s about getting more from in-flight data
• It’s about faster action, faster insights
• It’s about visibility and predictability
• It’s about running your business in real time
Key Value Drivers of Timely Accurate Action
Delivering Tangible Results With Fast Data
• Higher Quality In Operations
• Improved Efficiency
• New Services
• Better Customer Experience
Fast Data Is Universal Across Industries
• Financial Services
• Transportation & Logistics
• Telecommunications
• Manufacturing & Retail
• Utilities & Oil and Gas
• Health Care
• Public Sector
Fast Data Characteristics
• FILTER & CORRELATE
• MOVE & TRANSFORM
• ANALYZE
• ACT
Complete In-Flight Event Processing
• Eliminate, Consolidate, Correlate, And/Or Filter Data While In Flight
• Analyze Data Streams
• Enrich Data For More Accurate Decisions
• Process Data In The Stream To Free Up Back End Resources
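The in-flight steps listed above (filter, enrich, correlate) can be sketched as chained Python generators. This is only an illustration under assumed data: the event fields, the `tiers` lookup table, and the thresholds are hypothetical and do not come from any product mentioned in the talk.

```python
def filter_events(events, min_amount):
    """Drop low-value events before they reach the back end."""
    for event in events:
        if event["amount"] >= min_amount:
            yield event


def enrich(events, customer_tiers):
    """Attach reference data so later decisions have more context."""
    for event in events:
        yield {**event, "tier": customer_tiers.get(event["customer"], "unknown")}


def correlate(events, threshold):
    """Emit an alert each time a customer's running total is at or above a threshold."""
    running = {}
    for event in events:
        customer = event["customer"]
        running[customer] = running.get(customer, 0) + event["amount"]
        if running[customer] >= threshold:
            yield {"customer": customer, "tier": event["tier"], "total": running[customer]}


stream = [
    {"customer": "c1", "amount": 5},
    {"customer": "c2", "amount": 120},
    {"customer": "c1", "amount": 80},
    {"customer": "c2", "amount": 90},
]
tiers = {"c2": "gold"}

alerts = list(correlate(enrich(filter_events(stream, min_amount=50), tiers), threshold=200))
```

Because each stage is a generator, events flow through one at a time and only the alerts are materialized, which is the sense in which stream-side processing frees up back-end resources.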
Work With The Stream
• Apply Basic Filtering At Capture
• Improve Trusted Quality Of Information
• Move Data (duh)
Speed Up The OODA (Observe, Orient, Decide, Act) Loop
• Get Actionable Insights
• Predict Outcomes
Make Decisions That Matter Faster
• Deliver Real-Time Decisions And Recommendations To Customers/Employees
• Automatically Render Decisions Within A Process With Tailored Messaging
• Integrate Human Workflow, Process Management, Activity Monitoring
Fast Data Customers
From Oracle (naturally)
Fast Data And Financial Services
Improving Customer Experiences
• Improve Customer Experience: The goal is to connect all data about each customer to improve the customer service experience and to lower the burden of hiring new representatives.
• Reduce Staffing Demands: Customers calling to discuss a claim or their coverage face fewer annoying waits while an agent pulls data from any of dozens of different places.
• Consolidate Information In Real Time: Bring together all of a customer’s transactions: claims, records, status, and possible cross-sell information (e.g., someone who lives in an apartment might need renter’s insurance).
Fast Data And Retail
• Price optimization: Using analytics to price goods and services on the fly based on real-time metrics such as competitor pricing, supply chain and inventory data, market data, and consumer behavior data.
• Product placement analysis: Processing video data to identify shopping trends and assess the effectiveness of displays, in order to improve store layouts and product placements.
• Staffing: The largest retailers analyze weather forecasts, promotional campaigns, and calendar dates to meet staffing requirements on holidays and throughout the year.
Fast Data And Public Sector
LA City Planning And Traffic Analysis
• Dynamic Pricing for Toll Lanes: A driver who pays to use the HOT (high-occupancy toll) lane is guaranteed a consistent speed of 45 miles per hour. If traffic starts backing up, prices for individual cars rise to discourage them from entering, saving the lanes for high-occupancy vehicles.
• Express Park: It’s not enough to know how to set the price; that data has to reach users in real time. Drivers also need to know that parking spots will still be there when they arrive in 40 minutes.
• Combining M2M: The answer lies in combining information from other sources, such as mass-transit systems, toll highways, traffic sensors, and weather data, to paint a real-time picture of what traffic actually looks like.
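The toll-lane bullet describes a feedback loop: when the lane drops below its guaranteed speed, the price goes up. A minimal sketch of that idea follows. Only the 45 mph target comes from the slide; the step size, floor, cap, and sensor readings are invented for illustration, and the pricing rule actually used in Los Angeles is not described in the source.

```python
TARGET_SPEED_MPH = 45.0   # guaranteed speed quoted on the slide


def next_toll(current_toll, observed_speed_mph, step=0.25, floor=0.50, cap=15.00):
    """Raise the toll when the lane is slower than the target, lower it otherwise.

    step, floor and cap are hypothetical tuning values in dollars.
    """
    if observed_speed_mph < TARGET_SPEED_MPH:
        proposed = current_toll + step
    else:
        proposed = current_toll - step
    return min(cap, max(floor, proposed))


toll = 1.00
history = []
for speed in [55, 50, 44, 40, 38, 47]:   # made-up sensor readings
    toll = next_toll(toll, speed)
    history.append(toll)
```

The point of the sketch is the timing requirement: the controller is only useful if speed readings arrive, and new prices reach drivers, within seconds rather than in a nightly batch.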
Fast Data And Telecom
Location Based Mobile Billboard Advertising at Turkcell
• Processing over 800,000 subscriber-related events per second (with 1.5 billion events daily)
• Provided and executed over 50 simultaneous campaigns
• Ensured customer responsiveness with response times under 1 second on a scalable architecture, ready to expand on demand
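The two throughput figures on this slide can be related with quick arithmetic. The slide does not say how they fit together; one plausible reading (an assumption, not something the source states) is that 800,000 events per second is a peak or capacity figure while 1.5 billion is the daily total.

```python
SECONDS_PER_DAY = 24 * 60 * 60

daily_events = 1_500_000_000      # "1.5 billion events daily" from the slide
quoted_rate = 800_000             # "over 800,000 ... events per second" from the slide

# Average rate implied by the daily total, in events per second.
average_rate = daily_events / SECONDS_PER_DAY

# How long the quoted per-second rate would take to produce the daily total.
seconds_at_quoted_rate = daily_events / quoted_rate
```

The daily total averages out to roughly 17,000 events per second, and the quoted rate would produce a full day's volume in 1,875 seconds (about 31 minutes), which is why the per-second figure reads more naturally as a peak than as a sustained average.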
Agenda
Fast Data Definition
Popular Open Source Platforms
What Do We Want To Be When We Grow Up?
Refreshments
Open Source Platforms: General Classifications
HDFS Based
• Spark
• HBase
• Impala
• H2O
• Apex
Other Based
• Druid
• Flink
• Storm
• Kafka
• Samza
• ElasticSearch
• Lucene/Solr
Composite
• SMACK
• PANCAKE
Spark
• In-Memory Distributed Processing Framework
• Will Spill To Disk As Needed
• Handles Streaming Data Through Micro-batching
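Micro-batching means chopping a continuous stream into small, fixed-length intervals and running an ordinary batch job on each one. The sketch below imitates that idea in plain Python; it is not Spark's API, and the function name `micro_batches`, the sample records, and the one-second interval are all assumptions for illustration.

```python
def micro_batches(timestamped_records, batch_interval):
    """Group (timestamp, value) records into consecutive fixed-length intervals.

    Assumes timestamps are in seconds and already arrive in order.
    """
    batch, batch_end = [], None
    for timestamp, value in timestamped_records:
        if batch_end is None:
            # Align the first interval to a multiple of batch_interval.
            batch_end = timestamp - (timestamp % batch_interval) + batch_interval
        while timestamp >= batch_end:      # close every interval that has elapsed
            yield batch
            batch, batch_end = [], batch_end + batch_interval
        batch.append(value)
    if batch:
        yield batch


records = [(0.1, "a"), (0.4, "b"), (1.2, "c"), (3.5, "d")]
batches = list(micro_batches(records, batch_interval=1.0))
```

Note the empty batch for the interval with no records: latency in this model is bounded below by the batch interval, which is the trade-off against record-at-a-time engines.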
HBase
• A NoSQL Wide-Column (Column-Family) Store Built On Top Of HDFS
• Provides A Bigtable-esque Data Model
• Compression
• In-Memory Operation
• Bloom Filters Per Column Family
• Offers Both Real-Time Read/Write Access And Random Access To Data In HDFS
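A Bloom filter lets a store skip reading a file that cannot contain a requested key: it may answer "maybe present" for a key that is absent, but never "absent" for a key that was added. The class below is a generic, minimal sketch of that data structure, not HBase's implementation; the sizes, the use of SHA-256, and the row-key names are illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)   # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


row_keys_in_file = BloomFilter()
for key in ["row-001", "row-002", "row-003"]:
    row_keys_in_file.add(key)
```

A read path would call `might_contain` first and open the file only when it returns `True`, trading a small amount of memory for fewer disk seeks.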
Impala
• Real-time SQL queries over data stored in HDFS or HBase
• No MapReduce processing
• Uses an MPP (massively parallel processing) query engine on the Hadoop cluster
• Uses the Hive metastore as its metadata repository
• Leveraged by numerous BI tools and applications
• Not fully ANSI SQL compliant
H2O
Apex
Druid
• MySQL (metadata storage)
• Zookeeper (cluster coordination)
Flink
Storm
Kafka
Samza
Search Based
ElasticSearch
Lucene/Solr
A Quick Comparison
Engine | Guarantee | Throughput | Fault Tolerance Overhead | Computation Model | Windowing | Memory Management | DAG Based | Batch Support | Latency | Stateful Operations
Spark | Exactly Once | 100k+ records/sec | Low | Microbatches | Time Based | Moving towards automatic | Yes | Yes | seconds | Yes
Flink | Exactly Once | | Low | Continuous Flow Operation | Record Based / User Defined | Automatic | Yes | | milliseconds |
Storm | At Least Once / Exactly Once (+ Trident) | 100k+ records/sec | | Continuous Flow Operation | | | Yes | No (unless paired with Trident) | milliseconds | No (unless with Trident)
Samza | At Least Once | 10k+ records/sec | | Continuous Flow Operation | | | | | milliseconds | Yes
Hadoop | | Lower | High | Batch Only | Nope | YARN is helping | No | Only | |
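The "Guarantee" column distinguishes at-least-once from exactly-once delivery. The sketch below shows why the difference matters and one common way to get exactly-once results on top of at-least-once delivery: making the consumer idempotent. It is a generic illustration, not the mechanism of any engine in the comparison; the message IDs, the simulated lost acknowledgement, and the class names are assumptions.

```python
def deliver_at_least_once(messages, fail_first_ack_for):
    """Yield (id, value) pairs, redelivering any message whose first ack is lost."""
    for msg_id, value in messages:
        yield msg_id, value
        if msg_id in fail_first_ack_for:
            yield msg_id, value    # redelivery after a simulated lost ack


class NaiveCounter:
    """Counts every delivery, so duplicates inflate the total."""

    def __init__(self):
        self.total = 0

    def process(self, msg_id, value):
        self.total += value


class IdempotentCounter:
    """Remembers processed IDs, so a redelivered message has no effect."""

    def __init__(self):
        self.total = 0
        self.seen = set()

    def process(self, msg_id, value):
        if msg_id in self.seen:
            return
        self.seen.add(msg_id)
        self.total += value


messages = [(1, 10), (2, 20), (3, 30)]
naive, idempotent = NaiveCounter(), IdempotentCounter()
for msg_id, value in deliver_at_least_once(messages, fail_first_ack_for={2}):
    naive.process(msg_id, value)
    idempotent.process(msg_id, value)
```

With message 2 delivered twice, the naive counter over-counts while the idempotent one matches the true sum. The cost is the extra state (`seen`), which ties the guarantee column to the stateful-operations column.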
Open Source Platforms: General Classifications
HDFS Based
• Spark
• HBase
• Impala
• H2O
• Apex
• Samza
Other Based
• Druid
• Flink
• ElasticSearch
• Storm
• Kafka
• Lucene/Solr
Composite
• SMACK
• PANCAKE
SMACK Stack
• Spark
• Mesos
• Akka
• Cassandra
• Kafka
PANCAKE Pile
Presto
Arrow
NiFi
Cassandra
AirFlow
Kafka
Elastic Search
PANCAKE STACK
Presto
Arrow
NiFi
Cassandra
AirFlow
Kafka
ElasticSearch
Spark
TensorFlow
Algebird
CoreNLP
Kibana
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 49
Fast Data Overview for Data Science Maryland Meetup

More Related Content

What's hot (20)

PDF
Oracle analytics cloud overview feb 2017
aioughydchapter
 
PPTX
How Universities Use Big Data to Transform Education
Hortonworks
 
PDF
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
Hortonworks
 
PPTX
The Power of your Data Achieved - Next Gen Modernization
Hortonworks
 
PPTX
Journey to SAS Analytics Grid with SAS, R, Python
Sumit Sarkar
 
PDF
Oracle Enterprise Metadata Management
Andrey Akulov
 
PDF
Oracle analytics Live September 2021
Benjamin Arnulf
 
PDF
NZOUG - GroundBreakers-2018 -Using Oracle Autonomous Health Framework to Pres...
Sandesh Rao
 
PDF
Dataguise hortonworks insurance_feb25
Hortonworks
 
PDF
ODA Target Markets – Partnering to Win
MarketingArrowECS_CZ
 
PDF
Actian forrester- hortonworks
Hortonworks
 
PPTX
Oracle's BigData solutions
Swiss Big Data User Group
 
PDF
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...
Hortonworks
 
PDF
Tips and Techniques for Improving the Performance of Validation Procedures in...
Perficient, Inc.
 
PDF
P6 Resource Management in the web
p6academy
 
PPTX
Extreme Analytics - What's New With Oracle Exalytics X3-4 & T5-8?
KPI Partners
 
PDF
Oracle Solaris Build and Run Applications Better on 11.3
OTN Systems Hub
 
PPTX
Hortonworks Oracle Big Data Integration
Hortonworks
 
PPTX
Spark and Hadoop Perfect Togeher by Arun Murthy
Spark Summit
 
PDF
The Next Generation of Big Data Analytics
Hortonworks
 
Oracle analytics cloud overview feb 2017
aioughydchapter
 
How Universities Use Big Data to Transform Education
Hortonworks
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
Hortonworks
 
The Power of your Data Achieved - Next Gen Modernization
Hortonworks
 
Journey to SAS Analytics Grid with SAS, R, Python
Sumit Sarkar
 
Oracle Enterprise Metadata Management
Andrey Akulov
 
Oracle analytics Live September 2021
Benjamin Arnulf
 
NZOUG - GroundBreakers-2018 -Using Oracle Autonomous Health Framework to Pres...
Sandesh Rao
 
Dataguise hortonworks insurance_feb25
Hortonworks
 
ODA Target Markets – Partnering to Win
MarketingArrowECS_CZ
 
Actian forrester- hortonworks
Hortonworks
 
Oracle's BigData solutions
Swiss Big Data User Group
 
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...
Hortonworks
 
Tips and Techniques for Improving the Performance of Validation Procedures in...
Perficient, Inc.
 
P6 Resource Management in the web
p6academy
 
Extreme Analytics - What's New With Oracle Exalytics X3-4 & T5-8?
KPI Partners
 
Oracle Solaris Build and Run Applications Better on 11.3
OTN Systems Hub
 
Hortonworks Oracle Big Data Integration
Hortonworks
 
Spark and Hadoop Perfect Togeher by Arun Murthy
Spark Summit
 
The Next Generation of Big Data Analytics
Hortonworks
 

Similar to Fast Data Overview for Data Science Maryland Meetup (20)

PPTX
Fast Data Overview
C. Scyphers
 
PPTX
Tame Big Data with Oracle Data Integration
Michael Rainey
 
PDF
Demo intelligent user experience with oracle mobility for publishing
Vasily Demin
 
PDF
Cómo terminar tu Planeación Financiera antes de las 6PM
OracleOfficeOfFinance
 
PDF
Contexti / Oracle - Big Data : From Pilot to Production
Contexti
 
PDF
Tapping into the Big Data Reservoir (CON7934)
Jeffrey T. Pollock
 
PDF
8 from zero to insight with real time big data
Dr. Wilfred Lin (Ph.D.)
 
PDF
Big Data: Myths and Realities
Toronto-Oracle-Users-Group
 
PDF
Business Advantages of Oracle Software & Systems Running Together
Mario Derba
 
PDF
Hyperion Planning: Cloud or On Premise
OAUGNJ
 
PDF
Primavera Mobile Applications - Now and Beyond
p6academy
 
PDF
Oracle Management Cloud
Fabio Batista
 
PDF
Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)
Jeffrey T. Pollock
 
PPTX
Big Data Management System: Smart SQL Processing Across Hadoop and your Data ...
DataWorks Summit
 
PDF
La importancia del dato en la estrategia de marketing digital
Adigital
 
PDF
ODA Right to use program - Optimalizace IT investice
MarketingArrowECS_CZ
 
PPTX
Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...
jdijcks
 
PPTX
Oracle BI Big Data and Bics
Darren Grogan
 
PPTX
Agile Development and DevOps in the Oracle Cloud
jeckels
 
PDF
Oracle Big Data Governance Webcast Charts
Jeffrey T. Pollock
 
Fast Data Overview
C. Scyphers
 
Tame Big Data with Oracle Data Integration
Michael Rainey
 
Demo intelligent user experience with oracle mobility for publishing
Vasily Demin
 
Cómo terminar tu Planeación Financiera antes de las 6PM
OracleOfficeOfFinance
 
Contexti / Oracle - Big Data : From Pilot to Production
Contexti
 
Tapping into the Big Data Reservoir (CON7934)
Jeffrey T. Pollock
 
8 from zero to insight with real time big data
Dr. Wilfred Lin (Ph.D.)
 
Big Data: Myths and Realities
Toronto-Oracle-Users-Group
 
Business Advantages of Oracle Software & Systems Running Together
Mario Derba
 
Hyperion Planning: Cloud or On Premise
OAUGNJ
 
Primavera Mobile Applications - Now and Beyond
p6academy
 
Oracle Management Cloud
Fabio Batista
 
Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)
Jeffrey T. Pollock
 
Big Data Management System: Smart SQL Processing Across Hadoop and your Data ...
DataWorks Summit
 
La importancia del dato en la estrategia de marketing digital
Adigital
 
ODA Right to use program - Optimalizace IT investice
MarketingArrowECS_CZ
 
Oracle Openworld Presentation with Paul Kent (SAS) on Big Data Appliance and ...
jdijcks
 
Oracle BI Big Data and Bics
Darren Grogan
 
Agile Development and DevOps in the Oracle Cloud
jeckels
 
Oracle Big Data Governance Webcast Charts
Jeffrey T. Pollock
 
Ad

Recently uploaded (20)

PDF
Newgen Beyond Frankenstein_Build vs Buy_Digital_version.pdf
darshakparmar
 
PDF
July Patch Tuesday
Ivanti
 
PPTX
UiPath Academic Alliance Educator Panels: Session 2 - Business Analyst Content
DianaGray10
 
PDF
DevBcn - Building 10x Organizations Using Modern Productivity Metrics
Justin Reock
 
PPTX
Top iOS App Development Company in the USA for Innovative Apps
SynapseIndia
 
PDF
SWEBOK Guide and Software Services Engineering Education
Hironori Washizaki
 
PDF
CIFDAQ Token Spotlight for 9th July 2025
CIFDAQ
 
PPTX
MSP360 Backup Scheduling and Retention Best Practices.pptx
MSP360
 
PDF
How Startups Are Growing Faster with App Developers in Australia.pdf
India App Developer
 
PDF
Fl Studio 24.2.2 Build 4597 Crack for Windows Free Download 2025
faizk77g
 
PDF
"AI Transformation: Directions and Challenges", Pavlo Shaternik
Fwdays
 
PDF
Python basic programing language for automation
DanialHabibi2
 
PPTX
Q2 FY26 Tableau User Group Leader Quarterly Call
lward7
 
PDF
Empower Inclusion Through Accessible Java Applications
Ana-Maria Mihalceanu
 
PDF
HCIP-Data Center Facility Deployment V2.0 Training Material (Without Remarks ...
mcastillo49
 
PDF
Log-Based Anomaly Detection: Enhancing System Reliability with Machine Learning
Mohammed BEKKOUCHE
 
PDF
[Newgen] NewgenONE Marvin Brochure 1.pdf
darshakparmar
 
PPTX
Webinar: Introduction to LF Energy EVerest
DanBrown980551
 
PPTX
✨Unleashing Collaboration: Salesforce Channels & Community Power in Patna!✨
SanjeetMishra29
 
PDF
CIFDAQ Weekly Market Wrap for 11th July 2025
CIFDAQ
 
Newgen Beyond Frankenstein_Build vs Buy_Digital_version.pdf
darshakparmar
 
July Patch Tuesday
Ivanti
 
UiPath Academic Alliance Educator Panels: Session 2 - Business Analyst Content
DianaGray10
 
DevBcn - Building 10x Organizations Using Modern Productivity Metrics
Justin Reock
 
Top iOS App Development Company in the USA for Innovative Apps
SynapseIndia
 
SWEBOK Guide and Software Services Engineering Education
Hironori Washizaki
 
CIFDAQ Token Spotlight for 9th July 2025
CIFDAQ
 
MSP360 Backup Scheduling and Retention Best Practices.pptx
MSP360
 
How Startups Are Growing Faster with App Developers in Australia.pdf
India App Developer
 
Fl Studio 24.2.2 Build 4597 Crack for Windows Free Download 2025
faizk77g
 
"AI Transformation: Directions and Challenges", Pavlo Shaternik
Fwdays
 
Python basic programing language for automation
DanialHabibi2
 
Q2 FY26 Tableau User Group Leader Quarterly Call
lward7
 
Empower Inclusion Through Accessible Java Applications
Ana-Maria Mihalceanu
 
HCIP-Data Center Facility Deployment V2.0 Training Material (Without Remarks ...
mcastillo49
 
Log-Based Anomaly Detection: Enhancing System Reliability with Machine Learning
Mohammed BEKKOUCHE
 
[Newgen] NewgenONE Marvin Brochure 1.pdf
darshakparmar
 
Webinar: Introduction to LF Energy EVerest
DanBrown980551
 
✨Unleashing Collaboration: Salesforce Channels & Community Power in Patna!✨
SanjeetMishra29
 
CIFDAQ Weekly Market Wrap for 11th July 2025
CIFDAQ
 
Ad

Fast Data Overview for Data Science Maryland Meetup

  • 2. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Fast Data Open Source An Overview Chuck Scyphers Big Data Lead East Coast
  • 3. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle. 3
  • 4. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Agenda 4 Fast Data Definition Popular Open Source Platforms Refreshments
  • 5. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Agenda 5 Fast Data Definition Popular Open Source Platforms Refreshments
  • 6. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | What Is Fast Data? 6
  • 7. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | What Is Fast Data? “Fast data is the application of big data analytics to smaller data sets in near- real or real-time in order to solve a problem or create business value. The goal of fast data is to quickly gather and mine structured and unstructured data so that action can be taken.” 7
  • 8. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | New Concepts for a Modern Data Platform Architecture Polyglot Fit for Purpose Data Lambda Speed Layer Batch Layer Data Sources Data Services Kappa Data Services Data PipelineData Sources
  • 9. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | How Data Impacts The Organization 9 67% executives who say drawing intelligence from data is top priority Source: Oracle Research Study - From Overload to Impact: An Industry Scorecard on Big Data Business Challenges, July 2012
  • 10. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | How Data Impacts The Organization 10 89% executives who would grade themselves C or lower in preparedness Source: Oracle Research Study - From Overload to Impact: An Industry Scorecard on Big Data Business Challenges, July 2012
  • 11. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | How Data Impacts The Organization 11 % believe their organization is losing revenue as a result of not being able to fully leverage information Source: Oracle Research Study - From Overload to Impact: An Industry Scorecard on Big Data Business Challenges, July 2012
  • 12. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | Velocity Matters of executives say too much critical information is delivered too late Source: Aberdeen Group – January 2012, survey of 247 executives - Data Management for BI – Big Data, Bigger Insight, Superior Performance
  • 13. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | Why Do We Care?  It’s about getting more from in-flight data  It’s about faster action, faster insights  It’s about visibility and predictability  It’s about running your business in real-time
  • 14. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | Key Value Drivers of Timely Accurate Action 14 Delivering Tangible Results With Fast Data Higher Quality In Operations Improved Efficiency New Services Better Customer Experience
  • 15. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | Fast Data Is Universal Across Industries 15 Financial Services Transportation & Logistics Telecommunications Manufacturing & Retail Utilities & Oil and GasHealth carePublic Sector
  • 16. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | Fast Data Characteristics 16 ANALYZEMOVE & TRANSFORM FILTER & CORRELATE ACT
  • 17. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | Fast Data Characteristics Oracle Confidential – Internal/Restricted/Highly Restricted 17 ANALYZEMOVE & TRANSFORM FILTER & CORRELATE ACT Complete In-Flight Event Processing • Eliminate, Consolidate, Correlate, And/Or Filter Data While In Flight • Analyze Data Streams • Enrich Data For More Accurate Decisions • Process Data In The Stream To Free Up Back End Resources
  • 18. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | Fast Data Characteristics Oracle Confidential – Internal/Restricted/Highly Restricted 18 ANALYZEMOVE & TRANSFORM FILTER & CORRELATE ACT Work With The Stream • Apply Basic Filtering At Capture • Improve Trusted Quality Of Information • Move Data (duh)
  • 19. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | Fast Data Characteristics Oracle Confidential – Internal/Restricted/Highly Restricted 19 ANALYZEMOVE & TRANSFORM FILTER & CORRELATE ACT Speed Up The OODA Loop • Get Actionable Insights • Predict Outcomes
  • 20. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | Fast Data Characteristics Oracle Confidential – Internal/Restricted/Highly Restricted 20 ANALYZEMOVE & TRANSFORM FILTER & CORRELATE ACT Make Decisions That Matter Faster • Deliver Real-Time Decisions And Recommendations To Customers/Employees • Automatically Render Decisions Within A Process With Tailored Messaging • Integrate Human Workflow, Process Management, Activity Monitoring
  • 21. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | Fast Data Customers Oracle Confidential – Internal/Restricted/Highly Restricted 21 From Oracle (naturally)
  • 22. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | Fast Data And Financial Services Oracle Confidential – Internal/Restricted/Highly Restricted 22 Improving Customer Experiences • Improve Customer Experience: The goal is to connect all data about the customers to improve customer service experience and to lower the burden of hiring new representatives. • Reduce Staffing Demands: For customers calling to discuss a claim or their coverage, it means fewer annoying waits as an agent accesses data from any of dozens of different places. • Consolidate information in real-time: All a customer’s transactions: claims, records, status, possible cross-sell information (e.g., if someone lives in an apartment and might need renter’s insurance)
  • 23. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | Fast Data And Retail Oracle Confidential – Internal/Restricted/Highly Restricted 23 • Price optimization - leveraging analytics to price goods and services on the fly based on real-time metrics such as competitor pricing, supply chain and inventory data, market data and consumer behavior data. • Product placement analysis - processing video data to identify shopping trends, assesses effectiveness of displays to improve store layouts and product placements. • Staffing - The largest retailers are analyzing weather forecasts, promotional campaigns and dates to effectively meet staffing requirements on holidays all year round.
  • 24. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | Fast Data And Public Sector Oracle Confidential – Internal/Restricted/Highly Restricted 24 LA City Planning And Traffic Analysis • Dynamic Pricing for Toll Lanes: if a driver is paying to drive in the HOT (high-occupancy tolling) lane, he’s guaranteed a consistent speed of 45 miles per hour. If traffic starts backing up, prices for individual cars will rise to discourage them from entering, saving the lanes for high-occupancy vehicles • Express Park: It’s not enough to know how to set the price, you have to make sure that data gets to users in real time. Drivers also need to know parking spots will still be there when they arrive in 40 minutes. • Combining M2M: The answer lies in combining information from other sources, such as mass-transit systems, toll highways, traffic sensors and weather data to paint a real-time picture of what traffic actually looks like
  • 25. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | Fast Data And Telecomm Oracle Confidential – Internal/Restricted/Highly Restricted 25 Location Based Mobile Billboard Advertising at Turkcell • Processing over 800,000 subscriber related events per second (with 1.5 Billion Events Daily) • Provided and executed over 50 simultaneous campaigns • Ensured customer responsiveness with less than 1 second times with a scalable architecture, ready to expand on demand
  • 26. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | Agenda 26 Fast Data Definition Popular Open Source Platforms What Do We Want To Be When We Grow Up? Refreshments
  • 27. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | HDFS Based • Spark • HBase • Impala • H20 • Apex Other Based • Druid • Flink • Storm • Kafka • Samza • ElasticSearch • Lucene/Solr 27 Composite • SMACK • PANCAKE Open Source Platforms General Classifications
  • 28. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | HDFS Based • Spark • HBase • Impala • H20 • Apex Other Based • Druid • Flink • Storm • Kafka • Samza • ElasticSearch • Lucene/Solr 28 Composite • SMACK • PANCAKE Open Source Platforms General Classifications
  • 29. Spark (HDFS Based) • In-Memory Distributed Processing Framework • Will Spill To Disk As Needed • Handles Streaming Data Through Micro-batching
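
A rough illustration of micro-batching in plain Python. This is not Spark's API; `micro_batches`, the one-second interval and the sample events are made up for the example. The idea is simply that a stream is cut into small, consecutive batches, and each batch is then processed like an ordinary batch job.

```python
from collections import defaultdict


def micro_batches(events, interval):
    """Group (timestamp, payload) events into consecutive fixed-length batches."""
    buckets = defaultdict(list)
    for timestamp, payload in events:
        # Events whose timestamps fall in the same interval share a batch.
        buckets[int(timestamp // interval)].append(payload)
    # Emit batches in time order; each one would be handed to a batch job.
    return [buckets[index] for index in sorted(buckets)]


# Hypothetical stream: (seconds since start, payload)
events = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (2.7, "d")]
batches = micro_batches(events, interval=1.0)
print(batches)
```

The trade-off is visible even in a toy: an event waits until its interval closes, which is one reason micro-batch latency tends to be measured in seconds rather than milliseconds.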
  • 30. HBase (HDFS Based) • A NoSQL Columnar Store Built On Top Of HDFS • Provides A Big Table–esque Processing Model • Compression • In-memory • Bloom Filters By Column • Offers Both Real Time Read/Write Access And Random Access To HDFS
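
For readers who have not met Bloom filters: the sketch below is a minimal, generic Bloom filter in Python, not HBase's implementation. The class name, the bit-array size and the hash construction are all assumptions chosen for illustration. It shows the property that makes the structure useful for reads: it can answer "definitely not present" cheaply, at the cost of occasional false positives.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for position in self._positions(key):
            self.bits |= 1 << position

    def might_contain(self, key):
        # False means the key was never added; True means "possibly added".
        return all((self.bits >> position) & 1 for position in self._positions(key))


row_filter = BloomFilter()
row_filter.add("row-001")
print(row_filter.might_contain("row-001"))
```

A store can consult such a filter before opening a file and skip the file entirely when the answer is "no".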
  • 31. Impala (HDFS Based) • Real-time SQL queries over data stored in HDFS or HBase • No MapReduce processing • Uses an MPP query engine on the Hadoop cluster • Utilizes Hive metastore for metadata repository • Leveraged by numerous BI tools and applications • Not ANSI SQL
  • 32. H2O (HDFS Based)
  • 33. Apex (HDFS Based)
  • 36. Druid (Other Based) • MySQL • ZooKeeper
  • 37. Flink (Other Based)
  • 38. Storm (Other Based)
  • 39. Kafka (Other Based)
  • 40. Samza (Other Based)
  • 41. ElasticSearch (Search Based)
  • 42. Lucene/Solr (Search Based)
  • 43. A Quick Comparison
      Columns: Guarantee | Throughput | Fault Tolerance Overhead | Computation Model | Windowing | Memory Management | DAG Based | Batch Support | Latency | Stateful Operations
      Spark: Exactly Once | 100k+ records/sec | Low | Microbatches | Time Based | Moving towards automatic | yes | Yes | seconds | yes
      Flink: Exactly Once | Low | Continuous Flow Operation | Record Based / User Defined | Automatic | Yes | milliseconds
      Storm: At least Once/Exactly Once (+ Trident) | 100k+ records/sec | Continuous Flow Operation | yes | No (unless paired with Trident) | milliseconds | no (unless with Trident)
      Samza: At least Once | 10k+ records/sec | Continuous Flow Operation | milliseconds | yes
      Hadoop: Lower | High | Batch Only | Nope | YARN is helping | No | Only
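
The Guarantee column is easier to reason about with a toy example. Under at-least-once delivery a message may arrive more than once, so a naive aggregate over-counts. One common way to get exactly-once *effects* on top of at-least-once delivery is to make processing idempotent by remembering message ids. The Python sketch below shows that general technique only; it is not how Spark, Flink or Trident implement their guarantees, and `apply_once` and the sample deliveries are hypothetical.

```python
def apply_once(deliveries, seen_ids, totals):
    """Apply each (message_id, key, amount) delivery at most once."""
    for message_id, key, amount in deliveries:
        if message_id in seen_ids:
            continue  # redelivered message: already applied, so skip it
        seen_ids.add(message_id)
        totals[key] = totals.get(key, 0) + amount
    return totals


# Message 2 is delivered twice, as at-least-once delivery allows.
deliveries = [(1, "a", 10), (2, "b", 5), (2, "b", 5), (3, "a", 1)]

# Naive processing counts the duplicate.
naive_totals = {}
for _, key, amount in deliveries:
    naive_totals[key] = naive_totals.get(key, 0) + amount

# Idempotent processing ignores it.
deduped_totals = apply_once(deliveries, set(), {})
print(naive_totals, deduped_totals)
```

In a real system the set of seen ids would itself have to be stored durably and atomically with the totals, which is where much of the fault-tolerance overhead comes from.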
  • 45. Open Source Platforms, General Classifications • HDFS Based: Spark, HBase, Impala, H2O, Apex, Samza • Other Based: Druid, Flink, ElasticSearch, Storm, Kafka, Lucene/Solr • Composite: SMACK, PANCAKE
  • 46. SMACK Stack • Spark • Mesos • Akka • Cassandra • Kafka
  • 47. PANCAKE Pile • Presto • Arrow • NiFi • Cassandra • AirFlow • Kafka • ElasticSearch
  • 48. PANCAKE STACK • Presto • Arrow • NiFi • Cassandra • AirFlow • Kafka • ElasticSearch • Spark • TensorFlow • Algebird • CoreNLP • Kibana

Editor's Notes

  • #9: First, the industry has embraced polyglot persistence: accepting that we should store and manage data in a “fit for purpose” way that is optimized for the data at hand. Second, we can parallelize our MPP data foundation for both speed and size; this is crucial for next-gen data services and analytics that must scale to any latency and size requirements. Third, MPP data pipelines allow us to treat data events in moving time windows at variable latencies; in the long run this will change how we do ETL for most use cases.
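
To make the “moving time windows” idea concrete, here is a minimal Python sketch that is not taken from the deck. It counts the events that fall inside a sliding window of fixed width. The class name `SlidingWindowCount`, the 10-second width and the sample timestamps are hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import deque


class SlidingWindowCount:
    """Count events whose timestamps fall within the last `width` seconds."""

    def __init__(self, width):
        self.width = width
        self.timestamps = deque()

    def add(self, timestamp):
        self.timestamps.append(timestamp)
        # Evict events that have slid out of the window.
        while self.timestamps and self.timestamps[0] <= timestamp - self.width:
            self.timestamps.popleft()
        return len(self.timestamps)


window = SlidingWindowCount(width=10)
counts = [window.add(t) for t in (0, 5, 12, 30)]
print(counts)
```

Changing `width` is the knob that trades freshness against smoothing, which is one way to read “variable latencies” in the note above.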
  • #16: The value of applying a fast data strategy as an end-to-end solution has become more apparent across many industry segments; it’s also becoming more mainstream now. We’re seeing many Oracle customers adopting solutions that address these fast data requirements. For example: Transportation: monitor all of an airline's operational events to eliminate flight delays. Retail: customer service centers are using Fast Data for click-stream analysis and customer experience management. Telco: location-based offers, or intelligent network management to drive new services and lower costs. Healthcare: monitoring medical device data to help save lives. Manufacturing: real-time corrective action to reduce maintenance costs or the risk of outages.
  • #19: Oracle Data Integration (Oracle GoldenGate and Oracle Data Integrator) provides best-in-class real-time capture and big data transformation: capture changed data and events in real time; apply basic filtering and transformation at the capture point; transform and load structured or unstructured data for analysis; improve the trusted quality of information.
  • #20: In-memory analytics to provide actionable insight from large amounts of data, fast. Oracle TimesTen In-Memory Database for Oracle Exalytics has also been certified to work with both Oracle GoldenGate Real-Time Replication technology and Oracle Data Integrator, allowing more flexibility for customers to report on events as they happen. With these certifications, Oracle Exalytics data can be updated either via replication or via incremental updates, making the refreshes quicker and more efficient. Trust that your data model is current and accurate: customers benefit from the Common Enterprise Information Model, which provides centralized metadata management, common query request generation and data access, and a rich spectrum of visualization, collaboration, and search features. Quickly organize and explore diverse and unstructured big data from inside and outside your organization: Endeca allows business users to freely explore and discover meaningful new insight from both structured and unstructured sources to help identify root causes and new associations.
  • #28: In general, I see three main categories: HDFS ecosystem based, ones that are built on other ecosystems (and/or), and the composite systems
  • #33: An effort to make R both more scalable and faster. Can run on top of HDFS, but on other platforms as well.
  • #34: YARN-based processing. Includes Malhar, a “lego box” of premade operators and widgets to speed adoption.
  • #37: An in-memory OLAP store, designed to ingest event/log data, chunking and compressing that data into column-based queryable segments.
      Data Ingestion: Data is ingested by Druid directly through its real-time nodes, or batch-loaded into historical nodes from a deep storage facility. Real-time nodes accept JSON-formatted data from a streaming datasource. Batch-loaded data formats can be JSON, CSV, or TSV. Real-time nodes temporarily store and serve data in real time, but eventually push the data to the deep storage facility, from which it is loaded into historical nodes. Historical nodes hold the bulk of data in the cluster. Real-time nodes chunk data into segments, and are designed to frequently move these segments out to deep storage. To maintain cluster awareness of the location of data, these nodes must interact with MySQL to update metadata about the segments, and with Apache ZooKeeper to monitor their transfer.
      Query Management: Client queries first hit broker nodes, which forward them to the appropriate data nodes (either historical or real-time). Since Druid segments may be partitioned, an incoming query can require data from multiple segments and partitions (or shards) stored on different nodes in the cluster. Brokers are able to learn which nodes have the required data, and also merge partial results before returning the aggregated result.
      Cluster Management: Operations relating to data management in historical nodes are overseen by coordinator nodes, which are the prime users of the MySQL metadata tables. Apache ZooKeeper is used to register all nodes, manage certain aspects of internode communications, and provide for leader elections.
      Features: low-latency (real-time) data ingestion; arbitrary slice-and-dice data exploration; sub-second analytic queries; approximate and exact computations.
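
The broker's scatter-gather step described in this note can be sketched in a few lines of Python. This only illustrates merging partial aggregates; it is not Druid's query API, and the node names, row layout, `query_segment` and `broker_query` are all hypothetical.

```python
from collections import Counter


def query_segment(rows, dimension, metric):
    """Aggregate one node's rows: sum `metric` grouped by `dimension`."""
    partial = Counter()
    for row in rows:
        partial[row[dimension]] += row[metric]
    return partial


def broker_query(nodes, dimension, metric):
    """Fan the query out to every data node, then merge the partial results."""
    merged = Counter()
    for node_rows in nodes.values():
        merged.update(query_segment(node_rows, dimension, metric))
    return dict(merged)


# Hypothetical cluster: one historical node and one real-time node.
nodes = {
    "historical-1": [{"page": "home", "clicks": 3}, {"page": "cart", "clicks": 1}],
    "realtime-1": [{"page": "home", "clicks": 2}],
}
result = broker_query(nodes, dimension="page", metric="clicks")
print(result)
```

Sums merge trivially, which is why the sketch uses them; aggregates such as distinct counts need mergeable summaries, and that is where approximate computations tend to come in.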
  • #38: Flink is intended to be a framework for unified stream and batch processing (Kappa architecture). Flink can also handle backpressure on the queues more gracefully than other platforms (::cough:: Storm). Flink executes arbitrary dataflow programs in a data-parallel and pipelined manner. Flink's pipelined runtime system enables the execution of bulk/batch and stream processing programs. Furthermore, Flink's runtime supports the execution of iterative algorithms natively.
  • #39: Stream processor only. Bolts & Spouts
  • #41: Samza’s goal is to provide a lightweight framework for continuous data processing. Samza continuously computes results as data arrives, which makes sub-second response times possible. This is unlike batch processing systems such as Hadoop, which typically have high-latency responses that can sometimes take hours. Samza might help you to update databases, compute counts or other aggregations, transform messages, or perform any number of other operations. It’s been in production at LinkedIn for several years and currently runs on hundreds of machines across multiple data centers. Our largest Samza job is processing more than one million messages per second during peak traffic hours.
      Architecture & Concepts: Streams and jobs are the building blocks of a Samza application. A stream is composed of immutable sequences of messages of a similar type or category. In order to scale the system to handle large-scale data, we break down each stream into partitions. Within each partition, the sequence of messages is totally ordered and each message’s position is uniquely identified by its offset. At LinkedIn, streams are provided by Apache Kafka. A job is the code that consumes and processes a set of input streams. In order to scale the throughput of the stream processor, jobs are broken into smaller units of execution called tasks. Each task consumes data from one or more partitions for each of the job’s input streams. Since there is no defined ordering of messages across the partitions, tasks can operate independently. Samza assigns groups of tasks to be executed inside one or more containers: UNIX processes running a JVM that execute a set of Samza tasks for a single job. Samza’s container code is single-threaded (when one task is processing a message, no other task in the container is active), and is responsible for managing the startup, execution, and shutdown of one or more tasks.
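
The stream, partition, offset and task vocabulary in this note maps onto a very small model. The Python sketch below is an illustration only and uses neither Samza's nor Kafka's API. `PartitionedStream`, `partition_for` and `run_task` are hypothetical names, and routing messages by a hash of the key is an assumption made for the example.

```python
import zlib


def partition_for(key, num_partitions):
    """Route a key to a partition with a stable hash (an assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class PartitionedStream:
    """A stream split into partitions; each partition is an ordered log."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        partition = partition_for(key, len(self.partitions))
        self.partitions[partition].append((key, value))
        # The offset is the message's position within its partition.
        return partition, len(self.partitions[partition]) - 1


def run_task(stream, partition, start_offset=0):
    """One task: count messages per key in a single partition, in order."""
    counts = {}
    for key, _value in stream.partitions[partition][start_offset:]:
        counts[key] = counts.get(key, 0) + 1
    return counts


stream = PartitionedStream(num_partitions=4)
first = stream.append("member-42", "page-view")
second = stream.append("member-42", "click")
task_counts = run_task(stream, partition=first[0])
print(first, second, task_counts)
```

Because a task only ever reads its own partitions, tasks need no coordination with each other, and restarting one is a matter of resuming from a saved offset.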
  • #47: SPARK – fast, general purpose engine for distributed processing (everywhere); MESOS – cluster management and resource isolation (blue hexagon); AKKA – runtime for highly concurrent, distributed, message-driven applications (blue triangle); CASSANDRA – distributed NoSQL store optimized for writes (eye); KAFKA – high throughput, low latency pub/sub messaging system (orange pipe)