Real-Time BI in Hadoop
Bradford Stephens
Lead Engineer, Visible Technologies
Principal Consultant, Drawn to Scale Consulting
Topics

• Scalability and BI
• Costs and Abilities
• Search as BI
Real Time BI with Hadoop
What Is BI?
What Is “Real-Time”?


• Understanding Latency
• We aim for <5 secs.
Scalability in BI

• Scalability matters now
• Social Media: Catalyst
• All data is important
• Data doesn’t scale with business size anymore
Search as BI


• Katta = Distributed Search on Hadoop
• Bobo = Faceted Lucene
Doing it Cheap

• 100 TB, Structured and Unstructured
• Oracle - $100,000,000
• “NewSQL” - $4,000,000
• Hadoop + Katta - $250,000
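
To put the figures above on a common footing, they can be reduced to cost per terabyte and to a multiple of the cheapest option. This is only a back-of-the-envelope sketch using the numbers quoted on this slide; the names `costs_usd`, `cost_per_tb`, and `ratio_to_cheapest` are illustrative and not from the talk.

```python
# Total costs for a 100 TB deployment, exactly as quoted on the slide.
DATA_TB = 100
costs_usd = {
    "Oracle": 100_000_000,
    "NewSQL": 4_000_000,
    "Hadoop + Katta": 250_000,
}


def cost_per_tb(total_usd, data_tb=DATA_TB):
    """Spread the quoted total evenly over the data volume."""
    return total_usd / data_tb


def ratio_to_cheapest(costs):
    """Express every option as a multiple of the cheapest one."""
    cheapest = min(costs.values())
    return {name: usd / cheapest for name, usd in costs.items()}


if __name__ == "__main__":
    ratios = ratio_to_cheapest(costs_usd)
    for name, usd in costs_usd.items():
        print(f"{name}: ${cost_per_tb(usd):,.0f} per TB, {ratios[name]:.0f}x cheapest")
```

On these numbers, Oracle works out to $1,000,000 per TB, “NewSQL” to $40,000 per TB, and Hadoop + Katta to $2,500 per TB.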
Why We Need Hadoop

• Need to process high-latency data to get
  the “small stuff” fast
• Robust Ecosystem
• Need more than SQL. RDBMS not a Swiss Army
  Knife
Aggregation is Real-Time

• Distributed Search w/ Katta + Facets =
  Aggregation-Based BI
• Sum, Count, Filter, Avg, Group
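
The five operations listed above map naturally onto a filter followed by a facet (group) with per-bucket statistics. The sketch below shows that idea in plain Python over an in-memory list. It is not the Katta or Bobo API, and the documents and field names (`source`, `brand`, `mentions`) are invented for illustration.

```python
from collections import defaultdict

# Hypothetical documents, shaped like records a search index might hold.
DOCS = [
    {"source": "twitter", "brand": "acme", "mentions": 3},
    {"source": "twitter", "brand": "zenith", "mentions": 5},
    {"source": "blog", "brand": "acme", "mentions": 2},
    {"source": "forum", "brand": "acme", "mentions": 7},
]


def filter_docs(docs, **criteria):
    """Filter: keep documents whose fields equal every given criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


def facet(docs, group_field, value_field):
    """Group by one field, then compute count, sum, and avg of another."""
    buckets = defaultdict(lambda: {"count": 0, "sum": 0})
    for d in docs:
        bucket = buckets[d[group_field]]
        bucket["count"] += 1
        bucket["sum"] += d[value_field]
    return {
        key: {**b, "avg": b["sum"] / b["count"]} for key, b in buckets.items()
    }


if __name__ == "__main__":
    # "How is acme mentioned, broken down by source?"
    print(facet(filter_docs(DOCS, brand="acme"), "source", "mentions"))
```

In a distributed setting each shard would compute its own buckets and a coordinator would merge them; counts and sums add up directly, while averages have to be recomputed from the merged sum and count.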
Protips: Review

• Understand High vs. Low Latency data
• Hadoop makes it cheap
• Pre-aggregate w/ Hadoop, Explore w/ Katta
  + Faceted Search
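
The third tip splits the work in two: a slow batch pass over the raw data, and a fast lookup over the small result. A minimal sketch of that split follows, with the map, shuffle, and reduce phases written as ordinary Python functions rather than a real Hadoop job. The event tuples, dates, and function names are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical raw events: (day, source, count). In practice this would be
# the large, high-latency data set processed in batch.
RAW_EVENTS = [
    ("2010-03-01", "twitter", 1),
    ("2010-03-01", "twitter", 1),
    ("2010-03-01", "blog", 1),
    ("2010-03-02", "twitter", 1),
]


def map_event(event):
    """Map: emit a ((day, source), count) pair for each raw event."""
    day, source, count = event
    yield (day, source), count


def shuffle(pairs):
    """Shuffle: collect all values that share a key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce: collapse the values for one key into a single total."""
    return key, sum(values)


def pre_aggregate(events):
    """Batch step: turn many raw events into a small rollup table."""
    pairs = chain.from_iterable(map_event(e) for e in events)
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


def explore(rollup, source):
    """Low-latency step: scan only the rollup, never the raw events."""
    return {day: n for (day, src), n in rollup.items() if src == source}


if __name__ == "__main__":
    rollup = pre_aggregate(RAW_EVENTS)
    print(explore(rollup, "twitter"))
```

The batch step can take as long as it needs; only `explore` has to answer inside the real-time budget, and it touches one row per day and source instead of one row per event.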
The Future


• Search/BI as a Platform: “Google my Data
  Warehouse”
• Real-Time MR on HBase
