How Klout is changing the
landscape of social media with
Hadoop and BI
Dave Mariani
VP Engineering, Klout


Denny Lee
Principal Program Manager
Microsoft
Discover and be recognized for how you
          influence the world
Klout’s Big Data makes all this possible


   15 Social Networks Processed Every Day
   120 Terabytes of Data Storage
   200,000 Indexed Users Added Every Day
   140,000,000 Users Indexed Every Day
   1,000,000,000 Social Signals Processed Every Day
   30,000,000,000 API Calls Delivered Every Month
   54,000,000,000 Rows of Data In Klout Data Warehouse
KLOUT DATA ARCHITECTURE
THE BEST TOOL FOR THE JOB

[Architecture diagram. Components, with technology in parentheses:]
   Signal Collectors (Java/Scala)
   Data Enhancement Engine (PIG/Hive)
   Data Warehouse (Hive)
   Serving Stores: Registrations DB (MySql), Profile DB (HBase), Search Index (Elastic Search), Streams (MongoDB)
   Analytics Cubes (SSAS)
   Klout API (Scala)
   Klout.com (Node.js)
   Mobile (ObjectiveC)
   Partner API (Mashery)
   Monitoring (Nagios)
   Dashboards (Tableau)
   Perks Analytics (Scala)
   Event Tracker (Scala)
What is Business Intelligence?
• Data Warehousing, OLAP, Dashboards, Reporting
• Ability to slice and dice data in an ad-hoc manner
• Getting the right data to the right people, at the right
  time
• i.e. Now




Why Hadoop + BI?

  Requirement                                  Hadoop & Hive   BI Query Engines
  Capture & store all data                     Yes             No
  Support queries against detail data          Yes             No
  Support interactive queries & applications   No              Yes
  Support BI & visualization tools             No              Yes
An Example: Klout Event Tracker

1   Perform A|B Testing of User Flows
2   Optimize Registration Funnels
3   Monitor consumer engagement & retention (DAUs & MAUs)
4   Flexibly track and report on user-generated events
A Flexible, Hierarchical Schema

  Level            Meaning                Examples
  Project          Collection of events   HomePage, Actions, Mobile iOS
  Event            Captured user action   +K (Add a topic) event
  Property Type    Attribute key          Source, Gender, Location
  Property Value   Attribute value        Google Search, Male, SF
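
One way to picture the hierarchy above in code is sketched below. This is only an illustration: the class name TrackedEvent and its method are hypothetical, and pairing the +K event with the Mobile iOS project and these three properties simply reuses the example values from the slide rather than describing a real record.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TrackedEvent:
    """One captured user action inside a project (hypothetical class name)."""
    project: str                      # collection of events, e.g. "HomePage"
    event: str                        # captured user action
    properties: Dict[str, str] = field(default_factory=dict)  # attribute key -> value

    def keys_and_values(self) -> Tuple[List[str], List[str]]:
        """Split the properties into parallel key and value lists."""
        keys = sorted(self.properties)
        return keys, [self.properties[k] for k in keys]


# Example values taken from the slide; the combination is illustrative only.
example = TrackedEvent(
    project="Mobile iOS",
    event="+K (Add a topic)",
    properties={"Source": "Google Search", "Gender": "Male", "Location": "SF"},
)
print(example.keys_and_values())
```

Because properties are free-form key/value pairs, new attributes can be tracked without changing the shape of the record, which is presumably what makes the schema "flexible".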
Event Tracker Architecture

Pipeline stages:
  Instrument   Tracker API (Scala, node.JS)
  Collect      Log Process (Flume)
  Persist      Warehouse
  Query        Cube (Analysis Services)
  Report       Klout UI (Scala, AJAX UX)

Example tracking request:
  insights3:9003/track/{"project":"plusK","event":"spend","session_id":"0","ks_uid":123456,"type":"add_topic"}

Example event:
  {
  "project":"plusK",
  "event":"spend",
  "ip":"50.68.47.158",
  "kloutId":"123456",
  "cookie_id":"123456",
  "ref":"http://klout.com/",
  "type":"add_topic",
  "time":"1338366015"
  }
  will be saved in HDFS at:
  /logs/events_tracking/2012-05-30/0100

event_log table:
  tstamp string
  project string
  event string
  session_id bigint
  ks_uid bigint
  ip string
  json_keys array<string>
  json_values array<string>
  json_text string
  dt string
  hr string

Example cube query:
  SELECT { [Measures].[Counter], [Measures].[PreviousPeriodCounter]}
  ON COLUMNS,
  NON EMPTY CROSSJOIN (
  exists([Date].[Date].[Date].allmembers,
  [Date].[Date].&[2012-05-19T00:00:00]:[Date].[Date].&[2012-06-02T00:00:00]),
  [Events].[Event].[Event].allmembers ) DIMENSION PROPERTIES
  MEMBER_CAPTION
  ON ROWS
  FROM [ProductInsight]
  WHERE ({[Projects].[Project].[plusK]})
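
As a rough sketch of the collect and persist steps on this slide, the Python below maps one tracked JSON payload onto a few event_log-style columns and derives the date/hour directory it would land in. The function names are hypothetical, and the UTC-7 (US Pacific daylight time) bucketing is an inference from the slide's example, where time 1338366015 ends up under 2012-05-30/0100; the deck does not state which time zone the real pipeline uses.

```python
import json
from datetime import datetime, timedelta, timezone

# Assumption: hourly buckets follow US Pacific daylight time (UTC-7).
BUCKET_TZ = timezone(timedelta(hours=-7))
LOG_ROOT = "/logs/events_tracking"  # root path shown on the slide


def hdfs_bucket(epoch_seconds: int) -> str:
    """Return the date/hour directory for an event timestamp."""
    local = datetime.fromtimestamp(epoch_seconds, tz=BUCKET_TZ)
    return f"{LOG_ROOT}/{local:%Y-%m-%d}/{local:%H}00"


def to_event_log_row(json_text: str) -> dict:
    """Map one tracked JSON payload onto a subset of event_log columns."""
    payload = json.loads(json_text)
    keys = list(payload)
    return {
        "project": payload["project"],
        "event": payload["event"],
        "json_keys": keys,                                # array<string>
        "json_values": [str(payload[k]) for k in keys],   # array<string>
        "json_text": json_text,                           # raw payload kept for detail queries
    }


# Shortened version of the slide's example event.
sample = '{"project":"plusK","event":"spend","type":"add_topic","time":"1338366015"}'
row = to_event_log_row(sample)
print(row["project"], row["event"], row["json_keys"])
print(hdfs_bucket(int(json.loads(sample)["time"])))
```

Keeping the raw payload in json_text alongside the extracted columns is what lets later queries reach back into fields that were never promoted to columns.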
Hadoop & BI Together:
Query Cube using a Custom App




A Peek into Product Insight > A|B Test: Unsorted vs. Sorted
A Peek into Product Insights > Projects: Mobile iOS
Hadoop & BI Together:
Query Cube Using Viz App




Hadoop & BI Together:
Query Hive using CLI




HiveQL Example

SELECT
   get_json_object(json_text,'$.sid') as sid,
   get_json_object(json_text,'$.inc') as inc,
   get_json_object(json_text,'$.status') as status,
   event
FROM bi.event_log
WHERE project='mobile-ios'
   AND dt=20120612
   AND get_json_object(json_text,'$.v')<>'1.5'
   AND (event = 'api_error' OR event = 'api_timeout')
ORDER BY sid;
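
To make the filter logic of the query above concrete, here is a rough Python stand-in that applies the same conditions to a handful of in-memory rows. The sample rows are invented for illustration, and json_field and run_query are hypothetical helpers: json_field only approximates get_json_object for top-level keys. In SQL semantics a comparison against NULL does not satisfy the WHERE clause, so the sketch also skips rows that have no v field.

```python
import json

# Hypothetical sample rows shaped like bi.event_log (not real Klout data).
rows = [
    {"project": "mobile-ios", "dt": "20120612", "event": "api_error",
     "json_text": '{"sid": "b2", "inc": "7", "status": "500", "v": "1.4"}'},
    {"project": "mobile-ios", "dt": "20120612", "event": "api_timeout",
     "json_text": '{"sid": "a1", "inc": "3", "status": "504", "v": "1.5"}'},
    {"project": "mobile-ios", "dt": "20120612", "event": "api_timeout",
     "json_text": '{"sid": "a9", "inc": "1", "status": "504", "v": "1.3"}'},
    {"project": "plusK", "dt": "20120612", "event": "api_error",
     "json_text": '{"sid": "c3", "inc": "2", "status": "500", "v": "1.4"}'},
]


def json_field(json_text, key):
    """Top-level lookup standing in for get_json_object(json_text, '$.key')."""
    value = json.loads(json_text).get(key)
    return None if value is None else str(value)


def run_query(rows, dt="20120612"):
    """Apply the slide's WHERE conditions, then order the result by sid."""
    out = []
    for r in rows:
        version = json_field(r["json_text"], "v")
        if (r["project"] == "mobile-ios"
                and r["dt"] == dt
                and version is not None and version != "1.5"
                and r["event"] in ("api_error", "api_timeout")):
            out.append({
                "sid": json_field(r["json_text"], "sid"),
                "inc": json_field(r["json_text"], "inc"),
                "status": json_field(r["json_text"], "status"),
                "event": r["event"],
            })
    return sorted(out, key=lambda row: row["sid"])


result = run_query(rows)
for line in result:
    print(line)
```

With these sample rows, the version 1.5 row and the plusK row are filtered out, leaving the two remaining mobile-ios errors in sid order.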
Hadoop & BI Together:
Query Hive using Excel




Why Hadoop + BI?

  Requirement                                  Hadoop & Hive   BI Query Engines
  Capture & store all data                     Yes             No
  Support queries against detail data          Yes             No
  Support interactive queries & applications   No              Yes
  Support BI & visualization tools             No              Yes
Any Questions?






Editor's Notes

  • #19: Copy this from notepad for demo:
      CREATE TABLE mobile_ios_details_20120612 as
      SELECT
         get_json_object(json_text,'$.sid') as sid,
         get_json_object(json_text,'$.inc') as inc,
         get_json_object(json_text,'$.status') as status,
         event
      FROM bi.event_log
      WHERE project='mobile-ios'
         AND dt=20120612
         AND get_json_object(json_text,'$.v')<>'1.5'
         AND (event = 'api_error' OR event = 'api_timeout')
      ORDER BY sid;
  • #23:
      1. Don’t throw data away, leverage Hadoop (track users and events for a/b testing)
      2. BI tools aggregate data, but we need to reach back to the detail to answer deeper questions (http codes)
      3. Hadoop != interactive queries (combined proprietary data with detail)
      4. Use open source, but don’t reinvent the wheel (BI tools are mature, valuable & complementary)
      Leverage the best tool for the function or job