Building a Just-In-Time Data Warehouse
Dan Morris (Viacom)
Jason Pohl (Databricks)
February 18, 2016
About Viacom
• Leading Global Entertainment Content Company
• 23 Brands in 170+ Countries
Introductions
Dan Morris
• Senior Director of Product Analytics
• 12 Years @ Viacom in a variety of roles
• Intersection of Product and Data
About My Team
• Product Analytics team formed one year ago
• Our mission is to grow our global audience with the highest quality users possible
Key Areas of Focus
• Mobilize efforts using growth targets
• Uncover deep insights using churn and cohort analysis
• Treat all ideas as hypotheses and test them rigorously
Where Are We Today: App Platform
Make it extremely simple to build and deploy engaging apps around the globe.
[Diagram: app platform layers: Features, UI & Animations, Configuration Settings, App Binary]
Disciplined Product Dev Approach is Key
• 23 brands in 170+ countries
• Lots of market dynamics
• Many stakeholders
... Data is a must!
Sound Data Management is Required
[Chart: Expected Data Volume Growth (TB), 2016, by month from Jan to Dec; labeled values: 9, 11, 13, 14, 16, 18]
Our Data Infrastructure
[Diagram: S3 → Spark + Databricks → Redshift → Tableau]
Introducing the TV Land iOS App
Applying Product Analytics to TV Land
• Growth Targets
• Dashboards
• Deep Dive Analyses
• A/B Testing
Baselines Used to Set Growth Targets
[Diagram: S3 → Spark + Databricks → Redshift → Tableau, with stages labeled ETL and Business Modeling]
Data Volume
• 30 sites/apps
• 11 TB
Data Volume
• 30 sites/apps
• 1 TB
Growth Targets are Monitored via Dashboards
[Chart: Audience Growth by Cohort: New Users and Returning Users for the weeks of 1/4/16, 1/11/16, 1/18/16, 1/25/16]
Weekly Retention by Cohort
Cohort     Week 0   Week 1   Week 2   Week 3   Week 4
1/4/16     100%     53%      41%      33%      30%
1/11/16    100%     58%      51%      42%
1/18/16    100%     49%      38%
1/25/16    100%     49%
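A weekly retention table like the one on this slide can be derived from raw activity events. The sketch below is only an illustration in plain Python, not the team's Spark or Redshift code: the `(user_id, date)` event shape, the `week_start` and `weekly_retention` names, and the toy events are all assumptions. A user's cohort is taken to be the Monday-based week of first activity.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(d):
    """Return the Monday of the week containing d."""
    return d - timedelta(days=d.weekday())


def weekly_retention(events):
    """events: iterable of (user_id, date) pairs (hypothetical shape).

    Returns {cohort_week: [share of cohort active in week 0, week 1, ...]}.
    """
    active_weeks = defaultdict(set)  # user -> set of weeks with activity
    for user, day in events:
        active_weeks[user].add(week_start(day))

    # cohort week -> week offset -> users active at that offset
    cohorts = defaultdict(lambda: defaultdict(set))
    for user, weeks in active_weeks.items():
        cohort = min(weeks)  # week of first activity defines the cohort
        for wk in weeks:
            offset = (wk - cohort).days // 7
            cohorts[cohort][offset].add(user)

    table = {}
    for cohort, by_offset in cohorts.items():
        size = len(by_offset[0])  # every user is active in their week 0
        last = max(by_offset)
        table[cohort] = [len(by_offset.get(k, ())) / size for k in range(last + 1)]
    return table


# Toy events, invented purely for illustration.
toy_events = [
    ("a", date(2016, 1, 4)),
    ("a", date(2016, 1, 12)),
    ("b", date(2016, 1, 5)),
    ("c", date(2016, 1, 11)),
]
retention = weekly_retention(toy_events)
```

Each row of the result corresponds to one cohort row of the table, with later cohorts having fewer columns because fewer weeks have elapsed.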
Dashboards Spark Deep Dive Analyses
[Chart: % of users retained (0% to 100%) over periods 0 to 12]
Users will quickly churn if not activated within a small window of time.
Deep Dive Analysis Requires Flexibility
[Diagram: Deep Dive Analysis uses S3 and Spark + Databricks; Redshift and Tableau are marked Not Needed]
• Define schema on read instead of on write.
• Work through data quality issues just in time.
• Tease out business questions iteratively and interactively.
• Use the programming language of your choice.
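To make "schema on read" concrete: raw records stay in storage exactly as they were logged, and each analysis applies only the fields and types it needs when it reads them. The sketch below illustrates the idea in plain Python rather than with the Spark API; the `read_with_schema` helper, the field names, and the sample lines are hypothetical. Records that do not fit the schema are set aside instead of blocking the load, which is one way to work through data quality issues just in time.

```python
import json


def read_with_schema(lines, schema):
    """Apply schema (field name -> cast function) to raw JSON lines at read time.

    Returns (rows, rejects): rows that fit the schema, and the raw lines
    that did not parse or were missing or mistyped in a required field.
    """
    rows, rejects = [], []
    for line in lines:
        try:
            record = json.loads(line)
            rows.append({field: cast(record[field]) for field, cast in schema.items()})
        except (ValueError, KeyError, TypeError):
            rejects.append(line)  # keep the raw line for later inspection
    return rows, rejects


# Hypothetical raw log lines; nothing was validated when they were written.
raw_lines = [
    '{"user": "a", "ts": "1451865600", "app": "tvland"}',
    '{"user": "b"}',
    "not json",
]

# Two analyses can read the same raw data with different schemas.
sessions, bad_sessions = read_with_schema(raw_lines, {"user": str, "ts": int})
users, bad_users = read_with_schema(raw_lines, {"user": str})
```

The same raw lines yield different usable row counts depending on the schema applied, so a stricter question does not force a re-ingest.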
Hypotheses Require A/B Testing
Statistical Analysis
Data Sets
• Adobe Logs
• Experiment Logs
[Diagram labels: S3, Spark + Databricks, Tableau; Not Needed: Redshift, Tableau]
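The slide does not say which statistical test is used. As one common choice for comparing conversion or retention rates between two experiment arms, here is a minimal sketch of a two-sided two-proportion z-test in plain Python. The function name and the example counts are assumptions for illustration only.

```python
from math import erfc, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    conv_*: number of successes in each arm; n_*: number of users in each arm.
    Returns (z, p_value) using the pooled standard error.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts: 100 of 1000 control users vs 150 of 1000 variant users.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
```

The normal approximation assumes reasonably large samples in both arms; small experiments would call for an exact test instead.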
Summary of Our Setup
                  Just in Time            Traditional
Primary Audience  • Product Analysts      • Product Team
                                          • Business Stakeholders
Tasks             • Exploratory Analysis  • Ad Hoc Queries
                  • A/B Testing           • Dashboards
Tools             • S3                    • Redshift
                  • Spark                 • Tableau
                  • Databricks
Coming Soon…
• Go live with internal A/B testing platform
• Continue to evolve our setup
• Further scale model to support Product Analytics Pan-Viacom
Questions?
Thank you.
