Dan Morris & Amelia Chu
Viacom
Scaling Self Service Analytics
with Databricks and Apache Spark
====
WANTED
Cash Reward if Found
Recent Sightings
9:00 AM As a Data Scientist
11:00 AM As a Data Engineer
1:00 PM As a Business Analyst
3:00 PM As a DevOps Engineer
DATA ENGINEER
DATA ENGINEER
DATA ENGINEER
…
DATA ENGINEER
DATA ENGINEER
DATA SCIENTIST
NON-TECHNICAL
SINGLE PLATFORM
+
SELF-SERVICE
PARAMETERIZED
DASHBOARDS
UDF
LIBRARY
def ttff(specs): …
def vpi(specs): …
def rebuffer(specs): …
def rendition(specs): …
def ad_play(specs): …
+
PARAMETERIZED
DASHBOARDS
DATA SCIENTIST
NON-TECHNICAL
1
DATA SCIENTIST
UDF
LIBRARY
DATA ENGINEER
def ttff(specs): …
def vpi(specs): …
def rebuffer(specs): …
def rendition(specs): …
def ad_play(specs): …
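
The slides pair a shared function library with parameterized dashboards but elide the function bodies. As a rough illustration only, the pattern might look like the sketch below. Everything in it is an assumption for illustration: the registry, the `metric` decorator, `run_dashboard`, the two example metrics, and the shape of `specs` (a dict holding rows and a field name) are hypothetical and are not taken from the talk.

```python
from statistics import mean

# Hypothetical registry: maps a metric name to the function that computes it.
METRICS = {}


def metric(fn):
    """Register a function in the shared library under its own name."""
    METRICS[fn.__name__] = fn
    return fn


@metric
def mean_value(specs):
    # Assumption: `specs` is a dict with the rows and the field to aggregate.
    return mean(row[specs["field"]] for row in specs["rows"])


@metric
def row_count(specs):
    # Counts the rows passed in through `specs`.
    return len(specs["rows"])


def run_dashboard(metric_name, specs):
    """Look up a metric by name and run it with the caller's parameters."""
    if metric_name not in METRICS:
        raise KeyError(f"unknown metric: {metric_name}")
    return METRICS[metric_name](specs)


# Made-up sample data, standing in for whatever a dashboard user selects.
rows = [{"startup_ms": 100}, {"startup_ms": 300}]
specs = {"rows": rows, "field": "startup_ms"}
```

In this sketch an engineer adds a function once, and a dashboard user only picks a metric name and parameters, for example `run_dashboard("mean_value", specs)`.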
====
FOUND
A SINGLE PLATFORM
+
Thank You.
Questions? Feedback? Drop us a line!
dan.morris@viacom.com // @danmorris427
amelia.chu@viacom.com // @amelianchu
