Visualizing Autotrader Traffic Using Spark Streaming
Jon Gregg, Cox Automotive
Overview
• Cox Automotive and Hadoop
• Spark Streaming application
• Spark roadmap at Cox Automotive
Why we’re using Hadoop
Cox Automotive: The Journey
• $45B+ in vehicle value sold annually through Manheim
• AutoTrader.com has 18M unique visitors each month and lists an average of 4M cars monthly
• Kelley Blue Book provides values for 290M cars annually and has 18M+ unique visitors monthly
• vAuto has over 2.3M active vehicles in inventory and started 1.65M vehicle appraisals in Oct
• 265K vehicles sold per month (on average) through a VinSolutions CRM
• Over 25 companies (and growing)
• Facilitate joining data, analyst collaboration
• Hadoop cluster, dedicated ingest team
Use Hadoop where it makes sense
• Joining data from across several companies
• Large amounts of data (Querying and Reporting)
• Build out business logic so it’s shareable
That’s all great…
… but we also have to showcase
what Hadoop can do
Autotrader’s “Autobowl”
Autobowl: Goal
Find which Big Game car commercial led to the greatest Autotrader traffic increase, as a proxy for influence on consumers.
Two solutions
• Hive on MapReduce
  - Mature, supported product
  - Shows SQL’s capabilities on Hadoop
• Spark
  - Started as a POC, no expectations
  - How would it work with YARN, Kerberos?
Autobowl Hourly Data
Make  Model    Hour  VDPs  Searches
…
Kia   Sedona   9pm   300   290
Kia   Sedona   10pm  310   320
Kia   Sorento  3pm   220   240
Kia   Sorento  4pm   210   220
Kia   Sorento  5pm   350   380
…
+70%!
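The "+70%!" callout appears to refer to the Kia Sorento jump between 4pm and 5pm. As a rough check under that assumption, the hour-over-hour change can be computed from the rows shown on the slide. The helper names `pct_change` and `hourly_changes` and the tuple layout are illustrative only and do not come from the talk.

```python
# Rows copied from the slide: (make, model, hour, VDPs, searches).
rows = [
    ("Kia", "Sedona", "9pm", 300, 290),
    ("Kia", "Sedona", "10pm", 310, 320),
    ("Kia", "Sorento", "3pm", 220, 240),
    ("Kia", "Sorento", "4pm", 210, 220),
    ("Kia", "Sorento", "5pm", 350, 380),
]


def pct_change(previous, current):
    """Percent change from previous to current (hypothetical helper)."""
    return (current - previous) / previous * 100.0


def hourly_changes(rows):
    """Return (make, model, hour, vdp_change, search_change) for consecutive
    rows of the same make and model, assuming rows are in time order."""
    changes = []
    for prev, cur in zip(rows, rows[1:]):
        if prev[:2] != cur[:2]:
            continue  # do not compare across different models
        changes.append((cur[0], cur[1], cur[2],
                        pct_change(prev[3], cur[3]),
                        pct_change(prev[4], cur[4])))
    return changes


for make, model, hour, vdp, search in hourly_changes(rows):
    print(f"{make} {model} {hour}: VDPs {vdp:+.1f}%, searches {search:+.1f}%")
```

For the Sorento 5pm row this gives roughly +67% for VDPs and +73% for searches, which is consistent with the rounded +70% on the slide.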
Comparison: Hive vs. Spark
Hive vs. Spark: Processing Time
[Bar chart: minutes to process 1 hr of site activity data, Hive vs. Spark; bar values 1.5 min and 18 min]
(And a month to spare!)
Spark Streaming for Near Realtime Visualization of Traffic
How about a visualization using Spark Streaming?
High-level architecture diagram
[Diagram components: web servers (×4), emitter, Kafka, Hadoop (Spark), AWS]
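The slides do not show the streaming job itself. As a minimal sketch of the kind of per-batch aggregation such a pipeline might perform, the plain-Python stand-in below groups page-view events into fixed time windows and counts them per make and model. The event fields (`ts`, `make`, `model`), the 10-second window, and the function name `count_by_window` are all assumptions for illustration; a real job would read from Kafka and run on Spark.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 10  # assumed batch interval, not taken from the talk


def count_by_window(events, window_seconds=WINDOW_SECONDS):
    """Count events per (make, model) within fixed, non-overlapping windows.

    Each event is a dict with hypothetical keys: ts (epoch seconds),
    make, and model. Returns {window_start: Counter({(make, model): n})}.
    """
    windows = defaultdict(Counter)
    for event in events:
        # Round the timestamp down to the start of its window.
        window_start = event["ts"] - event["ts"] % window_seconds
        windows[window_start][(event["make"], event["model"])] += 1
    return dict(windows)


# Made-up sample events standing in for messages from the emitter.
sample_events = [
    {"ts": 100, "make": "Kia", "model": "Sorento"},
    {"ts": 103, "make": "Kia", "model": "Sorento"},
    {"ts": 108, "make": "Kia", "model": "Sedona"},
    {"ts": 112, "make": "Kia", "model": "Sorento"},
]

for start, counts in sorted(count_by_window(sample_events).items()):
    print(start, dict(counts))
```

Each window's counts could then be pushed to a visualization layer; how the original application did this is not described in the slides.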
Video
(Screenshot from Video)
What’s next?
Other Visualization use cases
• Detecting anomalies in Autotrader metrics after a site update
• Executive dashboards
• Visualizations for A/B testing
Gaining BI Adoption
• Most BI users use Hive or a point-and-click app
• But there’s been a shift: Spark is in use by analyst teams within Autotrader and KBB, using Python
• Spark is used by developers at Autotrader, KBB, Manheim, and NextGear
Python: our primary Spark language
• Speed improvements with DataFrames, Kafka integration
• “Easier sell” than Java/Scala
  - Scripting, visualization+analytics packages
• Onboarding users
  - Central repository with best practices
  - Individual support for BI Spark champions
  - Guidelines on setting Spark parameters
