Vegas
The Missing Matplotlib for
Scala/Spark
DB Tsai
Roger Menezes
Homepage Kids Page Downloads Page
Netflix Recommendations
Every aspect of the Experience is Machine Learned
2017
> 100M members
> 190 countries
Multiple Devices
Genres: 23 rows/page average
Sims: 10 rows/page average
My List:
Continue Watching:
Popular on Netflix:
Trending Now:
Watch It Again:
Top Picks:
Because You Watched:
Genres:
New Releases:
Recently Added:
Originals Row:
Billboard:
Machine Learning at Netflix
● Optimize for the experimentation use case rather than productionization
● Experimentation
○ Opportunity sizing, Data Exploration
○ Feature Identification and Selection
○ Tweaks to ML algos
○ Model Evaluation
Experimenter’s loop
Problem → Explore Data → Identify Features → Produce Model → Evaluate Model → Share Findings
Notebooks
● Optimal for Experimentation
● Sharing reproducible research
○ Facilitates feedback loop with Product Managers
● End to end ML experiment.
○ Interactivity drives productivity
Python Notebooks
● Seamless Experience - ML experimentation
● Well known Scientific computing libraries
● Huge catalog of Visualization plotting libraries
○ Matplotlib, Seaborn, Bokeh, BQPlot, Lightning, etc.
Scala Notebooks
● Zeppelin, Jupyter, Databricks, Spark-Notebooks, ...
● Gap in computing libraries is closing
● Lack of Visualization Libraries
○ Main friction point in adoption
○ End to End ML use case not convincing
Introducing Vegas
● Visualization Library in Scala
● Mainly built for the notebook use case
● Scala wrapper around Vega-Lite
○ The missing Matplotlib for the Scala/Spark world.
DECLARATIVE STATISTICAL VISUALIZATION GRAMMAR IN SCALA
You tell it WHAT should be done with the data, and it knows
HOW to do it!
Operations such as filtering, aggregation, faceting are built
into the visualization, rather than putting the burden on the
user to massage the data into shape.
Complex visualizations can be built with a few high level
abstractions:
DATA
TRANSFORMS
SCALES
GUIDES
MARKS
cf. Altair talk by Brian Granger at PyData 2016: https://youtu.be/v5mrwq7yJc4
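To make the WHAT-versus-HOW contrast concrete, here is a minimal sketch in plain Scala, not Vegas code. `DeclarativeSketch`, `Row`, and `meanByCountry` are hypothetical names, and the `country` and `population` columns are made up for illustration. The first half massages the data by hand; the second half only states what to draw, using the Vega-Lite JSON layout that Vegas wraps, and leaves the averaging to the renderer.

```scala
// Sketch only: every name in this object is illustrative, not part of Vegas.
object DeclarativeSketch {
  final case class Row(country: String, population: Double)

  // Imperative route: the user reshapes the data before any plotting happens.
  def meanByCountry(rows: Seq[Row]): Map[String, Double] =
    rows.groupBy(_.country).map { case (country, rs) =>
      country -> rs.map(_.population).sum / rs.size
    }

  // Declarative route: describe WHAT to draw and let the renderer aggregate.
  // The string follows the Vega-Lite encoding layout (field, type, aggregate).
  val spec: String =
    """{
      |  "mark": "bar",
      |  "encoding": {
      |    "x": {"field": "country", "type": "nominal"},
      |    "y": {"aggregate": "mean", "field": "population", "type": "quantitative"}
      |  }
      |}""".stripMargin
}
```

In the declarative version the raw rows are handed over untouched, so switching from a mean to a sum or a count is a one-word change in the spec rather than a rewrite of the data preparation.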
Added Bonus of Declarative
Visualizations:
INTERACTIVITY!
D3JS
VEGAS
VEGAS CODE EXPANDS OUT TO D3JS CODE!
Anatomy of a plot: Channels
X/Y channel
Shape Channel
Size Channel
Color Channel
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger Menezes
Features…
1. Supports most plot types
2. Trellis plots
3. Layers
Layer 1.
Layer 2.
Layer 3.
4. Notebook and Consoles
5. Built-in spark support
Vegas
.withDataFrame(myDataFrame)   // pass in the DataFrame
.encodeX("population")        // mapped column
.encodeY("age")               // mapped column
6. Visual statistics
● Advanced Binning
● Sorting
● Scaling
● Custom Transforms
● Time Series
● Aggregation
● Filtering
● Math functions (log, etc)
● Descriptive Statistics
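As a rough illustration of feature 3 (layers), the sketch below wraps several single-plot specs in a Vega-Lite `layer` array. It uses plain Scala string handling rather than the Vegas API: `LayerSketch` and `layered` are hypothetical names, the sample specs reuse the `population` and `age` columns from the Spark example, and a real spec would also carry its data.

```scala
// Sketch only: illustrative names, not the Vegas layering API.
object LayerSketch {
  // Wrap already-rendered unit specs (JSON objects) in a Vega-Lite "layer" array.
  // Layers are drawn in order, so later entries sit on top of earlier ones.
  def layered(unitSpecs: Seq[String]): String =
    unitSpecs.mkString("""{"layer": [""", ", ", "]}")

  // Two unit specs over the same columns: raw points plus a connecting line.
  val points: String =
    """{"mark": "point", "encoding": {"x": {"field": "age", "type": "quantitative"}, "y": {"field": "population", "type": "quantitative"}}}"""

  val trend: String =
    """{"mark": "line", "encoding": {"x": {"field": "age", "type": "quantitative"}, "y": {"field": "population", "type": "quantitative"}}}"""

  val combined: String = layered(Seq(points, trend))
}
```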
How It Works!
1. Specify in Scala
2. Embed HTML (iFrame)
3. Render within iFrame using JS
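The three steps can be sketched as follows. This is an illustration under stated assumptions, not Vegas's actual renderer: `IframeSketch`, `page`, and `iframe` are hypothetical names, and the CDN URLs and the `vegaEmbed` call reflect the current vega-embed distribution rather than whatever Vegas bundled at the time of the talk.

```scala
// Sketch only: illustrative names, not the Vegas rendering code.
object IframeSketch {
  // Steps 1 and 3: a standalone HTML page that loads Vega, Vega-Lite and
  // vega-embed, then renders the given Vega-Lite JSON spec into a div.
  def page(specJson: String): String =
    Seq(
      "<!DOCTYPE html>",
      "<html>",
      "<head>",
      """  <script src="https://cdn.jsdelivr.net/npm/vega@5"></script>""",
      """  <script src="https://cdn.jsdelivr.net/npm/vega-lite@5"></script>""",
      """  <script src="https://cdn.jsdelivr.net/npm/vega-embed@6"></script>""",
      "</head>",
      "<body>",
      """  <div id="vis"></div>""",
      s"  <script>vegaEmbed('#vis', $specJson);</script>",
      "</body>",
      "</html>"
    ).mkString("\n")

  // Escape the page so it can sit inside a double-quoted srcdoc attribute.
  private def escapeAttr(s: String): String =
    s.replace("&", "&amp;").replace("\"", "&quot;")

  // Step 2: the fragment a notebook cell would display as HTML output.
  def iframe(specJson: String, width: Int = 600, height: Int = 400): String =
    s"""<iframe srcdoc="${escapeAttr(page(specJson))}" width="$width" height="$height"></iframe>"""
}
```

Keeping the page inside an iframe isolates the chart's scripts and styles from the notebook's own page, which is presumably why the slide calls out the iFrame explicitly.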
MORE ABSTRACTION (top to bottom, most abstract first):
VEGAS: a Scala DSL for Vega-Lite. The Scala DSL emits type-checked Vega-Lite JSON.
VEGA-LITE: converts internally to a Vega JSON spec.
VEGA: translates JSON to D3JS code.
D3JS: code that can be very verbose.
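What "type-checked" buys at the top of this stack can be shown with a toy builder. This is a sketch in the spirit of the Vegas DSL, not its real implementation: `TypedDslSketch`, `Spec`, and the `Mark` and `DataType` hierarchies are hypothetical, and only the emitted JSON follows the Vega-Lite layout. Because marks and data types are sealed Scala types rather than free-form strings, a typo such as a misspelled mark name fails at compile time instead of producing a broken chart in the browser.

```scala
// Sketch only: a toy typed builder, not the Vegas DSL itself.
object TypedDslSketch {
  sealed abstract class Mark(val json: String)
  case object Bar extends Mark("bar")
  case object Point extends Mark("point")
  case object Line extends Mark("line")

  sealed abstract class DataType(val json: String)
  case object Quantitative extends DataType("quantitative")
  case object Nominal extends DataType("nominal")

  // Immutable spec: each builder call returns a new copy.
  final case class Spec(
      markKind: Mark = Point,
      channels: Vector[(String, String, DataType)] = Vector.empty
  ) {
    def encodeX(field: String, dataType: DataType): Spec =
      copy(channels = channels :+ (("x", field, dataType)))

    def encodeY(field: String, dataType: DataType): Spec =
      copy(channels = channels :+ (("y", field, dataType)))

    def mark(kind: Mark): Spec = copy(markKind = kind)

    // Emit the Vega-Lite JSON that the lower layers of the stack consume.
    def toJson: String = {
      val encoding = channels.map { case (channel, field, dataType) =>
        s""""$channel": {"field": "$field", "type": "${dataType.json}"}"""
      }.mkString("{", ", ", "}")
      s"""{"mark": "${markKind.json}", "encoding": $encoding}"""
    }
  }
}
```

A call chain such as `TypedDslSketch.Spec().encodeX("population", TypedDslSketch.Quantitative).encodeY("age", TypedDslSketch.Quantitative).mark(TypedDslSketch.Bar).toJson` yields a JSON string ready to hand to Vega-Lite.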
What’s coming
1. Interactive selections
2. Selections transforms
Contributors
Aish, DB, Roger, Sudeep, Jeremy
Thank you.
@NetflixResearch
@rogermenezes @dbtsai
The missing Matplotlib
for Scala/Spark
http://vegas-viz.org
