Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farchtchi


This document describes how Zoomdata uses Spark Streaming to drive interactive visualizations of streaming data. Key points:
• Streaming data arrives from sources such as Kafka, Kinesis, or JMS and is processed with Spark Streaming, which replaced an earlier design that manipulated the stream in memory in a single JVM.
• Processed data lands in a pluggable store such as MemSQL, Solr, or Kudu (the earlier design buffered it in MongoDB), and visualizations update as new data is delivered.
• A demo streams Twitter data through Spark Streaming into MemSQL and Solr sinks.
• Benefits include contextual expressiveness with streaming data, independent scalability, and windowing through Spark's DataFrame API.

Data & Analytics

INTERACTIVE VISUALIZATION OF STREAMING DATA POWERED BY SPARK

Streaming @Zoomdata
• Visualizations react to new data delivered
• Users start, stop, pause the stream
• Users select a rolling window or pin a start time to capture cumulative metrics
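
The two time modes on this slide, a rolling window and a pinned start time, can be sketched in plain Python. This is only an illustration of the idea, not Zoomdata's implementation: the class names `RollingWindowSum` and `PinnedStartSum`, the use of a sum as the metric, and the sample events are all assumptions, and events are assumed to arrive in timestamp order.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RollingWindowSum:
    """Sum of values whose timestamp lies in the last `width` seconds."""
    width: float
    _events: deque = field(default_factory=deque)
    _total: float = 0.0

    def add(self, ts: float, value: float) -> float:
        self._events.append((ts, value))
        self._total += value
        # Evict events that have slid out of the window (ts - width, ts].
        while self._events and self._events[0][0] <= ts - self.width:
            _, old_value = self._events.popleft()
            self._total -= old_value
        return self._total


@dataclass
class PinnedStartSum:
    """Cumulative sum of every value at or after a fixed start time."""
    start: float
    _total: float = 0.0

    def add(self, ts: float, value: float) -> float:
        if ts >= self.start:
            self._total += value
        return self._total


# Hypothetical (timestamp_seconds, value) events, already in time order.
events = [(0, 5), (4, 1), (9, 2), (12, 3), (21, 4)]
rolling = RollingWindowSum(width=10)
pinned = PinnedStartSum(start=4)
rolling_out = [rolling.add(ts, v) for ts, v in events]
pinned_out = [pinned.add(ts, v) for ts, v in events]
```

The rolling metric rises and falls as old events expire, while the pinned metric only grows from the chosen start time onward.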

Drivers for Streaming Data
• Data Freshness
• Time to Analytic
• Business Context

Challenges
• Time
• Frequency
• Retention
• Synchronization
• Order
• Updates
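
The slide only names these challenges. As one illustration of the "Order" item, the sketch below shows a common general technique, a reorder buffer with an allowed-lateness bound, which may or may not resemble what the talk had in mind. The class `ReorderBuffer`, its parameter `allowed_lateness`, and the sample arrivals are hypothetical.

```python
import heapq


class ReorderBuffer:
    """Holds events briefly so they can be emitted in timestamp order."""

    def __init__(self, allowed_lateness):
        self.allowed_lateness = allowed_lateness
        self._heap = []                 # min-heap ordered by timestamp
        self._max_ts = float("-inf")    # largest timestamp seen so far
        self.dropped = []               # events that arrived too late

    def push(self, ts, payload):
        """Accept one event; return the events now safe to emit, in order."""
        if ts < self._max_ts - self.allowed_lateness:
            self.dropped.append((ts, payload))
            return []
        heapq.heappush(self._heap, (ts, payload))
        self._max_ts = max(self._max_ts, ts)
        watermark = self._max_ts - self.allowed_lateness
        ready = []
        while self._heap and self._heap[0][0] <= watermark:
            ready.append(heapq.heappop(self._heap))
        return ready

    def flush(self):
        """Emit everything still buffered, in timestamp order."""
        return [heapq.heappop(self._heap) for _ in range(len(self._heap))]


buf = ReorderBuffer(allowed_lateness=5)
arrivals = [(10, "a"), (8, "b"), (12, "c"), (3, "d"), (20, "e")]
emitted = []
for ts, payload in arrivals:
    emitted.extend(buf.push(ts, payload))
emitted.extend(buf.flush())
```

The trade-off is latency against completeness: a larger `allowed_lateness` tolerates more disorder but delays every event by that much before it can reach a chart.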

Addressing streaming @Zoomdata
Stage | Historical | Revised
Receive Data | JMS | Kafka
Manipulate Stream | Single JVM in Memory | Spark Streaming
Hold Data in Buffer | MongoDB | Pluggable
Interact with Data | Custom Code | Pluggable
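
What "Pluggable" might mean for the buffer stage can be sketched as a small interface that any store implements. The names `LandingArea`, `InMemoryLandingArea`, and `run_micro_batch`, along with the `write`/`query` methods and the record shape, are hypothetical and chosen only for illustration; the slides do not show the actual plug-in contract.

```python
from abc import ABC, abstractmethod


class LandingArea(ABC):
    """Hypothetical plug-in contract for the 'hold data in buffer' stage."""

    @abstractmethod
    def write(self, records):
        """Persist one micro-batch of processed records."""

    @abstractmethod
    def query(self, since):
        """Return records with a timestamp at or after `since`."""


class InMemoryLandingArea(LandingArea):
    """Stand-in store; a real plug-in would wrap a database or index."""

    def __init__(self):
        self._rows = []

    def write(self, records):
        self._rows.extend(records)

    def query(self, since):
        return [row for row in self._rows if row["ts"] >= since]


def run_micro_batch(batch, transform, landing_area):
    """Apply the stream transformation, then hand results to whichever store is plugged in."""
    landing_area.write([transform(record) for record in batch])


store = InMemoryLandingArea()
run_micro_batch(
    [{"ts": 1, "v": 2}, {"ts": 5, "v": 3}],
    lambda record: {**record, "v": record["v"] * 10},
    store,
)
recent = store.query(since=2)
```

Because the processing step depends only on the interface, swapping one store for another would not require changing the stream logic.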

Technology Cast
• The Stream - Kafka, Kinesis, JMS
• Processing Fabric - Spark Streaming
• Landing Area - MemSQL, Solr, Kudu, Others

How it looks

With the rest of the app

Scale Out

Demo
• Twitter Producer
• Spark Streaming
• MemSQL & Solr Sinks
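
The shape of the demo, one stream fanned out to two sinks, can be imitated without any external services. In the sketch below a `Counter` stands in for an aggregate table such as the MemSQL sink and a token index stands in for a search sink such as Solr. The tweets, the hashtag-count analysis, and the function `process_batch` are assumptions; the slides do not say what the demo actually computed.

```python
import re
from collections import Counter, defaultdict

HASHTAG = re.compile(r"#(\w+)")
TOKEN = re.compile(r"\w+")


def process_batch(tweets, counts_sink, search_sink):
    """Process one micro-batch of (tweet_id, text) pairs and update both sinks."""
    for tweet_id, text in tweets:
        lowered = text.lower()
        # Aggregate sink: running hashtag counts.
        for tag in HASHTAG.findall(lowered):
            counts_sink[tag] += 1
        # Search sink: token -> ids of the tweets containing it.
        for token in set(TOKEN.findall(lowered)):
            search_sink[token].add(tweet_id)


counts_sink = Counter()
search_sink = defaultdict(set)
batches = [
    [(1, "Loving #Spark streaming"), (2, "#spark and #kafka together")],
    [(3, "Dashboards on #Kafka data")],
]
for batch in batches:
    process_batch(batch, counts_sink, search_sink)
```

Writing the same batch to both sinks lets a chart read aggregates from one store while free-text lookups go to the other.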

Benefits
• Contextual Expressiveness with Streaming Data
• Independent scalability (scale-up, scale-around)
• Expressiveness powered by Spark -- using Windowing (dataframe API with stream)
• DR COOP, other Data management concerns

Future Work
• Cross stream synchronization & fusion
• On-demand scale out and resource management via Mesos
• Schema evolution
• More extensible landing strategies

Thanks
