Apache Cassandra and Python
For streaming Big Data
Prajod S Vettiyattil
Architect, Wipro
@prajods
https://in.linkedin.com/in/prajod
Nishant Sahay
Architect, Wipro
@nsahaytech
https://in.linkedin.com/in/nishantsahay
Open Source India
Nov 2015
Database track
Agenda
1. Time Series Data Analysis
2. Spark, Python, Cassandra and D3
3. Business problem
4. Solution using Logical Architecture
5. Data Processor
6. Data Persistence
7. Data Visualization
What this session is about
• What: Big Data, Streaming, Time Series
• How: Spark, Python, Cassandra, D3.js, Node.js
Tools: Python, Spark, Cassandra, Node and D3
• Python and Spark for Big data processing
• Cassandra for persistence and serving
• D3 for visualization
• Node for
  • Enabling scalability
  • Data aggregation
Python
• Popular with open source projects
• Wide support base
• Strong in data science
• Visualization libraries
• Statistics functions
Cassandra
• NoSQL database
• Column family
• Dynamic columns
• AP in CAP theorem
• Tunable consistency
• Suited for time series storage
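"Tunable consistency" means that each read and write can choose how many replicas must respond. The arithmetic behind the common QUORUM setting can be sketched as below. This is a minimal illustration, not driver code: the function names are hypothetical, and the replication factor of 3 is only an example value.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a majority for the given replication factor."""
    return replication_factor // 2 + 1


def read_sees_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set overlaps every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    rf = 3  # example replication factor
    q = quorum(rf)
    print("quorum:", q)
    print("quorum reads + quorum writes overlap:", read_sees_latest_write(q, q, rf))
    print("single-replica reads + writes overlap:", read_sees_latest_write(1, 1, rf))
```

Lower replica counts trade that overlap guarantee for latency and availability, which is where the "AP" positioning comes from.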
D3.js
• Data-Driven Documents
• SVG, HTML, CSS and JavaScript
• Fine-grained control of screen elements
• Plethora of UI widgets
Business Problem
• Handle streaming data
  • Stock ticks
  • Weather movements
  • Satellite captures
  • Astronomical observations
  • Large Hadron Collider
• Ingest
• Persist
• Visualize
• Analysing stock prices
Logical Solution Architecture
Time Series Data Producer (IoT devices, Stock ticks)
→ Data Processor (pySpark)
→ Data Persistence (Cassandra)
→ Visualization Aggregator (Node.js)
→ Visualization (D3.js)
Data Processor: pySpark
• Apache Spark is a big data processor
  • Streaming data
  • Batch data
  • Lambda architecture
• pySpark for using Python’s power on top of Spark
  • Python
  • Machine learning
  • Statistics
  • Visualization
• Cassandra integration
  • pyspark-cassandra adapter from TargetHoldings
Logical Architecture diagram of Spark
[Diagram: Apache Spark with its components Spark SQL, Spark Streaming, MLlib, GraphX, SparkR and pySpark]
Apache Spark: Core
• In memory processing for Big Data
• Cached intermediate data sets
• Multi-step DAG based execution
• Resilient Distributed Datasets (RDDs)
pySpark and Cassandra
[Diagram labels: Java, Python, Cassandra]
Apache Spark: Processing stock ticks
• Ingest stock tick stream, coming in at a high rate
• Calculate moving average of stock prices
• Insert the average of prices into Cassandra
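The moving-average step above can be sketched in plain Python. This is a stand-in for the logic only, not the pySpark streaming job from the talk: the class name, the window size of three ticks and the sample prices are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class MovingAverage:
    """Keeps the last `window` prices per stock and reports their mean."""

    def __init__(self, window: int = 3):
        self.window = window
        # One bounded queue of recent prices per stock symbol.
        self.prices = defaultdict(lambda: deque(maxlen=window))

    def update(self, stock_id: str, price: float) -> float:
        """Add one tick and return the current moving average for that stock."""
        recent = self.prices[stock_id]
        recent.append(price)  # the deque drops the oldest price once it is full
        return sum(recent) / len(recent)


if __name__ == "__main__":
    ma = MovingAverage(window=3)
    # Hypothetical ticks; in the pipeline these would arrive from the stream.
    for price in [10.0, 20.0, 30.0, 40.0]:
        print(ma.update("GAZP", price))
```

Each value returned by `update` is what the pipeline would then write to Cassandra alongside the stock id and event time.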
Data Persistence - Cassandra
• Masterless: Peer to peer
• Built to Scale: Scales to support millions of operations per second
• High Availability: No single point of failure
• Ease of Use: Operational simplicity, CQL for developers
• It is supposedly battle tested at Facebook, Apple and Netflix :-)
Data Persistence - Cassandra
[Diagram: ring of eight nodes, n1 to n8]
Write Request - partition key hash value maps to n1
• n8 – Coordinator node
• n1 – Primary node responsible for handling the request
• n2, n3 – Replication nodes (RF=3)
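The write path above (hash the partition key, find the owning node, then replicate to the next nodes on the ring) can be sketched as a toy token ring. This is a simplified model under stated assumptions: one token per node, MD5 as the hash purely for illustration (Cassandra's default partitioner is Murmur3), and replicas taken as the next nodes clockwise. The class and function names are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a position on the ring (MD5 used only for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: one token per node, replicas are the next nodes clockwise."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        # Sort by token so the ring can be walked in order.
        self.entries = sorted((token(name), name) for name in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key: str):
        """Return the primary node followed by the next RF-1 nodes on the ring."""
        start = bisect.bisect_left(self.tokens, token(partition_key))
        count = min(self.rf, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1]
                for i in range(count)]


if __name__ == "__main__":
    ring = Ring([f"n{i}" for i in range(1, 9)], replication_factor=3)
    print(ring.replicas("GAZP:2015-11-10"))
```

Any node can act as coordinator: it runs the same lookup and forwards the write to the nodes returned.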
Cassandra Data Model – Skinny Rows
Skinny Rows: Primary key with only a partition key (here a composite partition key)
CREATE TABLE stock_info(stock_id text, date text, price double, PRIMARY KEY ((stock_id, date)));
Logical View:
stock_id  date        price
GAZP      2015-11-11  556.50
GAZP      2015-11-10  556.65
Disc View (partitions placed on Node n1 and Node n4):
GAZP:2015-11-11
  price  556.50
GAZP:2015-11-10
  price  556.65
Cassandra Data Model – Wide Rows
Wide Rows: The primary key contains columns (clustering columns) other than the partition key.
Compound Primary Key (Partition + Clustering)
CREATE TABLE stock_ticker(stock_id text, price double, event_time timestamp, PRIMARY KEY (stock_id, event_time));
Logical View:
stock_id  price   date        event_time
GAZP      559.45  2015-11-10  2015-11-10 09:30:00
GAZP      556.45  2015-11-10  2015-11-10 13:30:00
GAZP      556.65  2015-11-11  2015-11-11 18:00:00
Disc View (Node n1):
GAZP
  2015-11-10 13:30:00:price  556.45
  2015-11-10 09:30:00:price  559.45
  2015-11-11 16:00:00:price  556.65
Time Series – Cassandra Data Model
Wide Row + Row Partition
CREATE TABLE stock_info(stock_id text, date text, price double, event_time timestamp, PRIMARY KEY ((stock_id, date), event_time));
Logical View:
stock_id  price   date        event_time
GAZP      559.45  2015-11-10  2015-11-10 09:30:00
GAZP      556.45  2015-11-10  2015-11-10 13:30:00
GAZP      556.65  2015-11-11  2015-11-11 18:00:00
Disc View (partitions placed on Node n1 and Node n6):
GAZP:2015-11-10
  2015-11-10 13:30:00:price  556.45
  2015-11-10 09:30:00:price  559.45
GAZP:2015-11-11
  2015-11-11 18:00:00:price  556.65
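How the composite partition key and the clustering column shape storage can be mimicked with a dictionary. The sketch below is an in-memory analogy, not Cassandra code: it groups the sample rows by (stock_id, date), the partition key, and sorts each group by event_time, the clustering column, assuming the default ascending clustering order. The function name is hypothetical.

```python
from collections import defaultdict

# Sample rows taken from the logical view of the stock_info table.
rows = [
    {"stock_id": "GAZP", "date": "2015-11-10",
     "event_time": "2015-11-10 13:30:00", "price": 556.45},
    {"stock_id": "GAZP", "date": "2015-11-10",
     "event_time": "2015-11-10 09:30:00", "price": 559.45},
    {"stock_id": "GAZP", "date": "2015-11-11",
     "event_time": "2015-11-11 18:00:00", "price": 556.65},
]


def disk_view(rows):
    """Group rows by partition key and order each partition by clustering column."""
    partitions = defaultdict(list)
    for row in rows:
        partition_key = (row["stock_id"], row["date"])
        partitions[partition_key].append((row["event_time"], row["price"]))
    # ISO-style timestamps sort correctly as plain strings.
    return {key: sorted(cells) for key, cells in partitions.items()}


if __name__ == "__main__":
    for key, cells in disk_view(rows).items():
        print(":".join(key), cells)
```

A query for one stock on one day then touches a single partition and reads its ticks in time order, which is the access pattern this table layout is built for.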
Summary – Cassandra Data Model
Skinny Row (partitions on Node n1 and Node n4):
GAZP:2015-11-11
  price  556.50
GAZP:2015-11-10
  price  556.65
Wide Row (Node n1):
GAZP
  2015-11-10 13:30:00:price  556.45
  2015-11-10 09:30:00:price  559.45
  2015-11-11 16:00:00:price  556.65
Wide Row + Row Partition (partitions on Node n1 and Node n6):
GAZP:2015-11-10
  2015-11-10 13:30:00:price  556.45
  2015-11-10 09:30:00:price  559.45
GAZP:2015-11-11
  2015-11-11 18:00:00:price  556.65
Optimize with expiring columns, or split the day bucket into multiple rows
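Splitting the day bucket means adding a finer time component to the partition key so that one busy day does not grow into a single very large partition. A minimal sketch of computing such a key follows; the function name, the key layout and the six-hour bucket size are assumptions for illustration, not part of the talk.

```python
from datetime import datetime


def bucket_key(stock_id: str, event_time: datetime,
               hours_per_bucket: int = 24) -> str:
    """Build a partition key whose time bucket spans `hours_per_bucket` hours."""
    # Round the hour down to the start of its bucket.
    bucket_start_hour = (event_time.hour // hours_per_bucket) * hours_per_bucket
    return f"{stock_id}:{event_time:%Y-%m-%d}:{bucket_start_hour:02d}"


if __name__ == "__main__":
    tick_time = datetime(2015, 11, 10, 13, 30)
    print(bucket_key("GAZP", tick_time))                      # one bucket per day
    print(bucket_key("GAZP", tick_time, hours_per_bucket=6))  # four buckets per day
```

Expiring columns are the complementary option: in CQL a write can carry `USING TTL <seconds>` so that old ticks are dropped automatically.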
Node.js, Cassandra and D3.js
• Web UI Layer: Browser with D3.js graph
• Server Layer: ExpressJS with cassandra-driver
• Database Layer: Cassandra DB
• Browser to server: REST-based polling, get JSON data
• Server to database: CQL SELECT of time series data
Data Aggregator
• Node.js is a proxy for data aggregation
  • Expose a REST endpoint for visualization
  • Retrieve data from Cassandra
  • Data transformation as per business need
• ExpressJS: Flexible web application framework
• DataStax cassandra-driver: client library for Apache Cassandra
• EJS: For quick templating of on-the-fly Node applications
Visualization - Frameworks
• D3 for transformation of time series data into visual information
  • Consume REST API
  • Generate customized data driven graphs and visualization
• Rickshaw is a JavaScript toolkit for creating interactive time series graphs
  • Built on D3.js
  • Generate time-series graph
Visualization – Graphs
[Figure: time-series graphs; labels: Stock Price, Price, Moving Average, Trade Volume]
Summary
• Processing time series data
• Apache Spark
• Cassandra
• Node.js
• D3.js
QUESTIONS
