Cassandra & Spark for IoT_
Matthias Niehoff
Cassandra
•Distributed database
•Highly Available
•Horizontally and linearly scalable
•Multi Datacenter Support
•No Single Point Of Failure
•Chooses Availability Over Strong Consistency
Cassandra for IoT_
Token ring with four nodes:
Node 1: tokens 1-25
Node 2: tokens 26-50
Node 3: tokens 51-75
Node 4: tokens 76-0
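The token ring can be sketched in a few lines of plain Scala. This is only an illustration under assumptions: the nodes are assumed to own the ranges in the order listed on the slide, and `token` is a toy hash into 0..99, not one of Cassandra's real partitioners. All names (`TokenRingSketch`, `TokenRange`, `owner`) are hypothetical.

```scala
object TokenRingSketch {
  // Inclusive token range owned by one node. The range 76-0 wraps around the ring.
  final case class TokenRange(node: String, start: Int, end: Int) {
    def contains(token: Int): Boolean =
      if (start <= end) token >= start && token <= end
      else token >= start || token <= end // wrap-around range
  }

  // Assumed node-to-range assignment, read off the slide in order.
  val ring: List[TokenRange] = List(
    TokenRange("Node 1", 1, 25),
    TokenRange("Node 2", 26, 50),
    TokenRange("Node 3", 51, 75),
    TokenRange("Node 4", 76, 0)
  )

  // Toy partitioner: maps a partition key to a token in 0..99.
  // Real Cassandra partitioners use a far larger token space.
  def token(partitionKey: String): Int =
    Math.floorMod(partitionKey.hashCode, 100)

  // Finds the node whose range contains the token.
  def ownerOfToken(t: Int): String =
    ring.find(_.contains(t)).map(_.node)
      .getOrElse(sys.error(s"no owner for token $t"))

  // The partition key alone decides which node stores the row.
  def owner(partitionKey: String): String = ownerOfToken(token(partitionKey))
}
```

Because every token falls into exactly one range, any client can compute the owning node from the partition key without asking a central coordinator, which is what removes the single point of failure.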
Great for Time Series Data_
CREATE TABLE sensors (
    sensorId uuid,
    time timeuuid,
    metricName text,
    metricValue double,
    PRIMARY KEY (sensorId, time)
);
id t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11
Stored sequentially on disk
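Why this layout suits time series can be shown with a small in-memory model in plain Scala. It is a sketch under assumptions, not how Cassandra is implemented: the partition key `sensorId` is a `String`, the clustering column `time` is a `Long` timestamp instead of a `timeuuid`, and `metricName` is left out. The names `TimeSeriesPartitionSketch`, `insert` and `readRange` are hypothetical.

```scala
import scala.collection.immutable.TreeMap

object TimeSeriesPartitionSketch {
  // One partition per sensorId; inside a partition the rows stay sorted
  // by the clustering column (time).
  type Partition = TreeMap[Long, Double]

  // Adds one measurement to the partition of the given sensor.
  def insert(table: Map[String, Partition], sensorId: String,
             time: Long, value: Double): Map[String, Partition] = {
    val partition = table.getOrElse(sensorId, TreeMap.empty[Long, Double])
    table.updated(sensorId, partition.updated(time, value))
  }

  // Time-range query within one partition: a contiguous slice of the
  // sorted rows, from inclusive, until exclusive.
  def readRange(table: Map[String, Partition], sensorId: String,
                from: Long, until: Long): List[(Long, Double)] =
    table.get(sensorId).map(_.range(from, until).toList).getOrElse(Nil)
}
```

Rows may arrive in any order, but a range read for one sensor always returns them sorted by time, which mirrors the "stored sequentially on disk" property of the slide.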
Spark
•Open source since 2010, Apache project since 2013
•Data processing framework
• Batch processing
• Stream processing
What Is Apache Spark_
•Fast
• up to 100 times faster than Hadoop MapReduce for in-memory workloads
• much of the processing happens in memory
• scales linearly as nodes are added
•Easy
• Scala, Java and Python APIs
• Clean code (e.g. with lambdas in Java 8)
• rich API: map, reduce, filter, groupBy, sort, union, join, reduceByKey, groupByKey, sample, take, first, count
•Fault-Tolerant
• easily reproducible
Why Use Spark_
•RDDs: Resilient Distributed Datasets
• Read-only description of a collection of objects
• Partitioned for distribution
• Determined through transformations
• Can be rebuilt automatically on failure
•Operations
• Transformations (map, filter, reduceByKey, ...) -> new RDD
• Actions (count, collect, reduce, save)
•Only Actions start processing!
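The split between transformations and actions can be mimicked without Spark. The following is a hypothetical stand-in written in plain Scala (`MiniRDD` is not a Spark class): it only stores a description of how to compute its elements, transformations extend that description, and actions evaluate it. Recomputing from the description is also the idea behind rebuilding lost data after a failure.

```scala
object LazyLineageSketch {
  // Hypothetical stand-in for an RDD: it holds no data, only a recipe
  // (its lineage) for computing the elements.
  final class MiniRDD[A](compute: () => Iterator[A]) {
    // Transformations return a new description; nothing is evaluated yet.
    def map[B](f: A => B): MiniRDD[B] =
      new MiniRDD(() => compute().map(f))
    def filter(p: A => Boolean): MiniRDD[A] =
      new MiniRDD(() => compute().filter(p))

    // Actions run the whole chain of transformations from the source.
    def count(): Long = compute().size.toLong
    def collect(): List[A] = compute().toList
  }

  // Creates a MiniRDD from an in-memory sequence (the "source").
  def fromSeq[A](data: Seq[A]): MiniRDD[A] =
    new MiniRDD(() => data.iterator)
}
```

In this sketch every action replays the full lineage, so calling `count()` and then `collect()` evaluates the filter twice. Spark behaves the same way unless an RDD is explicitly cached.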
Easily Reproducible?_
RDD Example_
scala> val textFile = sc.textFile("README.md")
textFile: spark.RDD[String] = spark.MappedRDD@2ee9b6e3
scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))
linesWithSpark: spark.RDD[String] = spark.FilteredRDD@7dd4af09
scala> linesWithSpark.count()
res0: Long = 126
Spark & Cassandra
•Spark Cassandra Connector by DataStax
• https://github.com/datastax/spark-cassandra-connector
•Cassandra tables as Spark RDDs (read & write)
•Mapping of C* tables and rows onto Java/Scala objects
•Server-side filtering ("where")
•Included as Maven / SBT dependency in your application
Connecting Spark With Cassandra_
Two Datacenters - Two Purposes_
[Diagram: one Cassandra (C*) cluster spanning two datacenters]
DC1 - Online: four C* nodes
DC2 - Analytics: four C* nodes, four Spark worker nodes (WN), one Spark master
Spark Streaming
•Real-time processing using micro-batches
•Supported sources: files, TCP, MQTT, Kafka, Twitter, ...
•Data as Discretized Stream (DStream)
•Same programming model as for batches
•All operations of Spark Core, SQL and MLlib
•Stateful operations & sliding windows
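Micro-batches and sliding windows can be illustrated in plain Scala, without Spark. This is a conceptual sketch only: `Event`, `microBatches` and `windowedSums` are hypothetical names, timestamps are assumed to be non-negative milliseconds, and the window is expressed as a number of batches, in line with the Spark Streaming rule that window length and slide interval are multiples of the batch interval.

```scala
object MicroBatchSketch {
  final case class Event(timestampMs: Long, value: Double)

  // Cuts a sequence of events into consecutive micro-batches of batchMs
  // milliseconds. Batches come back in time order; empty intervals
  // between the first and the last event are kept as empty batches.
  def microBatches(events: Seq[Event], batchMs: Long): Vector[Vector[Event]] =
    if (events.isEmpty) Vector.empty
    else {
      val byBatch = events.groupBy(_.timestampMs / batchMs)
      val first = byBatch.keys.min
      val last = byBatch.keys.max
      (first to last).map(i => byBatch.getOrElse(i, Seq.empty).toVector).toVector
    }

  // Sliding window: sums the values of windowBatches consecutive batches,
  // then advances by slideBatches batches.
  def windowedSums(batches: Vector[Vector[Event]],
                   windowBatches: Int, slideBatches: Int): Vector[Double] =
    batches.sliding(windowBatches, slideBatches)
      .map(_.flatten.map(_.value).sum)
      .toVector
}
```

With a 500 ms batch interval, a window of two batches that slides by one batch covers one second of data and is updated every 500 ms.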
Stream Processing With Spark Streaming_
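import com.datastax.spark.connector._
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Milliseconds, StreamingContext}
import org.apache.spark.streaming.mqtt.MQTTUtils

// sc is an existing SparkContext; the micro-batch interval is 500 ms
val ssc = new StreamingContext(sc, Milliseconds(500))
// subscribe to topic "foo" on a local MQTT broker
val lines = MQTTUtils.createStream(ssc, "tcp://localhost:1883", "foo", StorageLevel.MEMORY_ONLY_SER_2)
val data = lines.map(input => input.toLowerCase)
// assumption: a real job first parses each message into a tuple or case class
// whose fields match the columns of mqtt.sensors
data.foreachRDD(_.saveToCassandra("mqtt", "sensors"))
ssc.start()
// await manual termination or error
ssc.awaitTermination()
// manual termination
ssc.stop()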
Spark Streaming - MQTT Example_
Use Cases
•Spark Streaming
• Continuous data streams
• MQTT, Kafka, ZeroMQ...
• Reliability with little effort
•Spark Core
• Existing data
• SQL databases, CSV, JSON, ...
•Use the same programming model or even the same code!
Use Cases for Spark and Cassandra in IoT_
Ingestion
•Real-Time Analysis
• React to events
• Join with existing data
• Apply ML models to events
•Batch Analysis
• Scheduled jobs
• Analytics on the data
• Train ML models
Use Cases for Spark and Cassandra in IoT_
Analyses
Demo
Questions?
Matthias Niehoff,
IT-Consultant
codecentric AG
Zeppelinstraße 2
76185 Karlsruhe, Germany
mobile: +49 (0) 172.1702676
matthias.niehoff@codecentric.de
www.codecentric.de
blog.codecentric.de
matthiasniehoff
