Big Data Analytics with Spark & Cassandra_
JUG Stuttgart 01/2016
Matthias Niehoff
•Cassandra
•Spark
•Spark & Cassandra
•Spark Applications
•Spark Streaming
•Spark SQL
•Spark MLLib
Agenda_
2
Cassandra
3
•Distributed database
•Highly Available
•Linearly Scalable
•Multi-Datacenter Support
•No Single Point Of Failure
•CQL (Cassandra Query Language)
• Similar to SQL
• No joins and no aggregates
•Eventual Consistency ("Tunable Consistency")
Cassandra_
4
Distributed Data Storage_
5
[Figure: four nodes on a token ring; Node 1 owns tokens 1-25, Node 2 owns 26-50, Node 3 owns 51-75, Node 4 owns 76-0]
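The ring can be illustrated with a small Scala sketch. It assumes, as the figure suggests, 100 tokens with Node 1 to Node 4 owning 1-25, 26-50, 51-75 and 76-0 (the last range wraps around). The object `ToyTokenRing` and its hash-based `token` function are hypothetical simplifications: Cassandra's default Murmur3Partitioner uses tokens from -2^63 to 2^63-1, and real clusters usually add virtual nodes and replication.

```scala
object ToyTokenRing {
  // Hypothetical ring of 100 tokens (0 to 99) split into four ranges.
  // Each entry is (first token, last token, owning node); bounds are inclusive.
  val ranges: Seq[(Int, Int, String)] = Seq(
    (1, 25, "Node 1"),
    (26, 50, "Node 2"),
    (51, 75, "Node 3"),
    (76, 99, "Node 4")
  )
  val RingSize = 100

  // Assumed token function: non-negative hashCode modulo the ring size.
  // Cassandra itself hashes the partition key with Murmur3 instead.
  def token(partitionKey: String): Int = Math.floorMod(partitionKey.hashCode, RingSize)

  // Token 0 belongs to the last range, which wraps around the ring ("76-0").
  def owner(token: Int): String =
    if (token == 0) "Node 4"
    else
      ranges
        .collectFirst { case (first, last, node) if token >= first && token <= last => node }
        .getOrElse(throw new IllegalArgumentException(s"token $token is outside the ring"))

  // All rows sharing a partition key map to the same token, hence the same node.
  def ownerOf(partitionKey: String): String = owner(token(partitionKey))
}
```

Because placement depends only on the partition key, a query that names the full partition key (like `WHERE name = 'ACDC'`) can be routed to a single node.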
CQL - A Query Language With Limitations_
6
SELECT * FROM performer WHERE name = 'ACDC'
-> ok
SELECT * FROM performer WHERE name = 'ACDC' and country = 'Australia'
-> not ok (filters on country, which is not part of the primary key)
SELECT country, COUNT(*) as quantity FROM artists GROUP BY country ORDER BY quantity DESC
-> not supported
performer
name (PK)
genre
country
Spark
7
•Open source since 2010, Apache project since 2013
•Data processing framework
• Batch processing
• Stream processing
What Is Apache Spark_
8
•Fast
• Up to 100 times faster than Hadoop MapReduce (for in-memory workloads)
• Extensive in-memory processing
• Scales linearly by adding nodes
•Easy
• Scala, Java and Python APIs
• Concise code (e.g. with lambdas in Java 8)
• Rich API: map, reduce, filter, groupBy, sort, union, join, reduceByKey, groupByKey, sample, take, first, count
•Fault-Tolerant
• easily reproducible
Why Use Spark_
9
•RDDs: Resilient Distributed Datasets
• Read-only description of a collection of objects
• Distributed across the cluster (in memory or on disk)
• Defined through transformations
• Can be rebuilt automatically on failure
•Operations
• Transformations (map, filter, groupByKey, ...) -> new RDD
• Actions (count, collect, save) -> result
•Only actions start processing!
Easily Reproducible?_
10
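The rule that only actions start processing has a close analogue in Scala's standard-library collection views, shown below as an analogy only (no Spark involved; `LazyDemo` is a hypothetical name). Building the view with `map` and `filter` runs nothing; forcing it with `toList` does, and forcing it again recomputes everything, much as an uncached RDD is recomputed for every action.

```scala
object LazyDemo {
  // Counts how often the mapping function actually runs.
  var evaluations = 0

  // "Transformations": describe the computation; nothing is evaluated yet.
  def pipeline: scala.collection.View[Int] =
    (1 to 10).view
      .map { x => evaluations += 1; x * 2 }
      .filter(_ > 10)
}

// Usage:
//   val v = LazyDemo.pipeline   // evaluations is still 0
//   v.toList                    // "action": List(12, 14, 16, 18, 20), evaluations == 10
//   v.toList                    // evaluated again, evaluations == 20
```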
•Partitions
• Describes the partitions (e.g. built from Cassandra token ranges)
•Dependencies
• Dependencies on parent RDDs
•Compute
• The function that computes the RDD's partitions
•(Optional) Partitioner
• How is the data partitioned? (hash, range, ...)
•(Optional) Preferred Location
• Where to get the data (e.g. list of Cassandra node IPs)
Properties Of An RDD_
11
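The five properties can be modelled in a few lines. The sketch below is a deliberately simplified, hypothetical `ToyRDD` (not Spark's class, although Spark's abstract `RDD` exposes corresponding members such as `getPartitions`, `getDependencies`, `compute`, `partitioner` and `getPreferredLocations`). It shows why a lost partition can be rebuilt: `compute` re-derives it from the parent's partition via the recorded dependency. The host addresses used in the usage note are made up.

```scala
// Hypothetical, simplified model of an RDD's five properties (not Spark's API).
trait ToyRDD[T] {
  def partitions: Seq[Int]                                   // 1. partition ids
  def dependencies: Seq[ToyRDD[_]]                           // 2. parent RDDs (lineage)
  def compute(partition: Int): Iterator[T]                   // 3. how to build one partition
  def partitioner: Option[T => Int] = None                   // 4. optional: record -> partition
  def preferredLocations(partition: Int): Seq[String] = Nil  // 5. optional: e.g. node IPs
}

// A source RDD backed by in-memory data, one Seq per partition.
final class SourceRDD[T](data: Vector[Seq[T]], hosts: Vector[String]) extends ToyRDD[T] {
  def partitions: Seq[Int] = data.indices
  def dependencies: Seq[ToyRDD[_]] = Nil
  def compute(partition: Int): Iterator[T] = data(partition).iterator
  override def preferredLocations(partition: Int): Seq[String] =
    Seq(hosts(partition % hosts.size))
}

// A narrow transformation: partition i depends only on parent partition i,
// so a lost partition is rebuilt by re-running compute on the parent.
final class MappedRDD[A, B](parent: ToyRDD[A], f: A => B) extends ToyRDD[B] {
  def partitions: Seq[Int] = parent.partitions
  def dependencies: Seq[ToyRDD[_]] = Seq(parent)
  def compute(partition: Int): Iterator[B] = parent.compute(partition).map(f)
  override def preferredLocations(partition: Int): Seq[String] =
    parent.preferredLocations(partition)
}
```

For example, `new MappedRDD[Int, Int](new SourceRDD(Vector(Seq(1, 2), Seq(3, 4)), Vector("10.0.0.1", "10.0.0.2")), _ * 2).compute(1)` yields 6 and 8, and would be scheduled near `10.0.0.2`.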
RDD Example_
12
scala> val textFile = sc.textFile("README.md")
textFile: spark.RDD[String] = spark.MappedRDD@2ee9b6e3
scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))
linesWithSpark: spark.RDD[String] = spark.FilteredRDD@7dd4af09
scala> linesWithSpark.count()
res0: Long = 126
Reproduce RDDs Using A Tree_
13
[Figure: lineage tree starting at a data source; rdd1 to rdd6 are derived via map(..), filter(..), union(..), sample(..) and cache(), and count() actions produce the values val1, val2 and val3]
•Transformations
• map, flatMap
• sample, filter, distinct
• union, intersection, cartesian
•Actions
• reduce
• count
• collect, first, take
• saveAsTextFile
• foreach
Spark Transformations & Actions_
14
Run Spark In A Cluster_
15
•Memory
• A lot of data in memory
• More memory —> Less disk IO —> Faster processing
• Minimum 8 GB / Node
•Network
• Communication between Driver, Cluster Manager & Worker
• Important for reduce operations
• 10 Gigabit LAN or better
•CPU
• Tasks need little communication between threads
• Therefore work parallelizes well across cores
• Minimum 8 – 16 Cores / Node
What About Hardware?_
16
•Master Web UI (8080)
How To Monitor? (1/3)_
17
•Worker Web UI (8081)
How To Monitor? (2/3)_
18
•Application Web UI (4040)
How To Monitor? (3/3)_
19
([atomic,collection,object]	,	[atomic,collection,object])	
val fluege =
  List(("Thomas", "Berlin"), ("Mark", "Paris"), ("Thomas", "Madrid"))
val pairRDD = sc.parallelize(fluege)
pairRDD.filter(_._1 == "Thomas")
  .collect
  .foreach(t => println(t._1 + " flew to " + t._2))
Pair RDDs_
20
key (not unique), value
•Parallelization!
• keys are used for partitioning
• pairs with different keys are distributed across the cluster
•Efficient processing of
• aggregate by key
• group by key
• sort by key
• joins, union based on keys
Why Use Pair RDDs_
21
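Why keys matter for parallelism can be sketched without a cluster. The Scala below simulates `reduceByKey` on data that is already split into partitions: combine locally, route each key to exactly one output partition by hash (the same idea as Spark's `HashPartitioner`, which takes the non-negative `hashCode` modulo the partition count), then merge. It is a plain-collections model under those assumptions, not Spark's implementation; `ToyReduceByKey` is a hypothetical name.

```scala
object ToyReduceByKey {
  // Same idea as Spark's HashPartitioner: non-negative key.hashCode modulo partitions.
  def partitionFor(key: Any, numPartitions: Int): Int =
    Math.floorMod(key.hashCode, numPartitions)

  // Simulates reduceByKey over input that is already split into partitions.
  def reduceByKey[K, V](input: Seq[Seq[(K, V)]], numPartitions: Int)(
      f: (V, V) => V): IndexedSeq[Map[K, V]] = {
    // 1. Combine locally inside each input partition (no network traffic needed).
    val locallyCombined: Seq[Map[K, V]] = input.map(_.groupMapReduce(_._1)(_._2)(f))
    // 2. Shuffle: route every partial result to the partition that owns its key.
    val shuffled: Map[Int, Seq[(K, V)]] =
      locallyCombined.flatMap(_.toSeq).groupBy { case (k, _) => partitionFor(k, numPartitions) }
    // 3. Merge the partial results per key inside each output partition.
    (0 until numPartitions).map { p =>
      shuffled.getOrElse(p, Seq.empty).groupMapReduce(_._1)(_._2)(f)
    }
  }
}
```

Each key ends up in exactly one output partition, so different keys can be aggregated on different nodes in parallel; only step 2 moves data between nodes.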
RDD Dependencies_
22
"Narrow" (pipeline-able)
map, filter
union
join on co-partitioned data
RDD Dependencies_
23
"Wide" (shuffle)
groupBy on non-partitioned data
join on non-co-partitioned data
Spark Demo
24
Spark & Cassandra
25
Use Spark And Cassandra In A Cluster_
26
[Figure: a Spark client running the Spark driver, a Spark master, four Spark worker nodes (WN) and four Cassandra (C*) nodes]
Two Datacenter - Two Purposes_
27
[Figure: DC1 - Online with four Cassandra (C*) nodes; DC2 - Analytics with four Cassandra nodes, four Spark worker nodes (WN) and a Spark master]
•Spark Cassandra Connector by DataStax
• https://github.com/datastax/spark-cassandra-connector
•Cassandra tables as Spark RDDs (read & write)
•Mapping of C* tables and rows onto Java/Scala objects
•Server-side filtering ("where")
•Compatible with
• Spark ≥ 0.9
• Cassandra ≥ 2.0
•Clone & compile with SBT or download from Maven Central
Connecting Spark With Cassandra_
28
•Start the shell
bin/spark-shell \
  --jars ~/path/to/jar/spark-cassandra-connector-assembly-1.3.0.jar \
  --conf spark.cassandra.connection.host=localhost
•Import Cassandra Classes
scala> import com.datastax.spark.connector._
Use The Connector In The Shell_
29
•Read complete table
val movies = sc.cassandraTable("movie","movies")
// returns CassandraRDD[CassandraRow]
•Read selected columns
val movies = sc.cassandraTable("movie","movies").select("title","year")
•Filter rows
val movies = sc.cassandraTable("movie","movies").where("title = 'Die Hard'")
•Access Columns in Result Set
movies.collect.foreach(r => println(r.get[String]("title")))
Read A Cassandra Table_
30
Read As Tuple
val movies =
  sc.cassandraTable[(String,Int)]("movie","movies")
    .select("title","year")
val movies =
  sc.cassandraTable("movie","movies")
    .select("title","year")
    .as((_: String, _: Int))
// both result in a CassandraRDD[(String,Int)]
Read A Cassandra Table_
31
Read As Case Class
case class Movie(title: String, year: Int)
sc.cassandraTable[Movie]("movie","movies").select("title","year")
sc.cassandraTable("movie","movies").select("title","year").as(Movie)
Read A Cassandra Table_
32
•Every RDD can be saved
• Using Tuples
val tuples = sc.parallelize(Seq(("Hobbit",2012),("96 Hours",2008)))
tuples.saveToCassandra("movie","movies", SomeColumns("title","year"))
• Using Case Classes
case class Movie(title: String, year: Int)
val objects =
  sc.parallelize(Seq(Movie("Hobbit",2012),Movie("96 Hours",2008)))
objects.saveToCassandra("movie","movies")
Write Table_
33
// Load and format as Pair RDD
val pairRDD = sc.cassandraTable("movie","director")
  .map(r => (r.getString("country"),r))
// Directors per country, sorted
pairRDD.mapValues(v => 1).reduceByKey(_+_)
  .sortBy(-_._2).collect.foreach(println)
// or, unsorted
pairRDD.countByKey().foreach(println)
// All countries (keys is a parameterless method, so no parentheses)
pairRDD.keys.distinct.collect.foreach(println)
Pair RDDs With Cassandra_
34
director
name text K
country text
•Joins can be expensive as they may require shuffling
val directors = sc.cassandraTable(..)
  .map(r => (r.getString("name"),r))
val movies = sc.cassandraTable()
  .map(r => (r.getString("director"),r))
movies.join(directors)
// RDD[(String, (CassandraRow, CassandraRow))]
Pair RDDs With Cassandra - Join
35
director
name text K
country text
movie
title text K
director text
•Automatically on read
•Not automatically on write
• No shuffling Spark operations -> writes are local
• Shuffling Spark operations
• Fan-out writes to Cassandra
• repartitionByCassandraReplica("keyspace", "table") before write
•Joins with data locality
Using Data Locality With Cassandra_
36
sc.cassandraTable[CassandraRow](KEYSPACE, A)
  .repartitionByCassandraReplica(KEYSPACE, B)
  .joinWithCassandraTable[CassandraRow](KEYSPACE, B)
  .on(SomeColumns("id"))
•cassandraCount()
• Utilizes a Cassandra query
• instead of loading the table into memory and counting there
•spanBy(), spanByKey()
• group data by Cassandra partition key
• does not need shuffling
• should be preferred over groupBy/groupByKey
CREATE TABLE events (year int, month int, ts timestamp, data varchar, PRIMARY KEY (year,month,ts));
sc.cassandraTable("test", "events")
  .spanBy(row => (row.getInt("year"), row.getInt("month")))
sc.cassandraTable("test", "events")
  .keyBy(row => (row.getInt("year"), row.getInt("month")))
  .spanByKey
Further Transformations & Actions_
37
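A minimal sketch of what `spanBy` relies on, in plain Scala and assuming rows arrive grouped by key (as rows of one Cassandra partition do, in clustering order): consecutive rows with the same key are collected in a single pass, so no shuffle is needed. `ToySpanBy` is a hypothetical helper that models the idea, not the connector's implementation. Non-adjacent equal keys end up in separate groups, which is why span keys should follow the table's primary key order, as (year, month) does for the events table.

```scala
import scala.collection.mutable.ArrayBuffer

object ToySpanBy {
  // Groups *consecutive* elements with the same key in one pass, without a shuffle.
  // This matches groupBy only if equal keys are adjacent in the input.
  def spanBy[T, K](rows: Iterator[T])(key: T => K): Iterator[(K, Seq[T])] =
    new Iterator[(K, Seq[T])] {
      private val it = rows.buffered
      def hasNext: Boolean = it.hasNext
      def next(): (K, Seq[T]) = {
        val k = key(it.head)
        val group = ArrayBuffer.empty[T]
        // Consume rows while they still belong to the current key.
        while (it.hasNext && key(it.head) == k) group += it.next()
        (k, group.toSeq)
      }
    }
}
```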
Spark & Cassandra Demo
38
Create an Application
39
•Normal Scala Application
•SBT as build tool
•source in src/main/scala-2.10
•assembly.sbt in root and project directory
•build.sbt in root directory
•sbt assembly to build
Scala Application_
40
// %% appends the Scala binary version (here _2.10) to each artifact name
libraryDependencies += "com.datastax.spark" %% "spark-cassandra-connector" % "1.3.0"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.1" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.3.1" % "provided"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.3.1" % "provided"
•Normal Java Application
•Java 8!
•Maven as build tool
•source in src/main/java
•in pom.xml
• dependencies (spark-core, spark-streaming, spark-mllib, spark-cassandra-connector)
• assembly-plugin or shade-plugin
•mvn clean install to build
Java Application_
41
•Special classes for Java
SparkConf conf =
    new SparkConf().setMaster("local[2]").setAppName("Java")
        .set("spark.cassandra.connection.host", "127.0.0.1");
JavaSparkContext sc = new JavaSparkContext(conf);
// Reuse sc: only one SparkContext may be active per JVM
JavaStreamingContext ssc = new JavaStreamingContext(sc, Durations.seconds(1L));
JavaRDD<Integer> rdd =
    sc.parallelize(Arrays.asList(1, 2, 3, 4, 5, 6));
// A lambda instead of System.out::println, which would capture the non-serializable PrintStream
rdd.filter(e -> e % 2 == 0).foreach(e -> System.out.println(e));
Java Specials_
42
•Special classes for Java
import static com.datastax.spark.connector.japi.CassandraJavaUtil.*;
CassandraTableScanJavaRDD<CassandraRow> table =
    javaFunctions(sc.sparkContext())
        .cassandraTable("keyspace", "table");
CassandraTableScanJavaRDD<Entity> table =
    javaFunctions(sc.sparkContext())
        .cassandraTable("keyspace", "table", mapRowTo(Entity.class));
javaFunctions(someRDD).writerBuilder("keyspace", "table",
    mapToRow(Entity.class)).saveToCassandra();
Java Specials - Cassandra_
43
Spark SQL
44
•SQL Queries with Spark (SQL & HiveQL)
• On structured data
• On DataFrame
• Every result of Spark SQL is a DataFrame
• All operations of generic RDDs available
•Supports (even on non-primary-key columns)
• Joins
• Union
• Group By
• Having
• Order By
Spark SQL_
45
val sqlContext = new SQLContext(sc)
val persons = sqlContext.jsonFile(path)
// Show the schema
persons.printSchema()
persons.registerTempTable("persons")
val adults =
  sqlContext.sql("SELECT name FROM persons WHERE age > 18")
adults.collect.foreach(println)
Spark SQL - JSON Example_
46
{"name":"Michael"}	
{"name":"Jan",	"age":30}	
{"name":"Tim",	"age":17}
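For these three records the query should return only Jan: Tim is 17, and Michael has no `age` field, so Spark SQL reads his age as NULL, and `NULL > 18` is not true. A plain-Scala sketch (no Spark) that models the missing field as an `Option` makes this explicit; the `Person` case class and `AdultsQuery` object are assumptions for illustration.

```scala
object AdultsQuery {
  // Hypothetical row type; age is optional because {"name":"Michael"} has no age field.
  final case class Person(name: String, age: Option[Long])

  val persons: Seq[Person] = Seq(
    Person("Michael", None),
    Person("Jan", Some(30L)),
    Person("Tim", Some(17L))
  )

  // Equivalent of: SELECT name FROM persons WHERE age > 18
  // A missing age behaves like SQL NULL: the predicate is not true, so the row is dropped.
  def adults(people: Seq[Person]): Seq[String] =
    people.filter(_.age.exists(_ > 18)).map(_.name)
}
```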
val csc = new CassandraSQLContext(sc)
csc.setKeyspace("musicdb")
// note the trailing spaces: without them the concatenated SQL reads "anzahlFROM"
val result = csc.sql("SELECT country, COUNT(*) as anzahl " +
                     "FROM artists GROUP BY country " +
                     "ORDER BY anzahl DESC");
result.collect.foreach(println);
Spark SQL - Cassandra Example_
47
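The GROUP BY / ORDER BY query that plain CQL rejects is exactly what Spark SQL computes here. Its semantics can be sketched with plain Scala collections; the `Artist` case class and any sample rows are hypothetical, and ties are broken by country name only to make the output deterministic (SQL leaves the order of ties unspecified).

```scala
object CountryCounts {
  // Hypothetical stand-in for rows of the musicdb.artists table.
  final case class Artist(name: String, country: String)

  // SELECT country, COUNT(*) AS anzahl FROM artists GROUP BY country ORDER BY anzahl DESC
  def countByCountry(artists: Seq[Artist]): Seq[(String, Int)] =
    artists
      .groupMapReduce(_.country)(_ => 1)(_ + _)               // GROUP BY country, COUNT(*)
      .toSeq
      .sortBy { case (country, count) => (-count, country) }  // ORDER BY count DESC
}
```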
Spark SQL Demo
48
Spark Streaming
49
•Real Time Processing using micro batches
•Supported sources: TCP, S3, Kafka, Twitter, ...
•Data as Discretized Stream (DStream)
•Same programming model as for batches
•All operations of generic RDDs, Spark SQL & MLLib available
•Stateful Operations & Sliding Windows
Stream Processing With Spark Streaming_
50
import org.apache.spark.streaming._
val ssc = new StreamingContext(sc, Seconds(1))
val stream = ssc.socketTextStream("127.0.0.1", 9999)
stream.map(x => 1).reduce(_ + _).print()
ssc.start()
// await manual termination or error
ssc.awaitTermination()
// manual termination
ssc.stop()
Spark Streaming - Example_
51
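Discretizing a stream means cutting it into batches by arrival time. The plain-Scala sketch below assumes each record carries an arrival timestamp in milliseconds (an assumption for illustration; Spark Streaming assigns records to batches itself) and reproduces what `stream.map(x => 1).reduce(_ + _)` prints: the number of lines per one-second batch. `MicroBatches` and `Record` are hypothetical names; batches without records are simply absent from the result.

```scala
object MicroBatches {
  // Hypothetical record: arrival time in milliseconds plus the received line.
  final case class Record(arrivalMillis: Long, line: String)

  // Assigns each record to batch number floor(arrival / interval).
  def discretize(records: Seq[Record], batchIntervalMillis: Long): Map[Long, Seq[Record]] =
    records.groupBy(_.arrivalMillis / batchIntervalMillis)

  // Per-batch equivalent of stream.map(x => 1).reduce(_ + _): lines per batch.
  def countPerBatch(records: Seq[Record], batchIntervalMillis: Long): Map[Long, Int] =
    discretize(records, batchIntervalMillis).map { case (batch, rs) =>
      batch -> rs.map(_ => 1).reduce(_ + _)
    }
}
```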
•Maintain State for each key in a DStream: updateStateByKey
Spark Streaming - Stateful Operations_
52
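def updateAlbumCount(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] = {
  val newCount = runningCount.getOrElse(0) + newValues.size
  Some(newCount)
}
// stateful operations require checkpointing: ssc.checkpoint(<directory>)
// keep the type argument and the function on one line, otherwise Scala ends the statement early
val countStream = stream.updateStateByKey[Int](updateAlbumCount _)
stream must be a DStream of key-value pairs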
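How `updateStateByKey` carries state from batch to batch can be simulated with plain collections. The sketch below reuses the slide's update function and applies it batch by batch; the driver logic in `ToyUpdateState.run` is a hypothetical model, but it follows the documented behaviour that the update function is called for every key that already has state, even without new values, and that returning `None` removes the key. Album names in the usage are made up.

```scala
object ToyUpdateState {
  // Same shape as the slide's update function: new values of this batch, previous state.
  def updateAlbumCount(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] =
    Some(runningCount.getOrElse(0) + newValues.size)

  // Applies an update function batch by batch and returns the state after each batch.
  // Every key that has state or new values is updated; None drops the key.
  def run[K, V, S](batches: Seq[Seq[(K, V)]])(
      update: (Seq[V], Option[S]) => Option[S]): Seq[Map[K, S]] =
    batches.scanLeft(Map.empty[K, S]) { (state, batch) =>
      val newValues: Map[K, Seq[V]] = batch.groupMap(_._1)(_._2)
      (state.keySet ++ newValues.keySet).flatMap { k =>
        update(newValues.getOrElse(k, Seq.empty), state.get(k)).map(k -> _)
      }.toMap
    }.tail
}
```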
•One Receiver -> One Node
• Start more receivers and union them
val numStreams = 5
val kafkaStreams = (1 to numStreams).map { i =>
  KafkaUtils.createStream(...) }
val unifiedStream = streamingContext.union(kafkaStreams)
unifiedStream.print()
•Received data will be split up into blocks
• 1 block => 1 task
• blocks per batch = batch interval / block interval
•Repartition data to distribute over cluster
Spark Streaming - Parallelism_
53
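The number of tasks per batch follows from that arithmetic. A small worked example, assuming a 2-second batch interval and Spark Streaming's default block interval of 200 ms (`spark.streaming.blockInterval`): each receiver produces roughly 2000 / 200 = 10 blocks, hence about 10 tasks per batch, and with five unioned receivers about 50. The helper names below are hypothetical.

```scala
object StreamingParallelism {
  // Approximate blocks (and therefore tasks) one receiver produces per batch.
  def blocksPerBatch(batchIntervalMillis: Long, blockIntervalMillis: Long): Long =
    batchIntervalMillis / blockIntervalMillis

  // Approximate tasks per batch when several receiver streams are unioned.
  def tasksPerBatch(receivers: Int, batchIntervalMillis: Long, blockIntervalMillis: Long): Long =
    receivers * blocksPerBatch(batchIntervalMillis, blockIntervalMillis)
}
```

If that number is much smaller than the cores available, either lower the block interval or call `repartition` on the stream, as the last bullet suggests.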
Spark Streaming Demo
54
Spark MLLib
55
•Fully integrated in Spark
• Scalable
• Scala, Java & Python APIs
• Use with Spark Streaming & Spark SQL
•Packages various algorithms for machine learning
•Includes
• Clustering
• Classification
• Prediction
• Collaborative Filtering
•Still under development
• performance, algorithms
Spark MLLib_
56
MLLib Example - Clustering_
57
[Figure: a set of data points (age vs. income) grouped into meaningful clusters]
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Load and parse data (space-separated numbers per line)
val data = sc.textFile("data/mllib/kmeans_data.txt")
val parsedData = data
  .map(s => Vectors.dense(s.split(' ').map(_.toDouble)))
  .cache()

// Cluster the data into 2 clusters using KMeans with 20 iterations
val clusters = KMeans.train(parsedData, 2, 20)

// Evaluate clustering by computing Within Set Sum of Squared Errors
val WSSSE = clusters.computeCost(parsedData)
println("Within Set Sum of Squared Errors = " + WSSSE)
MLLib Example - Clustering (using KMeans)_
58
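To see what `KMeans.train` and `computeCost` do, here is a compact plain-Scala version of Lloyd's algorithm (assign each point to its nearest center, move each center to the mean of its points, repeat). It is a sketch, not MLlib's implementation: it initialises with the first k points, whereas MLlib defaults to k-means|| initialisation, and it runs on a local `Seq` rather than an RDD. `cost` computes the same quantity as `computeCost`: the sum of squared distances from each point to its nearest center.

```scala
object ToyKMeans {
  type Point = Vector[Double]

  def squaredDistance(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the nearest center.
  def closest(p: Point, centers: Seq[Point]): Int =
    centers.indices.minBy(i => squaredDistance(p, centers(i)))

  // Component-wise mean of a non-empty set of points.
  def mean(points: Seq[Point]): Point =
    points.transpose.map(_.sum / points.size).toVector

  // Lloyd's algorithm, initialised with the first k points (deterministic).
  def train(points: Seq[Point], k: Int, maxIterations: Int): Seq[Point] = {
    var centers: Seq[Point] = points.take(k)
    for (_ <- 1 to maxIterations) {
      val clusters = points.groupBy(p => closest(p, centers))
      // Keep a center unchanged if no point was assigned to it.
      centers = centers.indices.map(i => clusters.get(i).map(mean).getOrElse(centers(i)))
    }
    centers
  }

  // Sum of squared distances of points to their nearest center (cf. computeCost).
  def cost(points: Seq[Point], centers: Seq[Point]): Double =
    points.map(p => squaredDistance(p, centers(closest(p, centers)))).sum
}
```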
MLLib Example - Classification_
59
MLLib Example - Classification_
60
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.util.MLUtils
// Load training data in LIBSVM format.
val data =
  MLUtils.loadLibSVMFile(sc, "sample_libsvm_data.txt")
// Split data into training (60%) and test (40%).
val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
val training = splits(0).cache()
val test = splits(1)
// Run training algorithm to build the model
val numIterations = 100
val model = SVMWithSGD.train(training, numIterations)
MLLib Example - Classification (Linear SVM)_
61
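import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
// Clear the default threshold so that predict returns raw scores instead of 0/1 labels.
model.clearThreshold()
// Compute raw scores on the test set.
val scoreAndLabels = test.map { point =>
  val score = model.predict(point.features)
  (score, point.label)
}
// Get evaluation metrics.
val metrics = new BinaryClassificationMetrics(scoreAndLabels)
val auROC = metrics.areaUnderROC()
println("Area under ROC = " + auROC)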
MLLib Example - Classification (Linear SVM)_
62
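The area under the ROC curve has a simple interpretation: the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one, with ties counting one half. The plain-Scala sketch below computes it by comparing all positive/negative pairs. `ToyAuc` is hypothetical, O(P x N), and only meant to illustrate the metric that `areaUnderROC()` reports; `BinaryClassificationMetrics` computes it differently, from points on the ROC curve.

```scala
object ToyAuc {
  // Pairwise (rank-based) AUC over (score, label) pairs with labels 1.0 / 0.0.
  def areaUnderROC(scoreAndLabels: Seq[(Double, Double)]): Double = {
    val pos = scoreAndLabels.collect { case (s, l) if l == 1.0 => s }
    val neg = scoreAndLabels.collect { case (s, l) if l == 0.0 => s }
    require(pos.nonEmpty && neg.nonEmpty, "need both positive and negative examples")
    // 1 if the positive wins, 0.5 for a tie, 0 otherwise.
    val wins = for (p <- pos; n <- neg) yield if (p > n) 1.0 else if (p == n) 0.5 else 0.0
    wins.sum / (pos.size * neg.size)
  }
}
```

A value of 1.0 means every positive outranks every negative; 0.5 is no better than random ordering.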
MLLib Example - Collaborative Filtering_
63
import org.apache.spark.mllib.recommendation.{ALS, Rating}
// Load and parse the data (userid,itemid,rating)
val data = sc.textFile("data/mllib/als/test.data")
val ratings = data.map(_.split(',') match {
  case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toDouble)
})
// Build the recommendation model using ALS
val rank = 10
val numIterations = 20
val model = ALS.train(ratings, rank, numIterations, 0.01)
MLLib Example - Collaborative Filtering using ALS_
64
// Evaluate the model on rating data
val usersProducts = ratings.map {
  case Rating(user, product, rate) => (user, product) }
val predictions = model.predict(usersProducts).map {
  case Rating(user, product, rate) => ((user, product), rate)
}
val ratesAndPredictions = ratings.map {
  case Rating(user, product, rate) => ((user, product), rate) }
  .join(predictions)
val MSE = ratesAndPredictions.map {
  case ((user, product), (r1, r2)) => val err = (r1 - r2); err * err }.mean()
println("Mean Squared Error = " + MSE)
MLLib Example - Collaborative Filtering using ALS_
65
Use Cases
66
•In particular for huge amounts of external data
•Support for CSV, TSV, XML, JSON and others
Use Cases for Spark and Cassandra_
67
Data Loading
case class User(id: java.util.UUID, name: String)
val users = sc.textFile("users.csv")
  .repartition(2*sc.defaultParallelism)
  .map(line => line.split(",") match { case Array(id,name) =>
    User(java.util.UUID.fromString(id), name)})
users.saveToCassandra("keyspace","users")
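The `match` in the loading code throws a `MatchError` on any line that does not split into exactly two fields, and `UUID.fromString` throws on many malformed ids; either failure aborts the whole job. A more forgiving variant parses each line into an `Option` and keeps only valid records (in Spark, via `flatMap`). Plain-Scala sketch; `UserCsv` and `parseUser` are hypothetical names, and `UUID.fromString` is known to accept some non-canonical strings, so this is not strict validation.

```scala
import java.util.UUID
import scala.util.Try

object UserCsv {
  final case class User(id: UUID, name: String)

  // Returns None instead of throwing for lines that are not "uuid,name".
  def parseUser(line: String): Option[User] =
    line.split(",", -1) match {
      case Array(id, name) if name.trim.nonEmpty =>
        Try(UUID.fromString(id.trim)).toOption.map(uuid => User(uuid, name.trim))
      case _ => None
    }

  // In Spark the same function would be used as sc.textFile(...).flatMap(parseUser);
  // here it is applied to an in-memory Seq of lines.
  def parseAll(lines: Seq[String]): Seq[User] = lines.flatMap(parseUser)
}
```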
Validate consistency in a Cassandra database
•syntactic
• Uniqueness (only relevant for columns not in the PK)
• Referential integrity
• Integrity of duplicated (denormalized) data
•semantic
• Business or application constraints
• e.g.: at least one genre per movie, a maximum of 10 tags per blog post
Use Cases for Spark and Cassandra_
68
Validation & Normalization
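The syntactic and semantic checks above map onto ordinary collection operations (in Spark, the same operations on RDDs loaded with `cassandraTable`). The sketch below uses hypothetical `Director` and `Movie` rows modelled on the director and movie tables from earlier; the `genres` column is an assumption added to illustrate the "at least one genre" rule.

```scala
object ConsistencyChecks {
  // Hypothetical rows modelled after the director and movie tables.
  final case class Director(name: String, country: String)
  final case class Movie(title: String, director: String, genres: Set[String])

  // Uniqueness: values of a (non-key) column that occur in more than one row.
  def duplicates[T, K](rows: Seq[T])(column: T => K): Map[K, Seq[T]] =
    rows.groupBy(column).filter { case (_, group) => group.size > 1 }

  // Referential integrity: movies whose director has no row in the director table.
  def orphanedMovies(movies: Seq[Movie], directors: Seq[Director]): Seq[Movie] = {
    val known = directors.map(_.name).toSet
    movies.filterNot(m => known.contains(m.director))
  }

  // Semantic rule from the slide: every movie needs at least one genre.
  def moviesWithoutGenre(movies: Seq[Movie]): Seq[Movie] = movies.filter(_.genres.isEmpty)
}
```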
•Modelling, Mining, Transforming, ...
•Use Cases
• Recommendation
• Fraud Detection
• Link Analysis (Social Networks, Web)
• Advertising
• Data Stream Analytics (Spark Streaming)
• Machine Learning (Spark ML)
Use Cases for Spark and Cassandra_
69
Analyses (Joins, Transformations,..)
•Changes on existing tables
• New table required when changing primary key
• Otherwise changes could be performed in-place
•Creating new tables
• data derived from existing tables
• Support new queries
•Use the Spark Cassandra Connector
Use Cases for Spark and Cassandra_
70
Schema Migration
Thank you for your attention!
71
Questions?
Matthias Niehoff,
IT-Consultant
codecentric AG
Zeppelinstraße 2
76185 Karlsruhe, Germany
mobil: +49 (0) 172.1702676
matthias.niehoff@codecentric.de
www.codecentric.de
blog.codecentric.de
matthiasniehoff
