Cassandra 2.2 and 3.0
new features
DuyHai DOAN
Apache Cassandra Technical Evangelist
#VoxxedBerlin @doanduyhai
DataStax
•  Founded in April 2010
•  We contribute a lot to Apache Cassandra™
•  400+ customers (25 of the Fortune 100), 450+ employees
•  Headquartered in the San Francisco Bay Area
•  EU headquarters in London, offices in France and Germany
•  DataStax Enterprise = OSS Cassandra + extra features
Materialized Views (MV)
•  Why ?
•  Detailed Impl
•  Gotchas
Why Materialized Views ?
•  Relieve the pain of manual denormalization
CREATE TABLE user(
id int PRIMARY KEY,
country text,
…
);
CREATE TABLE user_by_country(
country text,
id int,
…,
PRIMARY KEY(country, id)
);
Materialized View In Action
•  Instead of creating and maintaining the denormalized table by hand:
CREATE TABLE user_by_country (
country text,
id int,
firstname text,
lastname text,
PRIMARY KEY(country, id));
•  declare it as a view of the base table:
CREATE MATERIALIZED VIEW user_by_country
AS SELECT country, id, firstname, lastname
FROM user
WHERE country IS NOT NULL AND id IS NOT NULL
PRIMARY KEY(country, id);
Materialized View Syntax
CREATE MATERIALIZED VIEW [IF NOT EXISTS]
keyspace_name.view_name
AS SELECT column1, column2, ...
FROM keyspace_name.table_name
WHERE column1 IS NOT NULL AND column2 IS NOT NULL ...
PRIMARY KEY(column1, column2, ...)
SELECT clause
•  must select all primary key columns of base table
WHERE clause
•  IS NOT NULL condition for now
•  more complex conditions in future
PRIMARY KEY clause
•  at least all primary key columns of base table (ordering can be different)
•  maximum 1 non-primary-key column from base table
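The two PRIMARY KEY rules above can be checked mechanically. The sketch below is only an illustration in Python, not Cassandra's own validation code; the helper name check_view_primary_key and the USER_COLUMNS list are assumptions made up for the example.

```python
def check_view_primary_key(base_pk, base_columns, view_pk):
    """Return a list of rule violations for a proposed view primary key.

    Hypothetical helper: it only mirrors the two PRIMARY KEY rules on the slide.
    """
    errors = []
    # Rule 1: every primary key column of the base table must be in the view key.
    missing = [c for c in base_pk if c not in view_pk]
    if missing:
        errors.append(f"view key must include base primary key columns: {missing}")
    unknown = [c for c in view_pk if c not in base_columns]
    if unknown:
        errors.append(f"unknown columns: {unknown}")
    # Rule 2: at most one column that is not part of the base primary key.
    extra = [c for c in view_pk if c in base_columns and c not in base_pk]
    if len(extra) > 1:
        errors.append(f"at most 1 non-primary-key column allowed, got: {extra}")
    return errors


# Assumed column list for the user table of the earlier slides.
USER_COLUMNS = ["id", "country", "firstname", "lastname"]

print(check_view_primary_key(["id"], USER_COLUMNS, ["country", "id"]))
print(check_view_primary_key(["id"], USER_COLUMNS, ["country"]))
```

The first call models PRIMARY KEY(country, id) from the user_by_country view and reports no violation; the second one omits id and is rejected.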
Materialized Views Demo
Materialized View Impl
(diagram: cluster of 8 C* nodes)
UPDATE user
SET country='FR'
WHERE id=1
① Coordinator sends the mutation to all replicas and waits for ack(s) according to the CL
② Acquire local lock on base table partition
③ Local read to fetch current values: SELECT * FROM user WHERE id=1
④ Create BatchLog with
•  DELETE FROM user_by_country WHERE country = 'old_value'
•  INSERT INTO user_by_country(country, id, …) VALUES('FR', 1, ...)
⑤ Execute async BatchLog to paired view replica with CL = ONE
⑥ Apply base table update locally: SET country='FR'
⑦ Release local lock
⑧ Return ack to coordinator
⑨ If CL ack(s) received, ack client
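Steps ② to ⑧ can be sketched as a toy model of a single base replica. This is a simplified Python illustration under stated assumptions: the class BaseReplica and its fields are invented for the example, the view lives in the same object, and the batchlog is applied synchronously instead of asynchronously.

```python
import threading


class BaseReplica:
    """Toy model of one base-table replica and its paired view replica."""

    def __init__(self):
        self.user = {}                  # base table: id -> country
        self.user_by_country = set()    # view rows: (country, id)
        self.locks = {}                 # one lock per base partition

    def update_country(self, user_id, new_country):
        lock = self.locks.setdefault(user_id, threading.Lock())
        with lock:                                    # step 2 (acquire), step 7 (release)
            old_country = self.user.get(user_id)      # step 3: local read
            batch = []                                # step 4: build the view batch
            if old_country is not None:
                batch.append(("DELETE", old_country, user_id))
            batch.append(("INSERT", new_country, user_id))
            self._apply_to_view(batch)                # step 5 (synchronous in this sketch)
            self.user[user_id] = new_country          # step 6: apply base update
        return "ack"                                  # step 8: ack to coordinator

    def _apply_to_view(self, batch):
        for op, country, user_id in batch:
            if op == "DELETE":
                self.user_by_country.discard((country, user_id))
            else:
                self.user_by_country.add((country, user_id))


replica = BaseReplica()
replica.update_country(1, "UK")
replica.update_country(1, "FR")
print(sorted(replica.user_by_country))
```

After both updates the view holds a single row for user 1, under country 'FR'.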
MV Failure Cases: concurrent updates
Two concurrent updates of the same base row (current value country='UK'):
1) UPDATE … SET country='US'
2) UPDATE … SET country='FR'
Without local lock (timeline t0, t1, t2), both updates read the same old value
Update 1)
•  Read base row (country='UK')
•  DELETE FROM mv WHERE country='UK'
•  INSERT INTO mv …(country) VALUES('US')
•  Send async BatchLog
•  Apply update country='US'
Update 2)
•  Read base row (country='UK')
•  DELETE FROM mv WHERE country='UK'
•  INSERT INTO mv …(country) VALUES('FR')
•  Send async BatchLog
•  Apply update country='FR'
Result: both view rows survive, so one of them is stale
•  INSERT INTO mv …(country) VALUES('US')
•  INSERT INTO mv …(country) VALUES('FR')
With local lock, the two updates are serialized
🔒 Update 1)
•  Read base row (country='UK')
•  DELETE FROM mv WHERE country='UK'
•  INSERT INTO mv …(country) VALUES('US')
•  Send async BatchLog
•  Apply update country='US'
🔓
🔒 Update 2)
•  Read base row (country='US')
•  DELETE FROM mv WHERE country='US'
•  INSERT INTO mv …(country) VALUES('FR')
•  Send async BatchLog
•  Apply update country='FR'
🔓
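The race can be replayed deterministically. The Python sketch below does not use real threads: it simply orders the read and apply steps the way the two timelines do. All function names are hypothetical and exist only for this illustration.

```python
def plan_view_changes(base, user_id, new_country):
    """Read-before-write: build the view mutations from the value read now."""
    old_country = base[user_id]
    return [("DELETE", old_country), ("INSERT", new_country)]


def apply_update(base, view, user_id, new_country, plan):
    """Apply the planned view mutations, then the base table update."""
    for op, country in plan:
        if op == "DELETE":
            view.discard((country, user_id))
        else:
            view.add((country, user_id))
    base[user_id] = new_country


def run(with_lock):
    base, view = {1: "UK"}, {("UK", 1)}
    if with_lock:
        # Serialized: each update reads only after the previous one was applied.
        for country in ("US", "FR"):
            apply_update(base, view, 1, country, plan_view_changes(base, 1, country))
    else:
        # Interleaved: both updates read country='UK' before either is applied.
        plan_us = plan_view_changes(base, 1, "US")
        plan_fr = plan_view_changes(base, 1, "FR")
        apply_update(base, view, 1, "US", plan_us)
        apply_update(base, view, 1, "FR", plan_fr)
    return base, view


print(run(with_lock=False))
print(run(with_lock=True))
```

Without the lock the view ends up with rows for both 'US' and 'FR'; with the lock only the 'FR' row remains, matching the base table.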
MV Failure Cases: failed updates to MV
(diagram: cluster of 8 C* nodes)
UPDATE user
SET country='FR'
WHERE id=1
•  ⑤ Execute async BatchLog to paired view replica with CL = ONE ✘ MV replica down
•  MV replica up → BatchLog replay
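The replay idea can be illustrated with a small Python sketch. It assumes a much simpler batchlog than the real one: the classes ViewReplica and LocalBatchLog are hypothetical, and the replay is triggered by hand rather than by a background task.

```python
class ViewReplica:
    """Toy view replica that can be switched down and up."""

    def __init__(self):
        self.up = True
        self.rows = set()

    def apply(self, batch):
        if not self.up:
            raise ConnectionError("view replica down")
        for op, row in batch:
            if op == "DELETE":
                self.rows.discard(row)
            else:
                self.rows.add(row)


class LocalBatchLog:
    """Hypothetical stand-in for the batchlog kept on the base replica."""

    def __init__(self):
        self.pending = []

    def send(self, replica, batch):
        self.pending.append(batch)        # record the batch before sending it
        self.replay(replica)

    def replay(self, replica):
        remaining = []
        for batch in self.pending:
            try:
                replica.apply(batch)
            except ConnectionError:
                remaining.append(batch)   # keep it for a later replay
        self.pending = remaining


view, log = ViewReplica(), LocalBatchLog()
view.up = False
log.send(view, [("DELETE", ("UK", 1)), ("INSERT", ("FR", 1))])   # replica down: batch stays pending
view.up = True
log.replay(view)                                                   # replica up: batch is replayed
print(view.rows, log.pending)
```

The design point is that the batch is recorded before it is sent, so a failed delivery loses nothing and can be retried once the view replica is back.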
Materialized View Performance
•  Write performance
•  local lock
•  local read-before-write for MV → update contention on partition (most of perf hits)
•  local batchlog for MV
•  ☞ you only pay this price once, whatever the number of MVs
•  for each base table update: mv_count x 2 (DELETE + INSERT) extra mutations
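As a quick worked example of the last bullet, the Python sketch below counts mutations under the slide's rule of one DELETE plus one INSERT per view for every base update. The function names are assumptions; the figure is an upper bound, since an update with no previous view row needs no DELETE.

```python
def extra_view_mutations(base_updates, mv_count):
    """Extra mutations generated for the views: one DELETE + one INSERT per view."""
    return base_updates * mv_count * 2


def total_mutations(base_updates, mv_count):
    """Base table mutations plus the extra view mutations."""
    return base_updates + extra_view_mutations(base_updates, mv_count)


for mv_count in (0, 1, 3):
    print(mv_count, total_mutations(1000, mv_count))
```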
Materialized View Performance
•  Write performance vs manual denormalization
•  MV better because no client-server network traffic for read-before-write
•  MV better because less network traffic for multiple views (client-side BATCH)
•  Makes developer life easier → priceless
•  Read performance vs secondary index
•  MV better because single node read (secondary index can hit many nodes)
•  MV better because single read path (secondary index = read index + read data)
Materialized Views Consistency
•  Consistency level
•  CL honoured for base table, ONE for MV + local batchlog
•  Weaker consistency guarantees for MV than for base table
•  Example, write at QUORUM
•  guarantee that QUORUM replicas of base table have received the write
•  guarantee that QUORUM of MV replicas will eventually receive DELETE + INSERT
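Materialized Views Gotchas
•  Beware of hot spots !!!
•  MV user_by_gender 😱 (the view partition key has only a handful of distinct values, so all rows pile up in a few huge partitions)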
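The hot-spot risk can be made concrete by counting rows per view partition. The sample data below is invented for the illustration (1000 users, five countries, two gender values); only the counting logic matters.

```python
from collections import Counter

# Hypothetical sample of base rows: (id, country, gender)
COUNTRIES = ["FR", "UK", "US", "DE", "IT"]
users = [(i, COUNTRIES[i % 5], "F" if i % 2 else "M") for i in range(1000)]


def partition_sizes(rows, key_index):
    """Count rows per view partition for the chosen partition key column."""
    return Counter(row[key_index] for row in rows)


by_country = partition_sizes(users, 1)
by_gender = partition_sizes(users, 2)
print(len(by_country), max(by_country.values()))
print(len(by_gender), max(by_gender.values()))
```

With this sample a view keyed by gender has only two partitions holding half of the table each, and those partitions keep growing with the table.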
Q & A
User Defined Functions (UDF)
•  Why ?
•  Detailed Impl
•  UDAs
•  Gotchas
Rationale
•  Push computation server-side
•  save network bandwidth (1000 nodes!)
•  simplify client-side code
•  provide standard & useful functions (sum, avg …)
•  accelerate analytics use-case (pre-aggregation for Spark)
How to create a UDF ?
CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS]
[keyspace.]functionName (param1 type1, param2 type2, …)
CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT
RETURNS returnType
LANGUAGE language
AS $$
// source code here
$$;
[keyspace.]functionName
•  a UDF is keyspace-wide
(param1 type1, param2 type2, …)
•  param name to refer to in the code
•  type = CQL3 type
CALLED ON NULL INPUT
•  always called
•  null-check mandatory in code
RETURNS NULL ON NULL INPUT
•  if any input is null, code block is skipped and null is returned
RETURNS returnType: CQL types
•  primitives (boolean, int, …)
•  collections (list, set, map)
•  tuples
•  UDT
LANGUAGE language: JVM supported languages
•  Java, Scala
•  Javascript (slow)
•  Groovy, Jython, JRuby
•  Clojure (JSR 223 impl issue)
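The difference between the two null-handling modes can be sketched outside Cassandra. The Python wrapper below only mimics the behavior described on the slides; make_udf and max_of are hypothetical names, and real UDF bodies are written in one of the JVM languages listed above.

```python
def make_udf(body, called_on_null_input):
    """Wrap `body` the way the two null-handling modes are described above."""
    def udf(*args):
        if not called_on_null_input and any(a is None for a in args):
            return None          # RETURNS NULL ON NULL INPUT: body is skipped
        return body(*args)       # CALLED ON NULL INPUT: body always runs
    return udf


def max_of(a, b):
    # Body written for CALLED ON NULL INPUT: it must null-check by itself.
    if a is None:
        return b
    if b is None:
        return a
    return max(a, b)


always_called = make_udf(max_of, called_on_null_input=True)
null_skipping = make_udf(max_of, called_on_null_input=False)
print(always_called(None, 3))
print(null_skipping(None, 3))
```

The first call runs the body and returns 3 thanks to the explicit null check; the second never reaches the body and returns None.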
UDF Demo
UDA
•  Real use-case for UDF
•  Aggregation server-side → huge network bandwidth saving
•  Provide similar behavior for Group By, Sum, Avg etc.
How to create a UDA ?
CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS]
[keyspace.]aggregateName(type1, type2, …)
SFUNC accumulatorFunction
STYPE stateType
[FINALFUNC finalFunction]
INITCOND initCond;
aggregateName(type1, type2, …)
•  only type, no param name
SFUNC accumulatorFunction
•  accumulator function. Signature: accumulatorFunction(stateType, type1, type2, …) RETURNS stateType
STYPE stateType
•  state type
FINALFUNC finalFunction
•  optional final function. Signature: finalFunction(stateType)
INITCOND initCond
•  initial state value
UDA return type ?
•  if finalFunction: return type of finalFunction
•  else: stateType
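How SFUNC, STYPE, INITCOND and FINALFUNC fit together can be sketched as a fold. The Python code below is an illustration only: run_aggregate, avg_state and avg_final are hypothetical names, and the average aggregate is an assumed example, not one taken from the deck.

```python
from functools import reduce


def run_aggregate(rows, sfunc, initcond, finalfunc=None):
    """Fold rows with sfunc starting from initcond, then apply finalfunc if any."""
    state = reduce(lambda acc, row: sfunc(acc, *row), rows, initcond)
    return finalfunc(state) if finalfunc is not None else state


# Hypothetical average aggregate: the state (STYPE) is a (count, sum) tuple.
def avg_state(state, value):
    count, total = state
    return (count + 1, total + value)


def avg_final(state):
    count, total = state
    return total / count if count else None


rows = [(23.48,), (24.08,), (22.50,)]   # one aggregate argument per row
print(run_aggregate(rows, avg_state, (0, 0.0), avg_final))   # result has finalfunc's return type
print(run_aggregate(rows, avg_state, (0, 0.0)))              # no finalfunc: result is the state
```

The second call shows the return-type rule: without a final function the caller gets the raw (count, sum) state back.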
UDA Demo
Gotchas
[Diagram: a UDA query against four C* nodes, steps ① to ⑤; steps ② & ③ are shown on three of the nodes]
④ On the coordinator:
•  apply accumulatorFunction
•  apply finalFunction
Why not apply the UDF/UDA on the replica nodes ?
1.  Because of eventual consistency
2.  UDF/UDA applied AFTER last-write-win logic
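The two reasons can be illustrated with a toy reconciliation. In the Python sketch below the replica contents are invented and reconcile is a hypothetical helper; it only shows why aggregating per replica would mix stale and fresh values, whereas reconciling first and aggregating once gives a single consistent answer.

```python
def reconcile(replica_responses):
    """Last-write-wins merge: keep, per row key, the value with the highest timestamp."""
    merged = {}
    for response in replica_responses:
        for key, (value, timestamp) in response.items():
            if key not in merged or timestamp > merged[key][1]:
                merged[key] = (value, timestamp)
    return merged


# Hypothetical replica contents: row key -> (value, write timestamp).
replica_a = {"r1": (10, 100), "r2": (20, 100)}   # still holds the old value of r1
replica_b = {"r1": (15, 200)}                    # has the newer r1, has not received r2 yet
replica_c = {"r2": (20, 100)}                    # has not received r1 at all

# Summing on each replica gives three different, partly stale answers.
per_replica_sums = [
    sum(value for value, _ in replica.values())
    for replica in (replica_a, replica_b, replica_c)
]

# Coordinator side: reconcile first, then aggregate once.
merged = reconcile([replica_a, replica_b, replica_c])
coordinator_sum = sum(value for value, _ in merged.values())
print(per_replica_sums, coordinator_sum)
```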
Gotchas
•  UDA in Cassandra is not distributed !
•  Execute UDA on a large number of rows (10^6 for ex.)
•  single fat partition
•  multiple partitions
•  full table scan
•  → Increase client-side timeout
•  default Java driver timeout = 12 secs
•  JAVA-1033 JIRA for per-request timeout setting
Cassandra UDA or Apache Spark ?
Consistency Level | Single/Multiple Partition(s) | Recommended Approach
ONE | Single partition | UDA with token-aware driver because node local
ONE | Multiple partitions | Apache Spark because distributed reads
> ONE | Single partition | UDA because data-locality lost with Spark
> ONE | Multiple partitions | Apache Spark definitely
Q & A
New Storage Engine
•  Data structure
•  Disk space usage
Pre 3.0 data structure
Map<byte[], SortedMap<byte[], Cell>>
CREATE TABLE sensor_data(
sensor_id uuid,
date timestamp,
sensor_type text,
sensor_value double,
PRIMARY KEY(sensor_id, date)
);
Pre 3.0 on disk layout
RowKey: de305d54-75b4-431b-adb2-eb6b9e546014
=> (column=2015-04-27 10:00:00+0100:, value=, timestamp=1430128800)
=> (column=2015-04-27 10:00:00+0100:sensor_type, value='Temperature', timestamp=1430128800)
=> (column=2015-04-27 10:00:00+0100:sensor_value, value=23.48, timestamp=1430128800)
=> (column=2015-04-27 10:01:00+0100:, value=, timestamp=1430128860)
=> (column=2015-04-27 10:01:00+0100:sensor_type, value='Temperature', timestamp=1430128860)
=> (column=2015-04-27 10:01:00+0100:sensor_value, value=24.08, timestamp=1430128860)
•  Clustering values are repeated for each normal column
•  Full timestamp storage
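The layout above can be mimicked with nested dictionaries. This Python sketch is a toy model of Map<byte[], SortedMap<byte[], Cell>> using strings instead of bytes; to_pre30_cells is a hypothetical helper, and the real on-disk encoding is of course binary.

```python
# Toy model of Map<byte[], SortedMap<byte[], Cell>>:
# partition key -> {cell name -> (value, timestamp)}
def to_pre30_cells(rows):
    cells = {}
    for date, columns, timestamp in rows:
        cells[f"{date}:"] = ("", timestamp)               # row marker cell
        for name, value in columns.items():
            cells[f"{date}:{name}"] = (value, timestamp)  # clustering value repeated in every cell name
    return dict(sorted(cells.items()))


rows = [
    ("2015-04-27 10:00:00+0100", {"sensor_type": "Temperature", "sensor_value": 23.48}, 1430128800),
    ("2015-04-27 10:01:00+0100", {"sensor_type": "Temperature", "sensor_value": 24.08}, 1430128860),
]
partition_key = "de305d54-75b4-431b-adb2-eb6b9e546014"
partition = {partition_key: to_pre30_cells(rows)}

for cell_name, cell in partition[partition_key].items():
    print(cell_name, cell)
```

Each CQL row becomes three cells here, and both the clustering value and the full timestamp are stored again in every one of them.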
Cassandra 3.0 data structure
Map<byte[], SortedMap<ClusteringColumn, Row>>
(same sensor_data table as above)
Cassandra 3.0 on disk layout
PartitionKey: de305d54-75b4-431b-adb2-eb6b9e546014
=> clusteringColumn:2015-04-27 10:00:00+0100
=> row_timestamp=1430128800
=> (column_value='Temperature', delta_encoded_timestamp=+0)
=> (column_value=23.48, delta_encoded_timestamp=+0)
=> clusteringColumn:2015-04-27 10:01:00+0100
=> row_timestamp=1430128860
=> (column_value='Temperature', delta_encoded_timestamp=+0)
=> (column_value=24.08, delta_encoded_timestamp=+0)
•  Delta-encoded timestamp vs row timestamp
Gains
•  No clustering value repetition
•  Column labels are stored only once in meta data
•  Delta encoding of timestamp, 8 bytes saved each time
•  Less disk space used
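The timestamp gain can be shown with a small encode/decode pair. This Python sketch is only a conceptual model of delta encoding against the row timestamp; encode_row and decode_cell_timestamps are hypothetical names and do not reflect the actual binary format.

```python
def encode_row(clustering, row_timestamp, cells):
    """Store the row timestamp once and each cell timestamp as a delta from it."""
    return {
        "clustering": clustering,
        "row_timestamp": row_timestamp,
        "cells": [(value, ts - row_timestamp) for value, ts in cells],
    }


def decode_cell_timestamps(row):
    """Rebuild the full cell timestamps from the row timestamp and the deltas."""
    return [row["row_timestamp"] + delta for _, delta in row["cells"]]


row = encode_row(
    "2015-04-27 10:00:00+0100",
    1430128800,
    [("Temperature", 1430128800), (23.48, 1430128800)],
)
print(row["cells"])
print(decode_cell_timestamps(row))
```

When all cells of a row were written together, every delta is 0, which is why the slide's layout shows delta_encoded_timestamp=+0 for each column.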
Benchmarks
•  10^6 rows, small string
CREATE TABLE events (
id uuid,
date timeuuid,
prop1 int,
prop2 text,
prop3 float,
PRIMARY KEY(id, date));
•  10^6 rows, large string (1000)
CREATE TABLE largetext(
key int,
prop1 int,
prop2 text,
PRIMARY KEY(key));
•  10^6 rows, medium string (100), 50 items
CREATE TABLE largeclustering(
key int,
clust text,
prop1 int,
prop2 set<float>,
PRIMARY KEY(key, clust));
•  events table WITH COMPACT STORAGE
CREATE TABLE events (
id uuid,
date timeuuid,
prop1 int,
prop2 text,
prop3 float,
PRIMARY KEY(id, date))
WITH COMPACT STORAGE;
Q & A
@doanduyhai
duy_hai.doan@datastax.com
https://academy.datastax.com/
Thank You

Cassandra and materialized views

  • 1. 1 Cassandra 2.2 and 3.0 new features DuyHai DOAN Apache Cassandra Technical Evangelist #VoxxedBerlin @doanduyhai
  • 2. Datastax 2 •  Founded in April 2010 •  We contribute a lot to Apache Cassandra™ •  400+ customers (25 of the Fortune 100), 450+ employees •  Headquarter in San Francisco Bay area •  EU headquarter in London, offices in France and Germany •  Datastax Enterprise = OSS Cassandra + extra features
  • 3. Materialized Views (MV) •  Why ? •  Detailed Impl •  Gotchas
  • 4. Why Materialized Views ? •  Relieve the pain of manual denormalization CREATE TABLE user( id int PRIMARY KEY, country text, … ); CREATE TABLE user_by_country( country text, id int, …, PRIMARY KEY(country, id) ); 4
  • 5. CREATE TABLE user_by_country ( country text, id int, firstname text, lastname text, PRIMARY KEY(country, id)); Materialzed View In Action CREATE MATERIALIZED VIEW user_by_country AS SELECT country, id, firstname, lastname FROM user WHERE country IS NOT NULL AND id IS NOT NULL PRIMARY KEY(country, id) 5
  • 6. Materialzed View Syntax CREATE MATERIALIZED VIEW [IF NOT EXISTS] keyspace_name.view_name AS SELECT column1, column2, ... FROM keyspace_name.table_name WHERE column1 IS NOT NULL AND column2 IS NOT NULL ... PRIMARY KEY(column1, column2, ...) Must select all primary key columns of base table •  IS NOT NULL condition for now •  more complex conditions in future •  at least all primary key columns of base table (ordering can be different) •  maximum 1 column NOT pk from base table 6
  • 8. Materialized View Impl C* C* C* C* C* C* C* C* UPDATE user SET country=‘FR’ WHERE id=1 ① •  send mutation to all replicas •  waiting for ack(s) with CL 8
  • 9. Materialized View Impl C* C* C* C* C* C* C* C* UPDATE user SET country=‘FR’ WHERE id=1 ② Acquire local lock on base table partition 9
  • 10. Materialized View Impl C* C* C* C* C* C* C* C* UPDATE user SET country=‘FR’ WHERE id=1 ③ Local read to fetch current values SELECT * FROM user WHERE id=1 10
  • 11. Materialized View Impl C* C* C* C* C* C* C* C* UPDATE user SET country=‘FR’ WHERE id=1 ④ Create BatchLog with •  DELETE FROM user_by_country WHERE country = ‘old_value’ •  INSERT INTO user_by_country(country, id, …) VALUES(‘FR’, 1, ...) 11
  • 12. Materialized View Impl C* C* C* C* C* C* C* C* UPDATE user SET country=‘FR’ WHERE id=1 ⑤ Execute async BatchLog to paired view replica with CL = ONE 12
  • 13. Materialized View Impl C* C* C* C* C* C* C* C* UPDATE user SET country=‘FR’ WHERE id=1 ⑥ Apply base table updade locally SET COUNTRY=‘FR’ 13
  • 14. Materialized View Impl C* C* C* C* C* C* C* C* UPDATE user SET country=‘FR’ WHERE id=1 ⑦ Release local lock 14
  • 15. Materialized View Impl C* C* C* C* C* C* C* C* UPDATE user SET country=‘FR’ WHERE id=1 ⑧ Return ack to coordinator 15
  • 16. Materialized View Impl C* C* C* C* C* C* C* C* UPDATE user SET country=‘FR’ WHERE id=1 ⑨ If CL ack(s) received, ack client 16
  • 17. MV Failure Cases: concurrent updates Read base row (country=‘UK’) •  DELETE FROM mv WHERE country=‘UK’ •  INSERT INTO mv …(country) VALUES(‘US’) •  Send async BatchLog •  Apply update country=‘US’ 1) UPDATE … SET country=‘US’ 2) UPDATE … SET country=‘FR’ Read base row (country=‘UK’) •  DELETE FROM mv WHERE country=‘UK’ •  INSERT INTO mv …(country) VALUES(‘FR’) •  Send async BatchLog •  Apply update country=‘FR’ t0 t1 t2 Without local lock 17
  • 18. MV Failure Cases: concurrent updates Read base row (country=‘UK’) •  DELETE FROM mv WHERE country=‘UK’ •  INSERT INTO mv …(country) VALUES(‘US’) •  Send async BatchLog •  Apply update country=‘US’ 1) UPDATE … SET country=‘US’ 2) UPDATE … SET country=‘FR’ Read base row (country=‘UK’) •  DELETE FROM mv WHERE country=‘UK’ •  INSERT INTO mv …(country) VALUES(‘FR’) •  Send async BatchLog •  Apply update country=‘FR’ t0 t1 t2 Without local lock 18 INSERT INTO mv …(country) VALUES(‘US’) INSERT INTO mv …(country) VALUES(‘FR’)
  • 19. MV Failure Cases: concurrent updates Read base row (country=‘UK’) •  DELETE FROM mv WHERE country=‘UK’ •  INSERT INTO mv …(country) VALUES(‘US’) •  Send async BatchLog •  Apply update country=‘US’ 1) UPDATE … SET country=‘US’ 2) UPDATE … SET country=‘FR’ Read base row (country=‘US’) •  DELETE FROM mv WHERE country=‘US’ •  INSERT INTO mv …(country) VALUES(‘FR’) •  Send async BatchLog •  Apply update country=‘FR’ With local lock 🔒 🔓 🔒 🔓19
  • 20. MV Failure Cases: failed updates to MV C* C* C* C* C* C* C* C* UPDATE user SET country=‘FR’ WHERE id=1 ⑤ Execute async BatchLog to paired view replica with CL = ONE ✘ MV replica down 20
  • 21. MV Failure Cases: failed updates to MV C* C* C* C* C* C* C* C* UPDATE user SET country=‘FR’ WHERE id=1 BatchLog replay MV replica up 21
  • 22. Materialized View Performance •  Write performance •  local lock •  local read-before-write for MV à update contention on partition (most of perf hits) •  local batchlog for MV •  ☞ you only pay this price once whatever number of MV •  for each base table update: mv_count x 2 (DELETE + INSERT) extra mutations 22
  • 23. Materialized View Performance •  Write performance vs manual denormalization •  MV better because no client-server network traffic for read-before-write •  MV better because less network traffic for multiple views (client-side BATCH) •  Makes developer life easier à priceless 23
  • 24. Materialized View Performance •  Read performance vs secondary index •  MV better because single node read (secondary index can hit many nodes) •  MV better because single read path (secondary index = read index + read data) 24
  • 25. Materialized Views Consistency •  Consistency level •  CL honoured for base table, ONE for MV + local batchlog •  Weaker consistency guarantees for MV than for base table. •  Exemple, write at QUORUM •  guarantee that QUORUM replicas of base tables have received write •  guarantee that QUORUM of MV replicas will eventually receive DELETE + INSERT 25
  • 26. Materialized Views Gotchas •  Beware of hot spots !!! •  MV user_by_gender 😱 26
  • 27. Q & A ! " 27
  • 28. User Define Functions (UDF) •  Why ? •  Detailed Impl •  UDAs •  Gotchas
  • 29. Rationale •  Push computation server-side •  save network bandwidth (1000 nodes!) •  simplify client-side code •  provide standard & useful function (sum, avg …) •  accelerate analytics use-case (pre-aggregation for Spark) 29
  • 30. How to create an UDF ? CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS] [keyspace.]functionName (param1 type1, param2 type2, …) CALL ON NULL INPUT | RETURNS NULL ON NULL INPUT RETURN returnType LANGUAGE language AS $$ // source code here $$; 30
  • 31. How to create an UDF ? CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS] [keyspace.]functionName (param1 type1, param2 type2, …) CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT RETURN returnType LANGUAGE language AS $$ // source code here $$; An UDF is keyspace-wide 31
  • 32. How to create an UDF ? CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS] [keyspace.]functionName (param1 type1, param2 type2, …) CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT RETURN returnType LANGUAGE language AS $$ // source code here $$; Param name to refer to in the code Type = CQL3 type 32
  • 33. How to create an UDF ? CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS] [keyspace.]functionName (param1 type1, param2 type2, …) CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT RETURN returnType LANGUAGE language // j AS $$ // source code here $$; Always called Null-check mandatory in code 33
  • 34. How to create an UDF ? CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS] [keyspace.]functionName (param1 type1, param2 type2, …) CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT RETURN returnType LANGUAGE language // jav AS $$ // source code here $$; If any input is null, code block is skipped and return null 34
  • 35. How to create an UDF ? CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS] [keyspace.]functionName (param1 type1, param2 type2, …) CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT RETURN returnType LANGUAGE language AS $$ // source code here $$; CQL types •  primitives (boolean, int, …) •  collections (list, set, map) •  tuples •  UDT 35
  • 36. How to create a UDF ? CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS] [keyspace.]functionName (param1 type1, param2 type2, …) CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT RETURNS returnType LANGUAGE language AS $$ // source code here $$; JVM supported languages •  Java, Scala •  JavaScript (slow) •  Groovy, Jython, JRuby •  Clojure (JSR 223 impl issue) 36
  • 37. How to create a UDF ? CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS] [keyspace.]functionName (param1 type1, param2 type2, …) CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT RETURNS returnType LANGUAGE language AS $$ // source code here $$; 37
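A concrete instance of the syntax on the slides above, as a minimal sketch. The keyspace `demo` and the function names `max_of` and `to_fahrenheit` are hypothetical, and the statements assume user-defined functions are enabled on the cluster (in Cassandra 2.2/3.0, `enable_user_defined_functions: true` in cassandra.yaml). The first function is declared CALLED ON NULL INPUT, so its Java body has to do its own null checks; the second is declared RETURNS NULL ON NULL INPUT, so its body never runs with a null argument.

```cql
-- Hypothetical keyspace and function names, for illustration only.
-- CALLED ON NULL INPUT: the body always runs, so it must handle nulls itself.
CREATE OR REPLACE FUNCTION demo.max_of(current int, candidate int)
CALLED ON NULL INPUT
RETURNS int
LANGUAGE java
AS $$
  if (current == null) return candidate;
  if (candidate == null) return current;
  return Math.max(current, candidate);
$$;

-- RETURNS NULL ON NULL INPUT: if celsius is null, the body is skipped
-- and the function returns null.
CREATE OR REPLACE FUNCTION demo.to_fahrenheit(celsius double)
RETURNS NULL ON NULL INPUT
RETURNS double
LANGUAGE java
AS $$
  return celsius * 9.0 / 5.0 + 32.0;
$$;
```

Once created, such a function can be used in a SELECT clause like a built-in one, for example `SELECT demo.to_fahrenheit(some_column) FROM some_table`, where the column and table names are placeholders.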
  • 39. UDA •  Real use-case for UDF •  Aggregation server-side → huge network bandwidth saving •  Provides behavior similar to GROUP BY, SUM, AVG, etc. 39
  • 40. How to create a UDA ? CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS] [keyspace.]aggregateName(type1, type2, …) SFUNC accumulatorFunction STYPE stateType [FINALFUNC finalFunction] INITCOND initCond; Arguments: types only, no param names. STYPE: state type. INITCOND: initial state value 40
  • 41. How to create a UDA ? CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS] [keyspace.]aggregateName(type1, type2, …) SFUNC accumulatorFunction STYPE stateType [FINALFUNC finalFunction] INITCOND initCond; Accumulator function. Signature: accumulatorFunction(stateType, type1, type2, …) RETURNS stateType 41
  • 42. How to create a UDA ? CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS] [keyspace.]aggregateName(type1, type2, …) SFUNC accumulatorFunction STYPE stateType [FINALFUNC finalFunction] INITCOND initCond; Optional final function. Signature: finalFunction(stateType) 42
  • 43. How to create a UDA ? CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS] [keyspace.]aggregateName(type1, type2, …) SFUNC accumulatorFunction STYPE stateType [FINALFUNC finalFunction] INITCOND initCond; UDA return type ? If a finalFunction is defined: the return type of finalFunction. Otherwise: stateType 43
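To make the SFUNC / STYPE / FINALFUNC / INITCOND roles concrete, here is a sketch of an averaging aggregate over `double` values. All names (`demo`, `avg_state`, `avg_final`, `sensor_avg`) are hypothetical; the shape follows the signatures given on the slides: the accumulator takes the state plus one value and returns the state, and the final function maps the state to the result.

```cql
-- Accumulator: state is (count, running sum); returns the updated state.
CREATE OR REPLACE FUNCTION demo.avg_state(state tuple<int, double>, val double)
CALLED ON NULL INPUT
RETURNS tuple<int, double>
LANGUAGE java
AS $$
  if (val != null) {
    state.setInt(0, state.getInt(0) + 1);
    state.setDouble(1, state.getDouble(1) + val);
  }
  return state;
$$;

-- Final function: turns the (count, sum) state into the average.
CREATE OR REPLACE FUNCTION demo.avg_final(state tuple<int, double>)
CALLED ON NULL INPUT
RETURNS double
LANGUAGE java
AS $$
  if (state.getInt(0) == 0) return null;
  return state.getDouble(1) / state.getInt(0);
$$;

-- The aggregate declares argument types only, no parameter names.
CREATE OR REPLACE AGGREGATE demo.sensor_avg(double)
SFUNC avg_state
STYPE tuple<int, double>
FINALFUNC avg_final
INITCOND (0, 0.0);
```

Because a FINALFUNC is declared, the aggregate returns `double` (the return type of `avg_final`); without it, the aggregate would return the `tuple<int, double>` state itself.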
  • 45. Gotchas [Diagram: UDA query against four C* nodes, steps ① to ⑤; steps ② & ③ occur on three of the nodes] 45
  • 46. Gotchas [Diagram: UDA query against four C* nodes, steps ① to ⑤; steps ② & ③ occur on three of the nodes] Why not apply the UDF/UDA on the replica nodes ? 46
  • 47. Gotchas [Diagram: UDA query against four C* nodes, steps ① to ⑤; steps ② & ③ occur on three of the nodes; step ④ •  apply accumulatorFunction •  apply finalFunction] 1.  Because of eventual consistency 2.  UDF/UDA are applied AFTER the last-write-wins logic 47
  • 48. Gotchas 48 •  UDA in Cassandra is not distributed ! •  Executing a UDA on a large number of rows (10^6 for example) •  single fat partition •  multiple partitions •  full table scan •  → Increase the client-side timeout •  default Java driver timeout = 12 secs •  JAVA-1033 JIRA for per-request timeout setting
  • 49. Cassandra UDA or Apache Spark ? 49
      Consistency Level | Single/Multiple Partition(s) | Recommended Approach
      ONE               | Single partition             | UDA with token-aware driver because node local
      ONE               | Multiple partitions          | Apache Spark because distributed reads
      > ONE             | Single partition             | UDA because data-locality lost with Spark
      > ONE             | Multiple partitions          | Apache Spark definitely
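The single-partition case recommended above can be sketched as a query that restricts the partition key, so the aggregation only ever sees rows from one partition. The example assumes the `sensor_data` table from the storage engine slides lives in a hypothetical keyspace `demo`, and uses the built-in `avg` aggregate that ships with Cassandra 2.2; a user-defined aggregate would be called the same way.

```cql
-- Single-partition aggregation: the WHERE clause pins one sensor_id,
-- so no other partition is read. Keyspace name "demo" is an assumption.
SELECT avg(sensor_value)
FROM demo.sensor_data
WHERE sensor_id = de305d54-75b4-431b-adb2-eb6b9e546014;
```

Dropping the WHERE clause turns this into a full table scan, which is the case where the client-side timeout becomes a concern and where the table above points to Apache Spark instead.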
  • 51. Q & A 51
  • 52. New Storage Engine •  Data structure •  Disk space usage
  • 53. Pre 3.0 data structure Map<byte[ ], SortedMap<byte[ ], Cell>> 53 CREATE TABLE sensor_data( sensor_id uuid, date timestamp, sensor_type text, sensor_value double, PRIMARY KEY(sensor_id, date) );
  • 54. Pre 3.0 on disk layout 54
      RowKey: de305d54-75b4-431b-adb2-eb6b9e546014
      => (column=2015-04-27 10:00:00+0100:, value=, timestamp=1430128800)
      => (column=2015-04-27 10:00:00+0100:sensor_type, value=‘Temperature’, timestamp=1430128800)
      => (column=2015-04-27 10:00:00+0100:sensor_value, value=23.48, timestamp=1430128800)
      => (column=2015-04-27 10:01:00+0100:, value=, timestamp=1430128860)
      => (column=2015-04-27 10:01:00+0100:sensor_type, value=‘Temperature’, timestamp=1430128860)
      => (column=2015-04-27 10:01:00+0100:sensor_value, value=24.08, timestamp=1430128860)
      Clustering values are repeated for each normal column
      Full timestamp storage
  • 55. Cassandra 3.0 data structure Map<byte[ ], SortedMap<ClusteringColumn, Row>> 55 CREATE TABLE sensor_data( sensor_id uuid, date timestamp, sensor_type text, sensor_value double, PRIMARY KEY(sensor_id, date) );
  • 56. Cassandra 3.0 on disk layout 56
      PartitionKey: de305d54-75b4-431b-adb2-eb6b9e546014
      => clusteringColumn:2015-04-27 10:00:00+0100
         => row_timestamp=1430128800
         => (column_value=‘Temperature’, delta_encoded_timestamp=+0)
         => (column_value=23.48, delta_encoded_timestamp=+0)
      => clusteringColumn:2015-04-27 10:01:00+0100
         => row_timestamp=1430128860
         => (column_value=‘Temperature’, delta_encoded_timestamp=+0)
         => (column_value=24.08, delta_encoded_timestamp=+0)
      Delta-encoded timestamp vs row timestamp
  • 57. Gains 57 •  No clustering value repetition •  Column labels are stored only once in metadata •  Delta encoding of timestamps, 8 bytes saved each time •  Less disk space used
  • 58. Benchmarks 58 CREATE TABLE events ( id uuid, date timeuuid, prop1 int, prop2 text, prop3 float, PRIMARY KEY(id, date)); 10^6 rows, small string
  • 59. Benchmarks 59 CREATE TABLE largetext( key int, prop1 int, prop2 text, PRIMARY KEY(key)); 10^6 rows, large string (1000)
  • 60. Benchmarks 60 CREATE TABLE largeclustering( key int, clust text, prop1 int, prop2 set<float>, PRIMARY KEY(key, clust)); 10^6 rows, medium string (100), 50 items
  • 61. Benchmarks 61 CREATE TABLE events ( id uuid, date timeuuid, prop1 int, prop2 text, prop3 float, PRIMARY KEY(id, date)) WITH COMPACT STORAGE ;
  • 62. Q & A 62