Time Series with Apache Cassandra
Patrick McFadin

Chief Evangelist
@PatrickMcFadin
©2013 DataStax Confidential. Do not distribute without consent.

Quick intro to Cassandra
• Shared nothing
• Masterless peer-to-peer
• Based on Dynamo
Scaling
• Add nodes to scale
• Millions Ops/s

[Chart: throughput (ops/sec) for Cassandra, HBase, Redis, and MySQL]
Uptime
• Built to replicate
• Resilient to failure
• Always on

NONE
Easy to use
• CQL is a familiar syntax
• Friendly to programmers
• Paxos for locking

CREATE TABLE users (
  username varchar,
  firstname varchar,
  lastname varchar,
  email list<varchar>,
  password varchar,
  created_date timestamp,
  PRIMARY KEY (username)
);

INSERT INTO users (username, firstname, lastname,
  email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
  ['patrick@datastax.com'],'ba27e03fd95e507daf2937c937d499ab',
  '2011-06-20 13:50:00');

INSERT INTO users (username, firstname,
  lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
  ['patrick@datastax.com'],
  'ba27e03fd95e507daf2937c937d499ab',
  '2011-06-20 13:50:00')
IF NOT EXISTS;
Time series in production
• It’s all about “What’s happening”
• Data is the new currency

“Sirca, a non-profit university consortium based in Sydney, is the world’s biggest broker of
financial data, ingesting into its database 2 million pieces of information a second from every
major trading exchange.”*
* http://www.theage.com.au/it-pro/business-it/help-poverty-theres-an-app-for-that-20140120-hv948.html
Why Cassandra for Time Series
Scales
Resilient
Good data model
Efficient Storage Model

What about that?
Data Model
CREATE TABLE temperature (
weatherstation_id text,
event_time timestamp,
temperature text,
PRIMARY KEY (weatherstation_id,event_time)
);

• Weather Station Id and Time are unique
• Store as many as needed

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:01:00','72F');

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:02:00','73F');

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:03:00','73F');

INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:04:00','74F');
Storage Model - Logical View
SELECT weatherstation_id,event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD';

weatherstation_id | event_time          | temperature
------------------+---------------------+------------
1234ABCD          | 2013-04-03 07:01:00 | 72F
1234ABCD          | 2013-04-03 07:02:00 | 73F
1234ABCD          | 2013-04-03 07:03:00 | 73F
1234ABCD          | 2013-04-03 07:04:00 | 74F
Storage Model - Disk Layout
SELECT weatherstation_id,event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD';

Merged, Sorted and Stored Sequentially

Partition key: 1234ABCD
  2013-04-03 07:01:00 -> 72F
  2013-04-03 07:02:00 -> 73F
  2013-04-03 07:03:00 -> 73F
  2013-04-03 07:04:00 -> 74F
  2013-04-03 07:05:00 -> 74F
  2013-04-03 07:06:00 -> 75F
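The "merged, sorted and stored sequentially" idea can be illustrated with a toy model in Python. This is only a sketch: the two runs and the merge_runs helper are hypothetical names for illustration, and the code does not reproduce Cassandra's actual on-disk format or compaction process.

```python
import heapq

# Two hypothetical write batches for partition '1234ABCD'.
# Each batch is already sorted by event_time; ISO-style timestamps
# sort lexicographically in chronological order.
run_a = [
    ("2013-04-03 07:01:00", "72F"),
    ("2013-04-03 07:03:00", "73F"),
    ("2013-04-03 07:05:00", "74F"),
]
run_b = [
    ("2013-04-03 07:02:00", "73F"),
    ("2013-04-03 07:04:00", "74F"),
    ("2013-04-03 07:06:00", "75F"),
]


def merge_runs(*runs):
    """Merge already-sorted runs into one sequential list sorted by event_time."""
    return list(heapq.merge(*runs))


merged = merge_runs(run_a, run_b)
for event_time, temperature in merged:
    print(event_time, temperature)
```

Because each run is already sorted, the merge only has to walk the runs once, and the result can be written out sequentially.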
Query patterns
SELECT event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time > '2013-04-03 07:01:00'
AND event_time < '2013-04-03 07:04:00';

• Range queries
• “Slice” operation on disk
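A range query over one partition behaves like a slice of a sorted list. The sketch below models that in plain Python with binary search; the partition list and the slice_exclusive helper are hypothetical, and the real storage engine works on disk structures rather than in-memory lists. Note that with strict > and < bounds, the readings at exactly 07:01 and 07:04 fall outside the slice.

```python
from bisect import bisect_left, bisect_right

# Hypothetical in-memory stand-in for one partition, sorted by event_time.
partition = [
    ("2013-04-03 07:01:00", "72F"),
    ("2013-04-03 07:02:00", "73F"),
    ("2013-04-03 07:03:00", "73F"),
    ("2013-04-03 07:04:00", "74F"),
    ("2013-04-03 07:05:00", "74F"),
    ("2013-04-03 07:06:00", "75F"),
]


def slice_exclusive(cells, start, end):
    """Return the cells with start < event_time < end as one contiguous slice."""
    times = [event_time for event_time, _ in cells]
    lo = bisect_right(times, start)  # first index with event_time > start
    hi = bisect_left(times, end)     # first index with event_time >= end
    return cells[lo:hi]


result = slice_exclusive(partition, "2013-04-03 07:01:00", "2013-04-03 07:04:00")
for event_time, temperature in result:
    print(event_time, temperature)
```

The matching cells are adjacent, so once the start position is found the rest is a sequential read.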

Single seek on disk

Partition key: 1234ABCD
  2013-04-03 07:01:00 -> 72F
  2013-04-03 07:02:00 -> 73F
  2013-04-03 07:03:00 -> 73F
  2013-04-03 07:04:00 -> 74F
  2013-04-03 07:05:00 -> 74F
  2013-04-03 07:06:00 -> 75F
Query patterns
SELECT event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time > '2013-04-03 07:01:00'
AND event_time < '2013-04-03 07:04:00';

• Range queries
• “Slice” operation on disk

Sorted by event_time

weatherstation_id | event_time          | temperature
------------------+---------------------+------------
1234ABCD          | 2013-04-03 07:01:00 | 72F
1234ABCD          | 2013-04-03 07:02:00 | 73F
1234ABCD          | 2013-04-03 07:03:00 | 73F
1234ABCD          | 2013-04-03 07:04:00 | 74F

Programmers like this
Ingestion models
• Apache Kafka
• Apache Flume
• Storm
• Custom Applications

[Diagram: Apache Kafka and "Your totally killer application"]
Dealing with data at speed
• 1 million writes per second?
• 1 insert every microsecond
• Collisions?

• Partition key (the first part of the primary key) determines node placement
• Random partitioning
• Special data type - TimeUUID

[Diagram: "Your totally killer application" writing weatherstation_id='5678EFGH' and weatherstation_id='1234ABCD' to nodes in the cluster]
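How a key can determine node placement is sketched below in Python. Everything here is an assumption for illustration: the node names are made up, MD5 stands in for the cluster's partitioner hash, and modulo placement replaces the real token ring and replication strategy.

```python
import hashlib

# Hypothetical node names; a real cluster assigns token ranges to nodes.
NODES = ["node-a", "node-b", "node-c", "node-d"]


def token_for(partition_key: str) -> int:
    """Hash the partition key to a large integer token (MD5 used as a stand-in)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def node_for(partition_key: str, nodes=NODES) -> str:
    """Pick a node from the token. Simplified: modulo instead of a token ring."""
    return nodes[token_for(partition_key) % len(nodes)]


for station in ("1234ABCD", "5678EFGH"):
    print(station, "->", node_for(station))
```

The point of hashing is that the same weatherstation_id always lands on the same node, while different IDs spread across the cluster regardless of how similar the raw keys look.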
TimeUUID
Timestamp to Microsecond + UUID = TimeUUID

• Also known as a Version 1 UUID
• Sortable
• Reversible

04d580b0-9412-11e3-baa8-0800200c9a66 = Wednesday, February 12, 2014 6:18:06 PM GMT

http://www.famkruithof.net/uuid/uuidgen
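The "reversible" property can be checked with Python's standard uuid module, as a small sketch. The helper name timeuuid_to_datetime is hypothetical. A version 1 UUID stores its timestamp as a count of 100-nanosecond intervals since 1582-10-15, so the code divides by 10 to get microseconds.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUID timestamps count from the start of the Gregorian calendar.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def timeuuid_to_datetime(value: str) -> datetime:
    """Recover the embedded UTC timestamp from a version 1 (time-based) UUID."""
    parsed = uuid.UUID(value)
    if parsed.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    # parsed.time is in 100-nanosecond units; integer-divide to microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=parsed.time // 10)


decoded = timeuuid_to_datetime("04d580b0-9412-11e3-baa8-0800200c9a66")
print(decoded.isoformat())  # prints 2014-02-12T18:18:06.267000+00:00
```

The decoded value agrees with the date shown on the slide, with the sub-second part that the slide omits.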
Way more information
www.planetcassandra.org

• 5 minute interviews
• Use cases
• Free training!
Thank You!

Follow me for more updates all the time: @PatrickMcFadin
