Cassandra By Example:
Data Modelling with CQL3


Eric Evans
eevans@opennms.com
@jericevans
CQL is...
● Query language for Apache Cassandra
● Almost SQL (almost)
● Not just an alternative query interface: a first-class citizen
● More performant!
● Available since Cassandra 0.8.0 (almost 2 years!)
Bad Old Days: Thrift RPC
import java.nio.ByteBuffer;
import java.util.*;
import org.apache.cassandra.thrift.*;

// Your Column
Column col = new Column(ByteBuffer.wrap("name".getBytes()));
col.setValue(ByteBuffer.wrap("value".getBytes()));
col.setTimestamp(System.currentTimeMillis() * 1000); // microseconds, by convention

// Don't ask
ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
cosc.setColumn(col);

// Prepare to be amazed
Mutation mutation = new Mutation();
mutation.setColumn_or_supercolumn(cosc);

List<Mutation> mutations = new ArrayList<Mutation>();
mutations.add(mutation);

// row key -> column family name -> mutations
Map<ByteBuffer, Map<String, List<Mutation>>> mutations_map =
    new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
Map<String, List<Mutation>> cf_map = new HashMap<String, List<Mutation>>();
cf_map.put("Standard1", mutations);
mutations_map.put(ByteBuffer.wrap("key".getBytes()), cf_map);

// cassandra: a connected Cassandra.Client; consistency_level: a ConsistencyLevel
cassandra.batch_mutate(mutations_map, consistency_level);
Better, no?

INSERT INTO Standard1 (id, name) VALUES ('key', 'value');
But before we begin...
Partitioning
[Figure: a ring of six nodes with tokens A, E, I, M, Q and Z. The row key "Cat" falls between A and E, so the entire row is stored on node E.]

Pets
Animal | Type   | Size  | YouTube-able
Cat    | mammal | small | true
...
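A toy model of the ring drawn above, in Python. It assumes, as the figure does, that keys are placed in plain alphabetical order and that each node owns the range from the previous token (exclusive) up to its own token (inclusive). Real clusters normally hash the key first (RandomPartitioner, or Murmur3Partitioner, the default since Cassandra 1.2) and also store replicas on further nodes; neither is modelled here. `RING` and `owner` are hypothetical names.

```python
from bisect import bisect_left

# Hypothetical node tokens, in ring order, taken from the figure.
# Assumption: keys are compared as plain strings (order-preserving placement).
RING = ["A", "E", "I", "M", "Q", "Z"]


def owner(key: str, ring=RING) -> str:
    """Return the node whose range (previous token, own token] holds key."""
    i = bisect_left(ring, key)  # index of the first token >= key
    # Keys that sort after the last token wrap around to the first node.
    return ring[i % len(ring)]


print(owner("Cat"))    # E
print(owner("Zebra"))  # A (wraps around the ring)
```

Because the whole row follows its key, every column of the Cat row in Pets ends up on the same node.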
Twissandra
● Twitter-inspired sample application
● Originally by Eric Florenzano, June 2009
● Python (Django)
● DB-API 2.0 (PEP 249) driver for CQL
● Favors simplicity over correctness!
● https://github.com/eevans/twissandra
   ○ See: cass.py
Twissandra Explained
users

-- User storage
CREATE TABLE users (
   username text PRIMARY KEY,
   password text
);

-- Adding users (signup)
INSERT INTO users (username, password)
    VALUES ('meg', 's3kr3t')

-- Lookup password (login)
SELECT password FROM users
    WHERE username = 'meg'
following / followers
following

-- Users a user is following
CREATE TABLE following (
   username text,
   followed text,
   PRIMARY KEY(username, followed)
);

-- Meg follows Stewie
INSERT INTO following (username, followed)
    VALUES ('meg', 'stewie')

-- Get a list of who Meg follows
SELECT followed FROM following
    WHERE username = 'meg'
users @meg is following
  followed
----------
    brian
    chris
     lois
    peter
 quagmire
   stewie
      ...
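With `PRIMARY KEY(username, followed)`, `username` is the partition key and `followed` is a clustering column: all of Meg's rows live in one partition, kept sorted by `followed`, which is why the listing above comes back in name order. A minimal in-memory sketch of that layout, assuming a dict of sorted lists can stand in for the storage engine (it says nothing about the on-disk format); `insert_following` and `select_followed` are hypothetical helpers.

```python
from bisect import insort
from collections import defaultdict

# partition key (username) -> clustering values (followed), kept sorted
following = defaultdict(list)


def insert_following(username: str, followed: str) -> None:
    """INSERT INTO following (username, followed) VALUES (?, ?)"""
    rows = following[username]
    # A primary key identifies exactly one row, so inserting the same pair
    # twice does not create a duplicate (Cassandra treats it as an upsert).
    if followed not in rows:
        insort(rows, followed)  # keep ascending clustering order


def select_followed(username: str) -> list:
    """SELECT followed FROM following WHERE username = ?"""
    # Only one partition is read, and it is already in clustering order.
    return list(following.get(username, []))


for name in ["stewie", "brian", "quagmire", "lois", "chris", "peter"]:
    insert_following("meg", name)
print(select_followed("meg"))
# ['brian', 'chris', 'lois', 'peter', 'quagmire', 'stewie']
```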
followers

-- The users who follow username
CREATE TABLE followers (
   username text,
   following text,
   PRIMARY KEY(username, following)
);

-- Meg follows Stewie
INSERT INTO followers (username, following)
    VALUES ('stewie', 'meg')

-- Get a list of who follows Stewie
SELECT following FROM followers
    WHERE username = 'stewie'
redux: following / followers

-- @meg follows @stewie
BEGIN BATCH
  INSERT INTO following (username, followed)
      VALUES ('meg', 'stewie')
  INSERT INTO followers (username, following)
      VALUES ('stewie', 'meg')
APPLY BATCH
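Both tables exist because each answers a different question, so a single follow is written twice. A sketch of the same write path against in-memory sets, assuming the sets stand in for the two tables (all names are hypothetical). In Cassandra 1.2 and later, a logged batch like the one above is meant to guarantee that either both inserts are eventually applied or neither is; it does not make them visible at the same instant.

```python
from collections import defaultdict

following = defaultdict(set)  # username -> users they follow
followers = defaultdict(set)  # username -> users who follow them


def follow(user: str, target: str) -> None:
    """Write the same relationship once per query it has to serve."""
    following[user].add(target)   # answers "who does user follow?"
    followers[target].add(user)   # answers "who follows target?"


follow("meg", "stewie")
follow("brian", "stewie")
print(sorted(followers["stewie"]))  # ['brian', 'meg']
print(sorted(following["meg"]))     # ['stewie']
```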
tweets
Denormalization Ahead!

-- Tweet storage (think: permalink)
CREATE TABLE tweets (
   tweetid uuid PRIMARY KEY,
   username text,
   body text
);
-- Store a tweet
INSERT INTO tweets (
   tweetid,
   username,
   body
) VALUES (
   60780342-90fe-11e2-8823-0026c650d722,
   'stewie',
   'victory is mine!'
)
Query tweets by ... ?
● author, time descending
● followed authors, time descending
● date starting / date ending
userline
tweets, by user
-- Materialized view of the tweets
-- created by user.
CREATE TABLE userline (
   username text,
   tweetid timeuuid,
   body text,
   PRIMARY KEY(username, tweetid)
);
Wait, WTF is a timeuuid?
● AKA "Type 1 UUID" (RFC 4122 version 1; http://goo.gl/SWuCb)
● Counts 100-nanosecond intervals since Oct. 15, 1582
● 60-bit timestamp, stored low-order part first in the first 64 bits; Cassandra compares timeuuids by that timestamp (sorts temporally!)
● Used like timestamp, but:
   ○ more granular
   ○ globally unique
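Python's standard `uuid` module can decode the timestamp, which makes the bullets above concrete. A sketch relying only on stdlib behavior: `UUID.time` returns the 60-bit count of 100 ns ticks, reassembled from the three fields that store it low-order part first. The second half shows why Cassandra compares timeuuids by decoded timestamp rather than by raw bytes. `timeuuid_to_datetime`, `earlier` and `later` are illustrative names.

```python
import uuid
from datetime import datetime, timedelta, timezone

# The version 1 UUID clock starts at the Gregorian reform, 1582-10-15 00:00 UTC.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def timeuuid_to_datetime(u: uuid.UUID) -> datetime:
    """Decode the 60-bit timestamp (100 ns ticks) of a version 1 UUID."""
    if u.version != 1:
        raise ValueError("not a time-based (version 1) UUID")
    # u.time reassembles time_hi, time_mid and time_low into one integer.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


# The tweetid from the "Store a tweet" INSERT.
tweetid = uuid.UUID("60780342-90fe-11e2-8823-0026c650d722")
print(timeuuid_to_datetime(tweetid))

# Fields: time_low, time_mid, time_hi_version, clock_seq_hi_variant,
# clock_seq_low, node. "earlier" has the smaller timestamp...
earlier = uuid.UUID(fields=(0xFFFFFFFF, 0x0000, 0x1000, 0x80, 0x00, 0))
later = uuid.UUID(fields=(0x00000000, 0x0001, 0x1000, 0x80, 0x00, 0))
print(str(earlier) > str(later))  # True: text/byte order gets it backwards
print(earlier.time < later.time)  # True: timestamp order is correct
```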
userline
-- Range of tweets for a user
SELECT
  dateOf(tweetid), body
FROM
  userline
WHERE
  username = 'stewie' AND
  tweetid > minTimeuuid('2013-03-01 12:10:09')
ORDER BY
  tweetid DESC
LIMIT 40
@stewie's most recent tweets
 dateOf(tweetid)          | body
--------------------------+-------------------------------
 2013-03-19 14:43:15-0500 |               victory is mine!
 2013-03-19 13:23:24-0500 |      generate killer bandwidth
 2013-03-19 13:23:24-0500 |            grow B2B e-business
 2013-03-19 13:23:24-0500 |   innovate vertical e-services
 2013-03-19 13:23:24-0500 | deploy e-business experiences
 2013-03-19 13:23:24-0500 | grow intuitive infrastructures
 ...
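The query above reads a single partition, keeps the clustering rows at or after the cutoff, and returns them newest first, at most 40 of them. A sketch of those steps over an in-memory list, assuming the literal '2013-03-01 12:10:09' means UTC. `ticks`, `timeuuid_at` and `recent` are hypothetical helpers; the first two tweets reuse times from the output above (converted from -0500 to UTC), and the third is invented only so that something gets filtered out.

```python
import uuid
from datetime import datetime, timezone

GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def ticks(dt: datetime) -> int:
    """100 ns ticks since 1582-10-15 UTC, the version 1 UUID clock."""
    d = dt - GREGORIAN_EPOCH
    return (d.days * 86_400 + d.seconds) * 10_000_000 + d.microseconds * 10


def timeuuid_at(dt: datetime, clock_seq: int = 0x0823,
                node: int = 0x0026C650D722) -> uuid.UUID:
    """Hypothetical helper: build a version 1 UUID for a given instant."""
    t = ticks(dt)
    return uuid.UUID(fields=(t & 0xFFFFFFFF, (t >> 32) & 0xFFFF,
                             ((t >> 48) & 0x0FFF) | 0x1000,    # version 1
                             0x80 | ((clock_seq >> 8) & 0x3F),  # RFC 4122 variant
                             clock_seq & 0xFF, node))


def recent(partition, since: datetime, limit: int = 40):
    """Mimic: WHERE username = ? AND tweetid > minTimeuuid(since)
    ORDER BY tweetid DESC LIMIT ?, over one partition of (tweetid, body)."""
    cutoff = ticks(since)
    # minTimeuuid(since) is the smallest timeuuid for that instant, so any
    # real tweetid at or after it passes the filter.
    newer = [row for row in partition if row[0].time >= cutoff]
    newer.sort(key=lambda row: row[0].time, reverse=True)
    return newer[:limit]


utc = timezone.utc
stewie = [
    (timeuuid_at(datetime(2013, 3, 19, 18, 23, 24, tzinfo=utc)), "generate killer bandwidth"),
    (timeuuid_at(datetime(2013, 3, 19, 19, 43, 15, tzinfo=utc)), "victory is mine!"),
    (timeuuid_at(datetime(2013, 2, 1, tzinfo=utc)), "an older, hypothetical tweet"),
]
for tweetid, body in recent(stewie, datetime(2013, 3, 1, 12, 10, 9, tzinfo=utc)):
    print(body)
# victory is mine!
# generate killer bandwidth
```

In Cassandra the partition is already stored in tweetid order, so the sort here only stands in for reading the slice backwards.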
timeline
tweets from those a user follows
-- Materialized view of tweets from
-- the users username follows.
CREATE TABLE timeline (
   username text,
   tweetid timeuuid,
   posted_by text,
   body text,
   PRIMARY KEY(username, tweetid)
);
-- Range of tweets for a user
SELECT
  dateOf(tweetid), posted_by, body
FROM
  timeline
WHERE
  username = 'meg' AND
  tweetid > minTimeuuid('2013-03-01 12:10:09')
ORDER BY
  tweetid DESC
LIMIT 40
most recent tweets for @meg
 dateOf(tweetid)          | posted_by | body
--------------------------+-----------+-------------------
 2013-03-19 14:43:15-0500 |    stewie |   victory is mine!
 2013-03-19 13:23:25-0500 |       meg |   evolve intuit...
 2013-03-19 13:23:25-0500 |       meg | whiteboard bric...
 2013-03-19 13:23:25-0500 |    stewie |      brand clic...
 2013-03-19 13:23:25-0500 |     brian | synergize gran...
 2013-03-19 13:23:24-0500 |     brian | expedite real-t...
 2013-03-19 13:23:24-0500 |    stewie |    generate kil...
 2013-03-19 13:23:24-0500 |    stewie |       grow B2B ...
 2013-03-19 13:23:24-0500 |       meg | generate intera...
 ...
redux: tweets
-- @stewie tweets
BEGIN BATCH
  INSERT INTO tweets ...
  INSERT INTO userline ...
  INSERT INTO timeline ...
  INSERT INTO timeline ...
  INSERT INTO timeline ...
  ...
APPLY BATCH
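Posting a tweet costs one insert per reader, but every later read is a single-partition slice. A sketch of that fan-out on write, assuming dicts stand in for the tables and `uuid.uuid1()` generates the timeuuid; the names are hypothetical, and Twissandra's actual cass.py may differ in detail.

```python
import uuid
from collections import defaultdict

# Hypothetical in-memory stand-ins for the tables from the slides.
followers = defaultdict(set)   # username -> users who follow them
tweets = {}                    # tweetid -> (username, body)
userline = defaultdict(dict)   # username -> {tweetid: body}
timeline = defaultdict(dict)   # username -> {tweetid: (posted_by, body)}


def post_tweet(username: str, body: str) -> uuid.UUID:
    """Apply every write from the batch above for one new tweet."""
    tweetid = uuid.uuid1()  # time-based, usable as a timeuuid
    tweets[tweetid] = (username, body)   # permalink lookup
    userline[username][tweetid] = body   # the author's own tweets
    # Fan-out on write: one timeline row per follower, plus the author
    # (assumed, since @meg's timeline above includes her own tweets).
    for reader in followers[username] | {username}:
        timeline[reader][tweetid] = (username, body)
    return tweetid


followers["stewie"].update({"meg", "brian"})
tid = post_tweet("stewie", "victory is mine!")
print(sorted(timeline))  # ['brian', 'meg', 'stewie']
```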
In Conclusion:
● Think in terms of your queries, and store that
● Don't fear duplication; space is cheap to scale
● Go wide; rows can have up to 2 billion columns!
● The only thing better than NoSQL is MoSQL
● Python hater? Java ❤'r?
   ○ https://github.com/eevans/twissandra-j
● http://goo.gl/zPOD
The End
