Apache Cassandra in Action




 Jonathan Ellis
 jbellis@datastax.com / @spyced
Why Cassandra?
•
    Relational databases are not designed to
    scale
•
    B-trees are slow
    –
        and require read-before-write
Cassandra Tutorial
(“The eBay Architecture,” Randy Shoup and Dan Pritchett)
[Diagram: Writer, Reader, Memtable, Commitlog]
Sources: “The Log-Structured Merge-Tree”; “Bigtable: A Distributed Storage System for Structured Data”
Dynamo, 2007
Bigtable, 2006
OSS, 2008
Incubator, 2009
TLP, 2010
Cassandra in production
•
    Digital Reasoning: NLP + entity analytics
•
    OpenWave: enterprise messaging
•
    OpenX: largest publisher-side ad network in the
    world
•
    Cloudkick: performance data & aggregation
•
    SimpleGEO: location-as-API
•
    Ooyala: video analytics and business intelligence
•
    ngmoco: massively multiplayer game worlds
FUD?
•
    “Cassandra is only appropriate for
    unimportant data.”
Durability
•
    Write to commitlog
    –
        fsync is cheap since it’s append-only
•
    Write to memtable
•
    [amortized] flush memtable to sstable
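The three steps above can be sketched as a toy store. This is a minimal illustration, not Cassandra's implementation: the class name `TinyStore`, the `flush_threshold` knob, and the tab-separated file layout are all assumptions made for the example.

```python
import os
import tempfile


class TinyStore:
    """Toy write path: append to a commit log, then update an in-memory memtable."""

    def __init__(self, directory, flush_threshold=3):
        self.directory = directory
        self.flush_threshold = flush_threshold  # hypothetical knob, not a Cassandra setting
        self.memtable = {}
        self.sstables = []  # paths of flushed, immutable files
        self.log = open(os.path.join(directory, "commitlog"), "a")

    def write(self, key, value):
        # 1. Durability first: a sequential append followed by fsync.
        self.log.write(f"{key}\t{value}\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        # 2. Apply the write in memory; no read-before-write is needed.
        self.memtable[key] = value
        # 3. Once in a while, flush the whole memtable as one sorted file.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        path = os.path.join(self.directory, f"sstable-{len(self.sstables)}")
        with open(path, "w") as f:
            for key in sorted(self.memtable):
                f.write(f"{key}\t{self.memtable[key]}\n")
        self.sstables.append(path)
        self.memtable = {}


def demo():
    """Write four keys; return the keys of the first flushed file and the live memtable."""
    with tempfile.TemporaryDirectory() as d:
        store = TinyStore(d)
        for key in ["c", "a", "b", "d"]:
            store.write(key, key.upper())
        with open(store.sstables[0]) as f:
            flushed = [line.split("\t")[0] for line in f]
        store.log.close()
        return flushed, dict(store.memtable)
```

The cost of the flush is spread over every write that filled the memtable, which is what "amortized" refers to on the slide.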
SSTable format, briefly


       <key 127>
       <key 255>             <row data 0>
       ...                   <row data 1>
                             ...
                             <row data 127>
                             ...
                             <row data 255>
                             ...


                   Sorted [clustered] by row key
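Because rows are sorted by key, a sparse index is enough to find any row. The sketch below follows the slide in sampling the keys at positions 127, 255, and so on; the function names and the in-memory lists standing in for the data file are assumptions for illustration.

```python
from bisect import bisect_left

INDEX_INTERVAL = 128  # one index entry per 128 rows, as drawn on the slide


def build_index(sorted_keys):
    """Sample the last key of every block of INDEX_INTERVAL rows as (key, position)."""
    return [(sorted_keys[i], i)
            for i in range(INDEX_INTERVAL - 1, len(sorted_keys), INDEX_INTERVAL)]


def lookup(sorted_keys, rows, index, key):
    """Binary-search the sparse index, then scan a single block of rows."""
    index_keys = [k for k, _ in index]
    i = bisect_left(index_keys, key)
    # The block ends at the sampled position, or at the end of the data for the tail.
    end = index[i][1] if i < len(index) else len(sorted_keys) - 1
    start = index[i - 1][1] + 1 if i > 0 else 0
    for pos in range(start, end + 1):
        if sorted_keys[pos] == key:
            return rows[pos]
    return None
```

With 300 rows the index holds only two entries, and any lookup scans at most 128 rows.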
Scaling
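[Ring diagram: nodes A, L, T, W]
[Ring diagram: node F added between A and L]
[Ring diagram: nodes A, F, L, T, W; range (A-L] highlighted]
[Ring diagram: range split into (A-F] at F and (F-L] at L]
[Ring diagram: key “C” placed on the ring A, F, L, T, W]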
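The ring diagrams can be read as a token lookup: each node owns the range from the previous token (exclusive) up to its own token (inclusive). The sketch below is a toy model that compares keys directly with the single-letter tokens from the slides; the `Ring` class and its method names are assumptions, not a Cassandra API.

```python
from bisect import bisect_left


class Ring:
    """Toy token ring: each node owns the range (previous token, its own token]."""

    def __init__(self, tokens):
        self.tokens = sorted(tokens)

    def add_node(self, token):
        self.tokens = sorted(self.tokens + [token])

    def owner(self, key):
        # First token >= key; wrap around to the smallest token past the end.
        i = bisect_left(self.tokens, key)
        return self.tokens[i % len(self.tokens)]

    def owned_range(self, token):
        i = self.tokens.index(token)
        # Read the result as the half-open range (previous, token].
        return (self.tokens[i - 1], token)
```

Starting from nodes A, L, T, W, key "C" belongs to L. After F joins, only the range (A-L] is split: F takes (A-F] and key "C" moves to F, while T and W are unaffected.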
Reliability
•
    No single points of failure
•
    Multiple datacenters
•
    Monitorable
Some headlines
•
    “Resyncing Broken MySQL Replication”
•
    “How To Repair MySQL Replication”
•
    “Fixing Broken MySQL Database Replication”
•
    “Replication on Linux broken after db restore”
•
    “MySQL :: Repairing broken replication”
Good architecture solves multiple
problems at once
•
    Availability in single datacenter
•
    Availability in multiple datacenters
[Ring diagram: nodes A, F, L, P, T, U, W, Y; key “C”]
[Ring diagram: same ring, key “C”; one node marked X, with a “hint” label]
[Ring diagram: same ring, no key shown]
[Ring diagrams (two steps): same ring; key “C”]
Tuneable consistency
•
    ONE, QUORUM, ALL
•
    R+W>N
•
    Choose availability vs consistency (and latency)
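The R+W>N rule can be checked with a few lines of arithmetic. In this sketch N is the replication factor and the helper names are hypothetical; only the three levels listed above are modelled.

```python
def replicas_required(level, n):
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return n // 2 + 1
    if level == "ALL":
        return n
    raise ValueError(f"unknown consistency level: {level}")


def read_overlaps_write(read_level, write_level, n):
    """True when R + W > N, so every read set shares a replica with the latest write set."""
    r = replicas_required(read_level, n)
    w = replicas_required(write_level, n)
    return r + w > n
```

With N = 3, QUORUM reads plus QUORUM writes give 2 + 2 > 3, so reads see the latest write; ONE plus ONE gives 1 + 1 = 2, which trades that guarantee for availability and latency.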
Monitorable
JMX
OpsCenter
When do you need Cassandra?
•
    Ian Eure: “If you’re deploying memcache on top of your
    database, you’re inventing your own ad-hoc, difficult to
    maintain NoSQL data store”
Not Only SQL
•
    Curt Monash: “ACID-compliant transaction integrity
    commonly costs more in terms of DBMS licenses and many other
    components of TCO (Total Cost of Ownership) than [scalable
    NoSQL]. Worse, it can actually hurt application uptime,
    by forcing your system to pull in its horns and stop functioning in the
    face of failures that a non-transactional system might smoothly work
    around. Other flavors of “complexity can be a bad thing” apply as
    well. Thus, transaction integrity can be more trouble
    than it’s worth.” [Curt’s emphasis]
Keyspaces & ColumnFamilies
•
    Conceptually, like “schemas” and “tables”
Inside CFs, columns are dynamic
•
    Twitter: “Fifteen months ago, it took two
    weeks to perform ALTER TABLE on the
    statuses [tweets] table.”
ColumnFamilies
•
    Static
    –
        Object data
•
    Dynamic
    –
        Precalculated query results
“static” columnfamilies

                      Users
   zznate    Password: *    Name: Nate

   driftx    Password: *   Name: Brandon

   thobbs    Password: *    Name: Tyler

   jbellis   Password: *   Name: Jonathan   Site: riptano.com
“dynamic” columnfamilies

                     Following
zznate    driftx:   thobbs:

driftx

thobbs    zznate:

jbellis   driftx:   mdennis:   pcmanus:   thobbs:   xedin:   zznate:
Inserting
•
    Really “insert or update”
•
    Not a key/value store – update as much of
    the row as you want
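"Insert or update" can be pictured with a dictionary of rows, each row being a dictionary of columns. This is an in-memory analogy only; the `insert` helper is hypothetical and timestamps are left out.

```python
def insert(column_family, row_key, columns):
    """Insert-or-update: merge the given columns into the row, creating it if needed."""
    column_family.setdefault(row_key, {}).update(columns)


users = {}
# The first call creates the row.
insert(users, "jbellis", {"password": "*", "name": "Jonathan"})
# The second call touches one column and leaves the others alone.
insert(users, "jbellis", {"site": "riptano.com"})
```

No existence check is made before either call, and the second call does not need to resend the columns it is not changing.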
Example: twissandra
•
    http://twissandra.com
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    username VARCHAR(64),
    password VARCHAR(64)
);

CREATE TABLE following (
    user INTEGER REFERENCES users(id),
    followed INTEGER REFERENCES users(id)
);

CREATE TABLE tweets (
    id INTEGER,
    user INTEGER REFERENCES users(id),
    body VARCHAR(140),
    timestamp TIMESTAMP
);
Cassandrified
create column family users with comparator = UTF8Type
    and column_metadata = [{column_name: password, validation_class: UTF8Type}];

create column family tweets with comparator = UTF8Type
    and column_metadata = [{column_name: body, validation_class: UTF8Type},
                           {column_name: username, validation_class: UTF8Type}];

create column family friends with comparator = UTF8Type;
create column family followers with comparator = UTF8Type;

create column family userline with comparator = LongType
    and default_validation_class = UUIDType;
create column family timeline with comparator = LongType
    and default_validation_class = UUIDType;
Connecting
import pycassa

CLIENT = pycassa.connect_thread_local('Twissandra')

USER = pycassa.ColumnFamily(CLIENT, 'User')
User
RowKey: ericflo
=> (column=password, value=****, timestamp=1289446382541473)

-------------------
RowKey: jbellis
=> (column=password, value=****, timestamp=1289446438490709)


uname = 'jericevans'
password = '**********'

columns = {'password': password}

USER.insert(uname, columns)
Natural keys vs surrogate
Friends and Followers
RowKey: ericflo

=> (column=jbellis, value=1289446467611029, timestamp=1289446467611064)

=> (column=b6n, value=1289446467611031, timestamp=1289446467611080)

to_uname = 'ericflo'

FRIENDS.insert(uname, {to_uname: time.time()})
FOLLOWERS.insert(to_uname, {uname: time.time()})
zznate    driftx:   thobbs:

driftx

thobbs    zznate:

jbellis   driftx:   mdennis:   pcmanus:   thobbs:   xedin:   zznate:
Tweets
RowKey: 92dbeb50-ed45-11df-a6d0-000c29864c4f

=> (column=body, value=Four score and seven years ago, timestamp=1289446891681799)

=> (column=username, value=alincoln, timestamp=1289446891681799)

-------------------
RowKey: d418a66e-edc5-11df-ae6c-000c29864c4f

=> (column=body, value=Do geese see God?, timestamp=1289501976713199)

=> (column=username, value=pdrome, timestamp=1289501976713199)
Userline
RowKey: ericflo

=> (column=1289446393708810, value=6a0b4834-ed44-11df-bc31-000c29864c4f, timestamp=1289446393710212)

=> (column=1289446397693831, value=6c6b5916-ed44-11df-bc31-000c29864c4f, timestamp=1289446397694646)

=> (column=1289446891681780, value=92dbeb50-ed45-11df-a6d0-000c29864c4f, timestamp=1289446891685065)

=> (column=1289446897315887, value=96379f92-ed45-11df-a6d0-000c29864c4f, timestamp=1289446897317676)
Userline


zznate    1289847840615: 3f19757a-c89d...   1289847887086: a20fcf52-595c...


driftx

thobbs    1289847887086: a20fcf52-595c...


jbellis   1289847840615: 3f19757a-c89d...   1289847844275: 844e75e2-b546...
Timeline
RowKey: ericflo

=> (column=1289446393708810, value=6a0b4834-ed44-11df-bc31-000c29864c4f, timestamp=1289446393710212)

=> (column=1289446397693831, value=6c6b5916-ed44-11df-bc31-000c29864c4f, timestamp=1289446397694646)

=> (column=1289446891681780, value=92dbeb50-ed45-11df-a6d0-000c29864c4f, timestamp=1289446891685065)

=> (column=1289446897315887, value=96379f92-ed45-11df-a6d0-000c29864c4f, timestamp=1289446897317676)
Adding a tweet
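import time
import uuid

tweet_id = str(uuid.uuid1())
body = '@ericflo thanks for Twissandra, it helps!'
timestamp = int(time.time() * 1e6)  # microseconds since the epoch

# Store the tweet itself, keyed by its id
columns = {'username': uname, 'body': body}
TWEET.insert(tweet_id, columns)

# Index the tweet in the author's userline and timeline
columns = {timestamp: tweet_id}
USERLINE.insert(uname, columns)
TIMELINE.insert(uname, columns)

# Fan the tweet out to each follower's timeline
for follower_uname in FOLLOWERS.get(uname, column_count=5000):
    TIMELINE.insert(follower_uname, columns)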
Reads
# A user's own tweets, newest first
timeline = USERLINE.get(uname, column_reversed=True)
tweets = TWEET.multiget(timeline.values())


# One page of a user's timeline, newest first
start = request.GET.get('start')
limit = NUM_PER_PAGE

timeline = TIMELINE.get(uname, column_start=start,
                        column_count=limit, column_reversed=True)
tweets = TWEET.multiget(timeline.values())
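The write fan-out and the paged read can be tried without a cluster by modelling each column family as a dictionary. Everything in this sketch is a stand-in: the counter replaces the microsecond timestamp, the `tweet-N` ids replace UUIDs, and the follower data is made up.

```python
import itertools

TWEET, USERLINE, TIMELINE = {}, {}, {}
FOLLOWERS = {"tom": ["jim"]}  # hypothetical data: jim follows tom
_clock = itertools.count(1)   # stand-in for a microsecond timestamp


def add_tweet(uname, body):
    ts = next(_clock)
    tweet_id = f"tweet-{ts}"
    TWEET[tweet_id] = {"username": uname, "body": body}
    # The author's own tweets
    USERLINE.setdefault(uname, {})[ts] = tweet_id
    # Fan-out on write: the author's timeline plus every follower's
    for reader in [uname] + FOLLOWERS.get(uname, []):
        TIMELINE.setdefault(reader, {})[ts] = tweet_id
    return tweet_id


def get_timeline(uname, start=None, limit=2):
    """Newest-first page of a timeline, beginning at column `start` if given."""
    row = TIMELINE.get(uname, {})
    names = sorted(row, reverse=True)
    if start is not None:
        names = [n for n in names if n <= start]
    # One slice for the page, then one lookup per tweet (the "multiget")
    return [TWEET[row[n]]["body"] for n in names[:limit]]
```

Reading a page costs one slice of the timeline row plus a lookup per tweet, because the timeline stores only tweet ids.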
Programmatically
•
    Don't use thrift directly
•
    Higher level clients have a lot of features you
    want
    –
        Knowledge about data types
    –
        Connection pooling
    –
        Automatic retries
    –
        Logging
Raw thrift API: Connecting
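from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol
from cassandra import Cassandra  # assumes the Thrift-generated bindings are importable

def get_client(host='127.0.0.1', port=9170):
    # Adjust the port to your cluster; Cassandra's default Thrift rpc_port is 9160.
    socket = TSocket.TSocket(host, port)
    transport = TTransport.TBufferedTransport(socket)
    transport.open()
    protocol = TBinaryProtocol.TBinaryProtocolAccelerated(transport)
    client = Cassandra.Client(protocol)
    return client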
Raw thrift API: Inserting
import time
from cassandra.ttypes import (Column, ColumnOrSuperColumn,
                              ConsistencyLevel, Mutation)

data = {'id': useruuid, ...}
columns = [Column(k, v, time.time())
           for (k, v) in data.items()]
mutations = [Mutation(ColumnOrSuperColumn(column=c))
             for c in columns]
rows = {useruuid: {'User': mutations}}

client.batch_mutate('Twissandra', rows, ConsistencyLevel.ONE)
API layers
    libpq    Thrift
    JDBC     Hector
    JPA      Hector object-mapper
Running twissandra
•
    Login: notroot/notroot
    –
        (root/riptano)


•
    cd twissandra
•
    python manage.py runserver &
•
    Navigate to http://127.0.0.1:8000
•
    Login as jim/jim, tom/tom, or create your own
One more thing
•
    !PUBLIC! userline
Exercise 1
•
    $ cassandra-cli --host localhost
•
    ] use twissandra;
    ] help;
    ] help list;
    ] help get;
    ] help del;
•
    Delete the most recent tweet
    –
        How would you find this w/o looking at the UI?
Exercise 2
•
    User jim is following user tom, but
    twissandra doesn't populate Timeline with
    tweets from before the follow action.
•
    Insert a tweet from tom before the follow
    action into jim's timeline
Secondary (column) indexes
Exercise 3
•
    Add a state column to the Tweet column
    family definition, with an index (index_type
    KEYS).
    –
        Hint: a no-op update column family on Tweet would be
        update column family Tweet with
        column_metadata=[{column_name:body,
        validation_class:UTF8Type}, {column_name:username,
        validation_class:UTF8Type}]
•
    Set the state column on several tweets to TX.
    Select them using get … where.
Language support
•
    Python
    –
        pycassa
    –
        telephus
•
    Ruby
    –
        Speed is a negative
•
    Java
    –
        Hector
•
    PHP
    –
        phpcassa
Done yet?
•
    Still doing 1+N queries per page
•
    Solution: Supercolumns
Applying SuperColumns to Twissandra

jbellis   1289847840615            1289847844275            1289847887086
          Id: 3f19757a-c89d...     Id: 844e75e2-b546...     Id: a20fcf52-595c...
          uname: zznate            uname: driftx            uname: zznate
          body: O stone be not so  body: Rise to vote sir   body: I prefer pi
          ...                      ...                      ...
Supercolumns: limitations
•
    Requires reading an entire SC (not the entire
    row) from disk even if you just want one
    subcolumn
UUIDs
•
    Column names should be uuids, not longs,
    to avoid collisions
•
    Version 1 UUIDs can be sorted by time
    (“TimeUUID”)
•
    Any UUID can be sorted by its raw bytes
    (“LexicalUUID”)
    –
        Usually Version 4
    –
        Slightly less overhead
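The difference between the two orderings shows up when version 1 UUIDs are sorted both ways. The sketch uses Python's standard `uuid` module; `time_uuid` is an illustrative helper that builds a version 1 UUID from a chosen timestamp so the result is deterministic, and the node value is simply borrowed from the sample UUIDs in this deck.

```python
import uuid


def time_uuid(timestamp, node=0x000C29864C4F, clock_seq=0):
    """Build a version 1 UUID from a count of 100 ns intervals (illustrative helper)."""
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


earlier = time_uuid(0x0FFFFFFFF)
later = time_uuid(0x100000000)

# TimeUUID-style order: compare the embedded timestamps.
by_time = sorted([later, earlier], key=lambda u: u.time)
# LexicalUUID-style order: compare the raw 16 bytes.
by_bytes = sorted([later, earlier], key=lambda u: u.bytes)
```

The low 32 bits of the timestamp come first in the byte layout, so raw-byte order puts `later` ahead of `earlier`. Byte order is therefore a stable ordering but not a chronological one.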
Lucandra
•
    What documents contain term X?
    –
        … and term Y?
    –
        … or start with Z?
Fields and Terms

<doc>
  <field name="title">apache talk</field>
  <field name="date">20110201</field>
</doc>


   field     term       freq     position
   title    apache       1          0
   title      talk       1          1
   date    20110201      1          0
Lucandra ColumnFamilies
create column family documents with comparator = BytesType;

create column family terminfo with column_type = Super
    and comparator = BytesType and subcomparator = BytesType;
Lucandra data
Document Key      col name         value
"documentId" => { fieldName , value }

Term Key          col name         value
"field/term" => { documentId , position vector }
Lucandra queries
•
    get_slice
•
    get_range_slices
•
    No silver bullet
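The Lucandra layout above is an inverted index: one row per document, and one row per "field/term" key whose columns are the matching document ids. The sketch below models both column families as dictionaries; the function names and the second document are hypothetical, and tokenizing on whitespace is a simplification.

```python
def index_document(terminfo, documents, doc_id, fields):
    """Store the document and add one posting per term, keyed by "field/term"."""
    documents[doc_id] = dict(fields)
    for field, text in fields.items():
        for position, term in enumerate(text.split()):
            key = f"{field}/{term}"
            # Row key -> {documentId: position vector}
            terminfo.setdefault(key, {}).setdefault(doc_id, []).append(position)


def docs_with_all(terminfo, *term_keys):
    """Documents containing every given field/term key (the "X and Y" query)."""
    sets = [set(terminfo.get(k, {})) for k in term_keys]
    return set.intersection(*sets) if sets else set()
```

A single-term query reads one row, which is the kind of lookup `get_slice` serves. Because keys are plain "field/term" strings, a prefix query over terms amounts to a range over row keys.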
FAQ: counting
•
    UUIDs + batch process
•
    column-per-app-server
•
    counter API (after 1.0 is out)
Locking
•
    Zookeeper
•
    Cages: http://code.google.com/p/cages/
•
    Not suitable for multi-DC
UUIDs

counter1   672e34a2-ba33...   b681a0b1-58f2...


counter2   3f19757a-c89d...   844e75e2-b546...   a20fcf52-595c...




counter1    aggregated: 27


counter2    aggregated: 42
Column per appserver

counter1   672e34a2-ba33: 12    b681a0b1-58f2: 4   1872c1c2-38f1: 9


counter2   3f19757a-c89d: 7    844e75e2-b546: 11
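The column-per-app-server layout can be sketched with the numbers from this slide: each server adds only to its own column, and a read sums the row. The helper names are hypothetical, and the sketch assumes each server serializes its own increments locally.

```python
counters = {
    "counter1": {"672e34a2-ba33": 12, "b681a0b1-58f2": 4, "1872c1c2-38f1": 9},
    "counter2": {"3f19757a-c89d": 7, "844e75e2-b546": 11},
}


def increment(counters, counter, server_id, amount=1):
    """Each app server updates only the column named after itself."""
    row = counters.setdefault(counter, {})
    row[server_id] = row.get(server_id, 0) + amount


def total(counters, counter):
    """The counter's value is the sum over all per-server columns."""
    return sum(counters.get(counter, {}).values())
```

Because no two servers write the same column, their updates never overwrite each other; the price is that every read has to fetch and add up the whole row.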
Counter API

 key   counter1: (14, 13, 9)   counter2: (11, 15, 17)
General Tips
●
    Start with queries, work backwards
●
    Avoid storing extra “timestamp” columns
●
    Insert instead of check-then-insert
●
    Use client-side clock to your advantage
●
    Use TTL
●
    Learn to love wide rows