V is for vnodes
    Patrick McFadin, Sr Solution Architect
    DataStax


    ©2012 DataStax
                                             1
Friday, February 15, 13
Agenda for today
        • What is a node?
        • How vnodes work
        • Converting your cluster
        • Benefits




Since the beginning...
                     Cassandra has had...



                             Clusters, which have...



                                      Keyspaces, which have...



                                              Column Families, which have...




Row Keys

                          Unique in a column family
                          Can be up to 64k in size
                          Can be sorted in the cluster
                                                          Byte Ordered Partitioner

                                    OR...



                          Can be randomly placed in cluster
                                                         Random Partitioner



Row Keys
                  How do you...

                  • Create a random number?
                  • Make sure the number is big enough?
                  • Make it reproducible?


                                 MD5 does the job


                     Input a Row Key     MD5        Get a 128 bit number
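A minimal sketch of the idea on this slide, using Python's standard hashlib rather than Cassandra's own code. The function name `md5_as_int` is made up for illustration, and the row key is assumed to be UTF-8 text. It shows that the output looks random, is always 128 bits wide, and is reproducible for the same key.

```python
import hashlib


def md5_as_int(row_key: str) -> int:
    """Hash a row key with MD5 and read the digest as an unsigned integer."""
    # An MD5 digest is always 16 bytes, i.e. 128 bits.
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, byteorder="big")


if __name__ == "__main__":
    for key in ("@PatrickMcFadin", "8675309"):
        # Running this twice prints the same numbers: the hash is reproducible.
        print(key, hex(md5_as_int(key)))
```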




Row Keys
                           Input                                      Get

                 @PatrickMcFadin             MD5        0xcfc2d0610aaa712a8c36711d08a2550a




                           Input                                      Get

                          8675309            MD5        0x6cc0d36686e6a433aa76f96773852d35




                            The number produced is a range between:

                            0 and 2^128-1... but Cassandra uses 2^127-1

         2^128 = 340,282,366,920,938,463,463,374,607,431,768,211,456

                                            ...otherwise known as a HUGE number.
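A quick check of the arithmetic, plus a sketch of how a 128-bit digest can land in the smaller range. The assumption here, which the slide does not state, is that the digest is read as a signed two's-complement integer and its absolute value is taken; that is my understanding of RandomPartitioner, not something confirmed by this deck. `random_partitioner_token` is a hypothetical name.

```python
import hashlib

# Size of the raw MD5 output space quoted on the slide.
assert 2**128 == 340_282_366_920_938_463_463_374_607_431_768_211_456


def random_partitioner_token(row_key: bytes) -> int:
    """Fold a 128-bit MD5 digest into roughly half the range (assumed behaviour)."""
    digest = hashlib.md5(row_key).digest()
    # Interpret the 16 bytes as a signed integer in [-2^127, 2^127 - 1] ...
    signed = int.from_bytes(digest, byteorder="big", signed=True)
    # ... and drop the sign, so the result is never larger than 2^127.
    return abs(signed)
```

Under this assumption, two different digests that differ only in sign map to the same token, which is why the usable range is about half of 2^128.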
Token Assignment
        • Each Cassandra node is assigned a token
        • Each token is a number inside the huge range
        • Tokens mark the ownership range of Row Keys

                          From: Token = 0




                                               To: Token = 56713727820156410577229101238628035242




                                                   From:



                            To: Token = 113427455640312821154458202477256070484
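The tokens on this slide are consistent with cutting a ring of size 2^127 into three equal steps. A minimal sketch of that manual calculation follows; the function name is hypothetical, and the formula is inferred from the slide's numbers rather than taken from any Cassandra tool.

```python
RING_SIZE = 2**127  # token space used on the earlier slide


def evenly_spaced_tokens(node_count: int) -> list[int]:
    """Return one token per node, spaced evenly around the ring."""
    step = RING_SIZE // node_count
    return [i * step for i in range(node_count)]


if __name__ == "__main__":
    for token in evenly_spaced_tokens(3):
        print(token)
```

Changing the node count changes every token except the first, which is part of why growing or shrinking a ring by hand is painful.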

Row Key to Token
                    Input                                                     Get

       @PatrickMcFadin                      MD5          276161727147663567581939045564154008842




                                       Token = 0




                              I’ll                 Token = 56713727820156410577229101238628035242
                            take it!



                            Token = 113427455640312821154458202477256070484
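How a node "takes" a key can be sketched as a lookup on the sorted list of tokens. This assumes each node owns the range that ends at its own token (previous token exclusive, own token inclusive) and that keys past the highest token wrap around to the lowest; the function name and the small example numbers are illustrative only.

```python
from bisect import bisect_left


def owner_token(node_tokens: list[int], key_token: int) -> int:
    """Return the token of the node responsible for key_token."""
    ring = sorted(node_tokens)
    # Index of the first node token that is >= the key's token.
    index = bisect_left(ring, key_token)
    # Past the last token, wrap around to the first node on the ring.
    return ring[index % len(ring)]


if __name__ == "__main__":
    ring = [0, 10, 20]
    for key in (5, 10, 25):
        print(key, "->", owner_token(ring, key))
```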

Cassandra 1.1 Node
        • Responsible for a single range of keys
        • Range determined by single token
        • One server = One token = One node




          Commodity node?             What you really want.

Time for a new plan

               • Hardware is only getting bigger
               • One node is responsible for more data
               • Token assignments are a pain




Token assignment (sucks)
        • Tokens need to be evenly spread
        • Growing a ring... no good options
        • Shrinking a ring... no good options
        • Tokens have to be added to each server config




Enter Virtual Nodes
        • One server should have many nodes
        • Each node should be small
        • Tokens should be automatic

                  Version 1.1          Version 1.2
                  Server 1             Server 1

                  one node:            four vnodes:
                  1-4                  1, 2, 3, 4




Virtual Node Features
        • Default of 256 virtual nodes per server
        • Tokens assigned automatically
        • Faster rebuilds of servers
        • Faster addition of servers to the cluster
        • New partitioner (More later)
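To see why many small, automatically placed tokens work, here is a rough simulation. It assumes each server simply picks `num_tokens` random positions on a ring of size 2^127; the real allocation logic in Cassandra may differ, and every name below is hypothetical.

```python
import random

RING_SIZE = 2**127  # assumed token space, matching the earlier slides


def assign_vnode_tokens(servers, num_tokens=256, seed=0):
    """Give every server num_tokens random tokens (no manual assignment)."""
    rng = random.Random(seed)
    return {
        server: sorted(rng.randrange(RING_SIZE) for _ in range(num_tokens))
        for server in servers
    }


def ownership(tokens_by_server):
    """Fraction of the ring each server owns, summed over all of its vnodes."""
    points = sorted(
        (token, server)
        for server, tokens in tokens_by_server.items()
        for token in tokens
    )
    width = {server: 0 for server in tokens_by_server}
    previous = points[-1][0] - RING_SIZE  # wrap around from the highest token
    for token, server in points:
        width[server] += token - previous  # each vnode owns (previous, token]
        previous = token
    return {server: w / RING_SIZE for server, w in width.items()}


if __name__ == "__main__":
    tokens = assign_vnode_tokens(["server1", "server2", "server3", "server4"])
    for server, share in ownership(tokens).items():
        print(server, f"{share:.1%}")
```

With 256 tokens per server the shares should come out close to even without anyone computing a token by hand.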




Transitioning to vnodes
              Super easy!

              Find these lines in your cassandra.yaml file:

                          #num_tokens:

                          initial_token: <some big number>




             Change to:
                          num_tokens: 256

                          initial_token:




              and restart.
                                                             Repeat on all nodes in cluster
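On a cluster with many servers the same edit can be scripted. The following is only a sketch: it assumes both settings sit on their own lines as shown on the slide, `enable_vnodes` is a hypothetical helper, and it only transforms text (it does not write the file or restart anything).

```python
import re


def enable_vnodes(yaml_text: str, num_tokens: int = 256) -> str:
    """Uncomment/set num_tokens and blank out initial_token in cassandra.yaml text."""
    # Match the num_tokens line whether or not it is commented out.
    text = re.sub(
        r"^#?[ \t]*num_tokens:.*$",
        f"num_tokens: {num_tokens}",
        yaml_text,
        flags=re.MULTILINE,
    )
    # Drop the manually assigned token but keep the key itself.
    text = re.sub(
        r"^initial_token:.*$",
        "initial_token:",
        text,
        flags=re.MULTILINE,
    )
    return text


if __name__ == "__main__":
    before = "#num_tokens:\ninitial_token: 56713727820156410577229101238628035242\n"
    print(enable_vnodes(before))
```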
Transitioning to vnodes
           After all Cassandra instances have been restarted

                          Initialize a shuffle operation

                          [patrick@cassandra0 ~]$ cassandra-shuffle create




                          Enable shuffling

                          [patrick@cassandra0 ~]$ cassandra-shuffle enable




                          List pending relocations*

                          [patrick@cassandra0 ~]$ cassandra-shuffle ls




                                                                             Let’s walk through it...
                             *This is a slow op. Be patient.
Existing 1.1 cluster
                          Server 1   Server 2



                            1-4        5-8




                          Server 4   Server 3



                          13-16       9-12




Set num_tokens and restart
                            Server 1        Server 2

                           1-4     1-4    5-8     5-8



                           1-4     1-4    5-8     5-8




                            Server 4        Server 3

                          13-16   13-16   9-12    9-12



                          13-16   13-16   9-12    9-12



Set num_tokens and restart
                           Server 1                  Server 2

                          1       2                 5       6



                          3       4                 7       8




                           Server 4                  Server 3

                          13      14                9       10



                          15      16               11       12



                                       Initialize and Enable shuffling...
Shuffle enable
                           Server 1    Server 2

                          1       5    2          6



                          13      9    14     10




                           Server 4    Server 3

                          3       7    4          8



                          16      12   15     11



Shuffle complete
                           Server 1    Server 2

                          1       5    2          6



                          13      9    14     10




                           Server 4    Server 3

                          3       7    4          8



                          16      12   15     11
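The walkthrough above can be mimicked with a toy model: start with each server holding a contiguous block of vnodes, then deal them out at random. This only illustrates the end state in the diagrams; it is not how the cassandra-shuffle tool is implemented, and the function name, the seed, and the requirement that vnodes divide evenly across servers are assumptions of the sketch.

```python
import random


def shuffle_vnodes(vnodes_by_server, seed=0):
    """Randomly redistribute vnodes so each server keeps the same count."""
    rng = random.Random(seed)
    servers = list(vnodes_by_server)
    all_vnodes = [v for server in servers for v in vnodes_by_server[server]]
    rng.shuffle(all_vnodes)
    per_server = len(all_vnodes) // len(servers)  # assumes an even split
    return {
        server: sorted(all_vnodes[i * per_server:(i + 1) * per_server])
        for i, server in enumerate(servers)
    }


if __name__ == "__main__":
    before = {
        "Server 1": [1, 2, 3, 4],
        "Server 2": [5, 6, 7, 8],
        "Server 3": [9, 10, 11, 12],
        "Server 4": [13, 14, 15, 16],
    }
    for server, vnodes in shuffle_vnodes(before).items():
        print(server, vnodes)
```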



Ops life with vnodes
        • Add any number of nodes
        • No token assignments!
        • Bigger server? Larger num_tokens
        • Decommission any number of nodes
        • New nodetool command: status
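The "bigger server, larger num_tokens" rule can be put as simple arithmetic. Assuming tokens land uniformly at random on the ring, a server's expected share of the data is its num_tokens divided by the cluster total. The helper below is a hypothetical illustration of that proportion, not a Cassandra API.

```python
from fractions import Fraction


def expected_share(num_tokens_by_server):
    """Expected fraction of the ring per server, proportional to num_tokens."""
    total = sum(num_tokens_by_server.values())
    return {
        server: Fraction(count, total)
        for server, count in num_tokens_by_server.items()
    }


if __name__ == "__main__":
    cluster = {"small1": 256, "small2": 256, "big": 512}
    for server, share in expected_share(cluster).items():
        print(server, share)
```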




                          One more time now!

Bonus new thing
        • New Partitioner: Murmur3Partitioner
        • Murmur3 replaces MD5
        • Slightly faster than MD5 in certain cases
        • Go-forward partitioner for NEW clusters
        • No need to convert existing clusters




            More details here:
            https://issues.apache.org/jira/browse/CASSANDRA-3772


In conclusion...


                              Go out and try some vnode love today!



                              Download Cassandra 1.2 now


                          http://www.datastax.com/download/community


                            http://cassandra.apache.org/download/




Some handy references

              http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2


               http://www.datastax.com/dev/blog/upgrading-an-existing-cluster-to-vnodes


              Follow me on Twitter for more: @PatrickMcFadin




We power the apps
                           that transform
                              business.



