Is NoSQL the Future of Data
         Storage?
        By Gary Short
      Developer Express
Introduction
•   Gary Short
•   Technical Evangelist for Developer Express
•   C# MVP
•   garys@devexpress.com
•   www.garyshort.org
•   @garyshort.



                                                 2
What About You Guys?




                       3
Breadth First Look @ NoSQL




                             4
We'll Be Doing 3 Things
1. Define NoSQL databases
2. Look at scenarios where you can use NoSQL
3. Drill into a specific use case.




                                               5
6
Where Does NoSQL Originate?
• 1998
  – Open-source relational database
     •   Created by Carlo Strozzi
     •   Didn’t expose an SQL interface
     •   Called NoSQL
     •   The author said of the newer NoSQL movement that it:
     •   “departs from the relational model altogether...”
     •   “...should have been called ‘NoREL’”.



                                                             7
More Recently...
• Eric Evans reintroduced the term in 2009
  – Johan Oskarsson (last.fm)
     • Event to discuss open-source distributed databases
• The label covers a growing number of datastores
  – Open source
  – Non-relational
  – Distributed
  – (often) don’t guarantee ACID.

                                                   8
Atlanta 2009
• No:sql(east) conference
• Billed as “conference of no-rel datastores”
• Worst tag line ever
  – SELECT fun, profit FROM real_world WHERE rel=false.




                                                          9
Not Anti-RDBMS




                10
Let’s Talk a Bit About What NoSQL DBs
               Look Like...




                                    11
Key Attributes of NoSQL Databases
•   Don’t require fixed table schemas
•   Non-relational
•   (Usually) avoid join operations
•   Scale horizontally
    – Adding more nodes to a storage system.




                                               12
What Does the Taxonomy Look Like?




                                    13
Document Store
•   RavenDB
•   Apache Jackrabbit
•   CouchDB
•   MongoDB
•   SimpleDB
•   XML Databases
    – MarkLogic Server
    – eXist.
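
To make "document store" concrete, here is a minimal sketch in plain Python, not the API of any product listed above. The `collection` list, the `find` helper and the sample fields are all hypothetical; the point is only that documents in one collection need not share a schema.

```python
import json

# Two documents in the same (hypothetical) collection with different fields.
collection = [
    {"_id": 1, "title": "Intro to NoSQL", "tags": ["nosql", "talk"]},
    {"_id": 2, "title": "Graph basics", "author": {"name": "Example Author"}, "pages": 12},
]

def find(documents, **criteria):
    """Return documents whose top-level fields equal every given criterion."""
    return [
        doc for doc in documents
        if all(doc.get(field) == value for field, value in criteria.items())
    ]

# Documents lacking the "pages" field are simply skipped, not rejected.
matches = find(collection, pages=12)

# Documents are typically stored and exchanged in a format such as JSON.
serialized = json.dumps(collection[0])
```

Real document stores add indexing, persistence and replication on top of this idea.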

                                14
Document What?




                 15
Graph Storage
•   Trinity
•   AllegroGraph
•   Core Data
•   Neo4j
•   DEX
•   FlockDB.



                               16
Which Means?
• Graph consists of
  – Nodes (‘stations’ of the graph)
  – Edges (lines between them)
• FlockDB
  – Created by the Twitter folks
  – Nodes = Users
  – Edges = Nature of relationship between nodes.
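
The nodes-and-edges idea can be sketched in a few lines of Python. This is an illustration only, not FlockDB's data model or API; the user names and the `build_adjacency` helper are made up for the example.

```python
from collections import defaultdict

# Each edge is (source node, nature of relationship, destination node).
edges = [
    ("alice", "follows", "bob"),
    ("alice", "follows", "carol"),
    ("bob", "follows", "alice"),
]

def build_adjacency(edge_list):
    """Group directed edges by (source node, relationship type)."""
    adjacency = defaultdict(set)
    for source, relationship, destination in edge_list:
        adjacency[(source, relationship)].add(destination)
    return adjacency

graph = build_adjacency(edges)
following = graph[("alice", "follows")]  # everyone alice follows
```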


                                                    17
Social Graph




               18
Key/Value Stores
• On disk
• Cache in RAM
• Eventually Consistent
   – Weak Definition
      • “If no updates occur for a period, eventually all updates will
        propagate through the system and all replicas will be consistent”
   – Strong Definition
      • “for a given update and a given replica eventually either the
        update reaches the replica or the replica retires”
• Ordered
   – Distributed Hash Table allows lexicographical processing.
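
The weak definition of eventual consistency can be illustrated with a toy model. The sketch below assumes last-write-wins conflict resolution by timestamp, which is only one of several strategies and is not stated on the slide; the `Replica` class and `sync` function are hypothetical.

```python
class Replica:
    """A toy replica that keeps the newest (timestamp, value) per key."""

    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        current = self.data.get(key)
        # Assumption: the write with the highest timestamp wins.
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

def sync(a, b):
    """Exchange all entries in both directions (a simple anti-entropy pass)."""
    for key, (timestamp, value) in list(a.data.items()):
        b.write(key, value, timestamp)
    for key, (timestamp, value) in list(b.data.items()):
        a.write(key, value, timestamp)

r1, r2 = Replica(), Replica()
r1.write("colour", "red", timestamp=1)
r2.write("colour", "blue", timestamp=2)
stale = r1.read("colour")  # r1 has not yet seen the newer write
sync(r1, r2)               # once updates stop and propagate, replicas agree
```

Between the second write and the `sync` call, a reader of `r1` sees stale data; that window is the price paid for not coordinating every write.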

                                                                            19
Object Databases
•   Db4o
•   GemStone/S
•   InterSystems Caché
•   Objectivity/DB
•   ZODB.




                                20
How the &*$% do You Index
         That?!




                            21
Okay got it, Now Let’s Compare Some
       Real World Scenarios




                                  22
You Need Constant Consistency
•   You’re dealing with financial transactions
•   You’re dealing with medical records
•   You’re dealing with bonded goods
•   Best you use an RDBMS ☺.




                                                 23
You Need Horizontal Scalability
• You’re working across defined geographic regions
• You’re working with large quantities of data
• Game server sharding
• Use NoSQL
   – Something like Cassandra.
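
Horizontal scaling usually starts with deciding which node owns which key. The sketch below uses simple hash-modulo placement; the node names and the `node_for` helper are hypothetical. It is not how Cassandra places data (Cassandra uses consistent hashing, which moves far fewer keys when nodes join or leave).

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical storage nodes

def node_for(key, nodes=NODES):
    """Map a key to a node using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]

owner = node_for("player:42")  # the same key always lands on the same node
```

A stable hash such as SHA-256 is used instead of Python's built-in `hash`, which is randomised per process for strings.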




                                                     24
Up in the Clouds Baby




                        25
26
Frequently Written Rarely Read
•   Think web counters and the like
•   Every time a user comes to a page = ctr++
•   But it’s only read when the report is run
•   Use NoSQL (key-value storage/memcache).
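
The write-heavy counter pattern looks roughly like this. A `Counter` stands in for the key-value store here; `record_visit` and `run_report` are hypothetical names, and a real deployment would use the store's own atomic increment operation.

```python
from collections import Counter

page_hits = Counter()  # stand-in for a key-value store: page -> hit count

def record_visit(page):
    """Called on every page view: a cheap increment, no read required."""
    page_hits[page] += 1

def run_report(top_n=10):
    """Called rarely: read the counters and rank the pages."""
    return page_hits.most_common(top_n)

for page in ["/home", "/pricing", "/home"]:
    record_visit(page)

report = run_report()
```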




                                                27
I Got Big Data!




                  28
Binary Baby!
•   If you are YouTube
•   Flickr
•   Twitpic
•   Spotify
•   NoSQL (Amazon S3).




                              29
Here Today Gone Tomorrow
• Transient data like...
  – Web Sessions
  – Locks
  – Short Term Stats
     • Shopping cart contents
• Use NoSQL (Memcache).
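
Transient data is usually stored with a time-to-live. The sketch below is a toy in-process store, not the memcached client API; `ExpiringStore` and its injectable `clock` parameter are assumptions made so the example can run without waiting.

```python
import time

class ExpiringStore:
    """A toy key-value store whose entries vanish after a time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}  # key -> (expiry time, value)

    def set(self, key, value, ttl_seconds):
        self._items[key] = (self._clock() + ttl_seconds, value)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # lazily evict the expired entry
            return default
        return value

# A fake clock lets the example advance time instantly.
now = [0.0]
store = ExpiringStore(clock=lambda: now[0])
store.set("session:abc", {"cart": ["book"]}, ttl_seconds=30)
fresh = store.get("session:abc")    # still within the 30 seconds
now[0] = 31.0
expired = store.get("session:abc")  # gone once the TTL has passed
```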



                                30
Data Replication
• Same data in two or more locations
  – Music Library
     • Web browser
     • iPhone App
• NoSQL (CouchDB).




                                       31
Hit me Baby One More Time!
• High Availability
  – High number of important transactions
     • Online gambling
     • Pay-per-view
        – Ahem!
     • Online Auction
• NoSQL (Cassandra – automatic clustering).



                                              32
Give me a Real World Example
• Twitter
  – The challenges
     • Needs to store many graphs
        – Who you are following
        – Who’s following you
        – Who you receive phone notifications from, etc.
     • To deliver a tweet requires rapid paging of followers
     • Heavy write load as followers are added and removed
     • Set arithmetic for @mentions (intersection of users).
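
The set arithmetic behind @mentions can be shown with Python sets. The sketch assumes the audience is the users who follow both the author and the mentioned user, which is how "intersection of users" reads here; the data and the `mention_audience` helper are hypothetical.

```python
# Hypothetical follower sets, keyed by the user being followed.
followers = {
    "alice": {"carol", "dave", "erin"},
    "bob": {"dave", "erin", "frank"},
}

def mention_audience(author, mentioned):
    """Users who follow both the author and the mentioned user."""
    return followers[author] & followers[mentioned]

audience = mention_audience("alice", "bob")
```

At Twitter's scale these sets hold millions of entries each, which is why the intersection has to be computed page by page instead of all at once.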


                                                               33
What Did They Try?
• Relational Databases
• Key-Value storage of denormalized lists




                                            34
Did it Work?




               35
What Did They Need?
• Simplest possible thing that would work
• Allow for horizontal partitioning
• Allow write operations to
  – Arrive out of order
  – Or be processed more than once
• Failures should result in redundant work
  – Not lost work!


                                             36
The Result was FlockDB
• Stores graph data
• Not optimised for graph traversal operations
• Optimised for large adjacency lists
  – List of all edges in a graph
     • Each entry is a set of end points (or tuple if directed)
• Optimised for fast read and write
• Optimised for page-able set arithmetic.
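
"Page-able set arithmetic" means returning the result of a set operation one page at a time. The sketch below is only an illustration of the idea with a cursor, not FlockDB's algorithm or API; `page_intersection` and the sample IDs are made up, and a real store would avoid materialising the full intersection on every call.

```python
def page_intersection(list_a, list_b, cursor=None, page_size=2):
    """Return one page of IDs present in both lists, plus the next cursor."""
    common = sorted(set(list_a) & set(list_b))
    if cursor is not None:
        # Resume strictly after the last ID already returned.
        common = [item for item in common if item > cursor]
    page = common[:page_size]
    next_cursor = page[-1] if len(common) > page_size else None
    return page, next_cursor

followers_of_a = [1, 3, 5, 7, 9]
followers_of_b = [3, 5, 6, 7, 9]

first_page, cursor = page_intersection(followers_of_a, followers_of_b)
second_page, end = page_intersection(followers_of_a, followers_of_b, cursor=cursor)
```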


                                                                  37
How Does it Work?
• Stores graphs as sets of edges between nodes
• Data is partitioned by node
  – All queries can be answered by a single partition
• Write operations are idempotent
  – Can be applied multiple times without changing
    the result
• And commutative
  – Changing the order of operands doesn’t change
    the result.

                                                        38
A Little More About Idempotency
• Applied several times with no change to the
  result
• An operation ‘O’ on a set S is called idempotent
  if, for all x in S, x O x = x.
• Set union
  – A ∪ B = {x : x ∈ A or x ∈ B}, so A ∪ A = A
• Set intersection
  – A ∩ B = {x : x ∈ A and x ∈ B}, so A ∩ A = A
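
A quick check of these properties in Python, using a follow write modelled as set union. The `apply_follow` helper is hypothetical and chosen only to show why a write that is delivered twice does no harm.

```python
def apply_follow(follow_set, user):
    """Model a follow write as a union with a one-element set."""
    return follow_set | {user}

once = apply_follow(frozenset(), "bob")
twice = apply_follow(once, "bob")  # the same write delivered again

a = {1, 2, 3}
union_with_self = a | a          # A ∪ A
intersection_with_self = a & a   # A ∩ A
```

Because `once == twice`, a retry after a failure costs redundant work but never corrupts the result.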

                                                  39
A Little More About Commutative
• Changing the order of operands doesn’t
  change the result.
  3 + 2 = 2 + 3 = 5
• Can be combined with idempotency
• Let’s look at the follow command in Twitter
   • Let X = follow person X
   • Let Y = follow person Y
   • Then 3X + 2Y = 2Y + 3X (commutativity)
   • And 2X + 3Y = 3X + 2Y (idempotency: repeats collapse to X + Y)
• Note: it’s only true for the same operation.
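
When different operations are mixed (follow and unfollow), order starts to matter unless each write carries something to order it by. One common approach, assumed here and not stated on the slides, is to attach a timestamp to every write and keep only the newest state per edge. The `apply_write` and `replay` helpers and the sample writes are hypothetical.

```python
from itertools import permutations

def apply_write(edges, write):
    """Keep the newest (timestamp, state) per edge; older or repeated writes are no-ops."""
    source, destination, state, timestamp = write
    key = (source, destination)
    current = edges.get(key)
    if current is None or timestamp > current[0]:
        edges[key] = (timestamp, state)
    return edges

writes = [
    ("alice", "bob", "follow", 1),
    ("alice", "bob", "unfollow", 2),
    ("alice", "carol", "follow", 3),
]

def replay(sequence):
    """Apply a sequence of writes to an empty edge table."""
    edges = {}
    for write in sequence:
        apply_write(edges, write)
    return edges

# Try every arrival order, each time re-delivering the first write at the end.
results = [replay(order + order[:1]) for order in permutations(writes)]
all_equal = all(result == results[0] for result in results)
```

Every ordering, with or without duplicates, ends in the same edge table, which is the property the slides ask of write operations.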
                                                 40
Commutative Writes Help Bring up
            Partitions
• Partition can receive write traffic immediately
• Receive dump of data in the background
• Live for read as soon as the dump is complete.
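
The bring-up sequence can be sketched with follow-only writes modelled as set union. This is a simplified illustration under that assumption, not FlockDB's actual procedure; the partition contents are made up.

```python
# Edges already held by an existing (hypothetical) partition.
source_partition = {("alice", "bob"), ("alice", "carol")}

# 1. The new partition starts empty and accepts live writes immediately.
new_partition = set()
new_partition |= {("dave", "alice")}

# 2. The background dump arrives later and overlaps with the live write.
dump = set(source_partition) | {("dave", "alice")}
new_partition |= dump

# 3. Union is commutative and idempotent, so arrival order and the
#    duplicate edge make no difference; the partition can now serve reads.
ready_for_reads = new_partition == dump
```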




                                                41
Performance?
• Currently store 13 billion edges
• 20K writes / second
• 100K reads / second.




                                     42
Punchline?
• Under all the bells and whistles...
  – It’s MySQL ☺.




                                        43
So is this the Future?
• Yes!
• And No!




                                 44
What?! How Can That be?!




                           45
