NoSQL: What Is It and Why Would I Care?
     Eberhard Wolff




21.09.11
Alternative Databases: NoSQL
►    NoSQL: Not only SQL


►    A good example of a catchy but bad name
►    Not a positive definition, rather “not something else”
►    Now: Even less clear
Why NoSQL?
►    Exponential data growth


►    More and more connected data
     >  Hypertext, blogs, user generated content


►    Semi structured
     >  User generated content
     >  Full text search / indices instead of Query-by-Example


►    Integration at the database level is less common


►    Cloud prefers scale out over scale up
     >  Cloud supports scale up: Reboot into larger machine
     >  …but eventually you will need to scale out i.e. add more machines
NoSQL Flavors
►    Key / value store
►    Document
►    Wide Column: Lots of Columns


►    Graph Database: Graphs with nodes, relationships and properties
►    Object databases: Stores objects – not rows


►    Note: NoSQL is actually vaguely defined
Key-Value Stores
►    Maps keys to values
     >  E.g. key 42 → value "Some data"
►    Just a large globally available Map
►    i.e. not a very powerful data model
►    Advantages
     >  Easy to understand
     >  Easier to build scale out solutions
        (no joins, easy sharding etc)
►    Disadvantages
     >  Simplistic data model
     >  Not a good fit for complex data
     >  Might add complexity to the application code
•    Focus on scalability
•    Redis: Think cache + Persistence
•    Riak
Key Value Store: Hybrid Approach
►    Might just be used to store specific data


►    E.g. scores of players in an online game
     >  No complex structure
     >  Need to scale
     >  Lots of reads and writes


►    Player name, age, address would still be in an RDBMS


►    Hybrid approach
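
The hybrid split above could look roughly like the following sketch. It assumes an in-memory SQLite database as the RDBMS and a plain Python dict as a stand-in for the key-value store; the table layout, the `score:` key prefix and the helper names are hypothetical and chosen only for illustration.

```python
import sqlite3

# Relational side: structured, rarely changing player data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE players (name TEXT PRIMARY KEY, age INTEGER, address TEXT)")
db.execute("INSERT INTO players VALUES (?, ?, ?)", ("someuser", 30, "Some Street 1"))

# Key-value side: a dict stands in for the real store (assumption).
scores = {}


def add_points(player, points):
    """Update the score with a single key access, no joins involved."""
    key = f"score:{player}"
    scores[key] = scores.get(key, 0) + points
    return scores[key]


def player_summary(player):
    """Combine data from both stores in application code."""
    row = db.execute(
        "SELECT name, age FROM players WHERE name = ?", (player,)
    ).fetchone()
    return {"name": row[0], "age": row[1], "score": scores.get(f"score:{player}", 0)}


add_points("someuser", 10)
add_points("someuser", 5)
summary = player_summary("someuser")
```

The frequent score updates only ever touch the key-value side, while the application code is responsible for joining the two stores.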
Key-Value Stores: Store All Data
►    Storing data as serialized blobs
     >  "user:someuser" → "someuser|someuser@example.com|more|data|here"
►    Storing data as multiple keys
     >  "user:username:someuser" → "someuser"
     >  "user:email:someuser" → "someuser@example.com"
     >  Requires multi get/set to be efficient
     >  Allows some querying if the database supports wildcards,
        like "user:email:someuser*"
►    Storing links
     >  Blob: "basket:someuser" → "...|item|1|product|product:123|..."
     >  Separate keys: "basket:someuser:item:1:product" → "product:123"
        –  Multi-get: "basket:someuser:*" loads the shopping basket and all items
►    Easy to understand, hard to implement
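
A minimal sketch of the two storage variants on this slide, assuming a plain Python dict as a stand-in for the key-value store. The `multi_get` helper is hypothetical; it imitates a wildcard lookup with `fnmatch`, whereas a real store would offer its own multi-get or key-scan operation.

```python
import fnmatch

store = {}  # stand-in for the key-value store (assumption)

# Variant 1: the whole user as one serialized blob.
store["user:someuser"] = "|".join(["someuser", "someuser@example.com"])
blob_fields = store["user:someuser"].split("|")

# Variant 2: one key per attribute.
store["user:username:someuser"] = "someuser"
store["user:email:someuser"] = "someuser@example.com"

# Links: the basket item points to the key of a product.
store["basket:someuser:item:1:product"] = "product:123"


def multi_get(pattern):
    """Return all entries whose key matches the wildcard pattern."""
    return {k: v for k, v in store.items() if fnmatch.fnmatchcase(k, pattern)}


emails = multi_get("user:email:someuser*")
basket = multi_get("basket:someuser:*")
```

With the blob variant the application has to parse and rewrite the whole value for every change; with separate keys it has to keep several keys consistent. That is the "hard to implement" part.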
Document Stores
►    Aggregates are typically stored as "documents" (key-value collection)
►    JSON, BSON (binary JSON) and XML are common
►    Still no schema, so add any data at runtime
►    The semi-structure of the document allows the database to build indexes, allowing
     queries that address properties of the document
     >  E.g. "find all baskets that contain the product 123"
►    Relations might be modeled as links
►    Advantages
     >  Good fit for semi structured data
     >  In particular a good fit for JSON, XML, HTML…
     >  Probably the easiest transition from RDBMS
►    Disadvantages
     >  Does not scale as well as a key/value store
►    Focus on semi structured data e.g. JSON
►    MongoDB, CouchDB
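
To illustrate how the semi-structure of documents enables indexes and queries, here is a small sketch using plain Python dicts as documents. The field names (`owner`, `items`, `product`, `coupon`) and the `build_product_index` helper are assumptions made for this example; a real document store builds and maintains such indexes itself.

```python
from collections import defaultdict

# Documents without a fixed schema: the second basket has an extra field.
baskets = {
    "basket:someuser": {
        "owner": "someuser",
        "items": [{"product": "product:123", "quantity": 1}],
    },
    "basket:otheruser": {
        "owner": "otheruser",
        "items": [{"product": "product:456", "quantity": 2}],
        "coupon": "WELCOME",
    },
}


def build_product_index(docs):
    """Map each product key to the set of baskets that contain it."""
    index = defaultdict(set)
    for key, doc in docs.items():
        for item in doc.get("items", []):
            index[item["product"]].add(key)
    return index


product_index = build_product_index(baskets)

# "Find all baskets that contain the product 123"
baskets_with_123 = sorted(product_index["product:123"])
```

The product reference inside each item is a link modeled as a key, as mentioned on the slide.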
Wide Column
►    Add any "column" you like to a row
►    Not a key-value store, but a "key-(column-value)" store
►    Column families are like tables
►    E.g. in the "Users" column family
     >  "someuser" → ("username" → "someuser"),
                     ("email" → "someuser@example.com")
►    Since columns are named, some databases provide indexing
     >  E.g. Google AppEngine allows you to define columns that can be queried
►    Advantages
     >  Easy to store complex and heterogeneous data



§    Apache Cassandra
§    Amazon SimpleDB
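
The "key-(column-value)" idea can be sketched with nested Python dicts. This is only a model of the data layout, not the API of any particular database; the `put` helper and the `twitter` column are hypothetical.

```python
# Column family "Users": row key -> {column name -> value}
users = {}


def put(column_family, row_key, column, value):
    """Set one named column in a row; rows need not share the same columns."""
    column_family.setdefault(row_key, {})[column] = value


put(users, "someuser", "username", "someuser")
put(users, "someuser", "email", "someuser@example.com")

# Another row in the same column family with a different set of columns.
put(users, "otheruser", "username", "otheruser")
put(users, "otheruser", "twitter", "@otheruser")

# Because columns are named, a simple query by column is possible.
rows_with_email = [key for key, columns in users.items() if "email" in columns]
```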
Graph
►    Nodes with Properties
►    Typed relationships with properties


►    Ideal e.g. to model relations in a social network


►    Easy to find number of followers, degree of relation etc.


►    Neo4j
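
As a rough illustration of the graph model, the sketch below keeps nodes with properties and typed relationships with properties in plain Python structures, and answers the two questions from the slide. The user names, the `FOLLOWS` type and the `since` property are made up for this example; a graph database would provide traversal as a built-in feature.

```python
from collections import deque

# Nodes with properties (hypothetical sample data).
nodes = {
    "alice": {"city": "Berlin"},
    "bob": {"city": "London"},
    "carol": {"city": "Paris"},
    "dave": {"city": "Rome"},
}

# Typed relationships with properties: (source, type, target, properties).
relationships = [
    ("alice", "FOLLOWS", "bob", {"since": 2010}),
    ("bob", "FOLLOWS", "carol", {"since": 2011}),
    ("dave", "FOLLOWS", "bob", {"since": 2011}),
]


def follower_count(user):
    """Count incoming FOLLOWS relationships."""
    return sum(1 for _src, rel, dst, _p in relationships
               if rel == "FOLLOWS" and dst == user)


def degree_of_relation(start, goal):
    """Breadth-first search: length of the shortest FOLLOWS path, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for src, rel, dst, _p in relationships:
            if src == node and rel == "FOLLOWS" and dst not in seen:
                seen.add(dst)
                queue.append((dst, distance + 1))
    return None
```

For the sample data, `follower_count("bob")` counts the two incoming relationships, and `degree_of_relation("alice", "carol")` follows the path via `bob`.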
What happened to Queries?
►    Data is easily and quickly read/stored using primary key
►    Denormalize data for commonly used queries
     >  Store Twitter inbox in key/value as
        –  "inbox:someuser" → ("posts:123", "posts:234", ...)
     >  instead of doing the query (RDBMS)
        –  select p.* from POSTS p, POSTLINKS pl where p.id = pl.postId and
           pl.userid=42
►    Store reverse lookup
     >  "ewolff|following" → ("spring_rod", "spring_juergen")
     >  "post:435|RT" → ("post:42", "post:21")
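
The denormalization described above moves work from read time to write time. The following sketch assumes a dict as the key-value store; the `follow`, `post` and `read_inbox` helpers and the `|followers` key are hypothetical, while the `inbox:` and `|following` keys follow the slide.

```python
store = {}  # stand-in for the key-value store (assumption)


def follow(follower, followee):
    """Store the relation in both directions (reverse lookup)."""
    store.setdefault(f"{follower}|following", []).append(followee)
    store.setdefault(f"{followee}|followers", []).append(follower)


def post(author, post_id, text):
    """Write the post once, then fan it out to every follower's inbox."""
    key = f"posts:{post_id}"
    store[key] = text
    for follower in store.get(f"{author}|followers", []):
        store.setdefault(f"inbox:{follower}", []).append(key)


def read_inbox(user):
    """Reading needs only primary-key lookups, no join."""
    return [store[key] for key in store.get(f"inbox:{user}", [])]


follow("someuser", "ewolff")
post("ewolff", 123, "first post")
post("ewolff", 234, "second post")
inbox = read_inbox("someuser")
```

The trade-off is that every post causes one write per follower, and the application has to keep the redundant keys consistent.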
What It Means for Developers
§  More technologies to have fun with
§  Broader choice of persistence stores
§  Probably Cross Store Persistence
    •  Store name, first name etc. in RDBMS
    •  Store followers in Graph database
    •  Store Content in RDBMS
    •  Store User Generated Content in Document database


§  Spring Data
    •  Similar APIs for JPA and NoSQL
    •  Support for cross store persistence
    •  Sophisticated support for generic DAOs
    •  E.g. just add findByName() method, implementation is provided
§  QueryDSL
    •  JPA Criteria API done right
