SELECT * FROM databases WHERE query_language <> 'SQL';
Gavin Heavyside - ACCU Conference - 16 April 2011
SELECT * FROM databases WHERE query_language <> 'SQL' LIMIT 4;
Me
• Director of Engineering at MyDrive
• Hands-on coding in Ruby, C++ & others
• Big data, SW architecture, robustness, tdd,
  devops, data analysis
• Background of SW for telecoms, mobile,
  embedded
• @gavinheavyside
MyDrive Solutions
• Driver behaviour analysis and scoring for
  telematics-based insurance
• Large-scale geospatial processing of GPS
  and map data
• Relational DBs - PostgreSQL, MySQL
• Non-relational DBs - Redis, HBase
• Big Data tools - Hadoop
• Built on Linux and open-source stack
RDBMS
What is an RDBMS

• Codd’s relational model, 1970; “Codd’s 12 Rules”, 1985
• Relations
 • e.g. tables, rows, columns
• Relational Operators
 • Manipulate data in tabular form
ACID

• Atomicity
• Consistency
• Isolation
• Durability
Atomicity


• All or nothing
• Maintain atomicity across failures
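A minimal sketch of all-or-nothing behaviour, using Python's built-in sqlite3 module. The accounts table, the names and the amounts are invented for illustration and do not come from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            # If this violates the CHECK constraint, the credit above is undone.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return the current balances as a dict of name -> balance."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A transfer that would overdraw the source account fails as a whole: the credit that was already applied to the destination is rolled back with it.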
Consistency

• DB moves from one consistent state to
  another
• Only valid data is written to DB
• It can only enforce rules it knows about
Isolation

• Transactions can’t see data from other
  incomplete transactions
• Blocking & Deadlocks
 • Dirty reads
 • MVCC
Locking

• Row locking
• Whole table locking
• TX might require lots of locks
• Blocking
MVCC

• Multi-Version Concurrency Control
• Maintain several versions of objects
• Read & write timestamps on transactions
• Reads never blocked
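The idea can be sketched with a toy in-memory store. This is an illustrative model only, not how any particular database implements MVCC; the class and method names are hypothetical, and each write is treated as its own committed transaction.

```python
import itertools

class MVCCStore:
    """Toy multi-version store: writers append versions, readers use a snapshot."""

    def __init__(self):
        self._versions = {}              # key -> list of (commit_ts, value), oldest first
        self._clock = itertools.count(1)
        self._last_commit = 0

    def write(self, key, value):
        """Commit a new version of key and return its commit timestamp."""
        ts = next(self._clock)
        self._versions.setdefault(key, []).append((ts, value))
        self._last_commit = ts
        return ts

    def snapshot(self):
        """Return the timestamp a newly started reader should read at."""
        return self._last_commit

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before snapshot_ts."""
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None
```

A reader that took its snapshot before a later write keeps seeing the older version, so it never has to wait for the writer.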
Durability


• Data from successful tx is never lost
What’s wrong with
 relational DBs?
http://www.flickr.com/photos/exfordy/4734358134/
All the cool kids use
   non-relational DBs...
Facebook               LinkedIn




Twitter
                    Google
...and relational DBs
What’s wrong with
    relational DBs?

• Nothing
• ‘Impedance Mismatch’
• Scaling
Scaling an RDBMS
• Launch successful service
• Read saturation - add caching
• Write saturation - add hardware (£££)
• Queries slow - denormalise
• Reads still too slow - prematerialise
  common queries, stop joining
• Writes too slow - drop secondary indexes
  and triggers
Denormalising
• Normalise logical data design
 • Joins
 • Materialised views can optimise queries
• Denormalise logical data design
 • Eliminate joins
 • Application must ensure data consistency
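The trade-off can be sketched with Python's built-in sqlite3 module. The tables, column names and data are invented for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER REFERENCES users(id), item TEXT);
    -- Denormalised copy: the user's name is stored redundantly with each order.
    CREATE TABLE orders_denorm (id INTEGER PRIMARY KEY, user_id INTEGER,
                                user_name TEXT, item TEXT);
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO orders VALUES (10, 1, 'book');
    INSERT INTO orders_denorm VALUES (10, 1, 'alice', 'book');
""")

def order_normalised(order_id):
    """Normalised read: needs a join to fetch the user's name."""
    return db.execute(
        "SELECT u.name, o.item FROM orders o "
        "JOIN users u ON u.id = o.user_id WHERE o.id = ?",
        (order_id,)).fetchone()

def order_denormalised(order_id):
    """Denormalised read: a single-table lookup, no join."""
    return db.execute(
        "SELECT user_name, item FROM orders_denorm WHERE id = ?",
        (order_id,)).fetchone()

def rename_user(user_id, new_name):
    """The application, not the DB, must keep the redundant copies in step."""
    with db:
        db.execute("UPDATE users SET name = ? WHERE id = ?",
                   (new_name, user_id))
        db.execute("UPDATE orders_denorm SET user_name = ? WHERE user_id = ?",
                   (new_name, user_id))
```

If rename_user forgot the second UPDATE, the denormalised table would silently serve a stale name, which is the consistency burden the last bullet refers to.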
Scaling a distributed DB


• Just add more commodity servers...
• ...we wish
CAP Theorem

• Eric Brewer, 2000
• Distributed System can’t simultaneously be
 • Consistent
 • Available
 • Partition-tolerant
BASE

• Basically Available
• Soft state
• Eventually consistent
• Relaxation of the C in CAP
Eventual Consistency

• All nodes eventually see the same data
• Different strategies
 • One
 • Quorum
 • All
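A small sketch of how the three levels translate into replica acknowledgements. It assumes a Dynamo-style system with N replicas per key, where a read is guaranteed to overlap the latest write when R + W > N. The function names are hypothetical.

```python
def acks_required(level, n_replicas):
    """Number of replica acknowledgements needed for a consistency level."""
    if level == "one":
        return 1
    if level == "quorum":
        return n_replicas // 2 + 1   # strict majority
    if level == "all":
        return n_replicas
    raise ValueError(f"unknown consistency level: {level}")

def read_sees_latest_write(write_level, read_level, n_replicas):
    """True when every read set must overlap every write set (R + W > N)."""
    w = acks_required(write_level, n_replicas)
    r = acks_required(read_level, n_replicas)
    return r + w > n_replicas
```

With three replicas, quorum writes plus quorum reads overlap (2 + 2 > 3), while writing and reading at level One (1 + 1) may return stale data until the replicas converge.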
Horizontal Scaling

• Partitioning
• Sharding
• Dynamo-style
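Dynamo-style systems typically place keys on nodes with consistent hashing. The following is a minimal sketch of a hash ring with virtual nodes; the class name, the choice of SHA-256 and the default of 64 virtual nodes per server are assumptions made for illustration.

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        # Each node is placed at several pseudo-random points on the ring.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes for i in range(vnodes))
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(text):
        return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        """Return the node owning the first ring point after the key's hash."""
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

When a node is added, the only keys that change owner are the ones the new node takes over; keys never shuffle between the existing nodes, unlike with a plain `hash(key) % node_count` scheme.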
http://vimeo.com/13667174
Non-relational
   Database Families
• Document-oriented
• Graph
• Column-oriented
• Key-value & DHT
• Others
Document
Databases
Document Databases

• IBM Lotus
• CouchDB
• MongoDB
• Riak
http://mongodb.org
MongoDB

• JSON-style documents
• Indexes on any field
• Replication, auto-sharding
• Map/Reduce
Non-Relational Databases at ACCU2011
MongoDB Demo
Other Features

• Document linking & embedding
• GridFS - store large files
• Geospatial indexes and searches
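Linking versus embedding can be illustrated with plain Python dictionaries standing in for JSON-style documents. This sketch does not use the MongoDB driver; the field names and the resolve_comments helper are hypothetical.

```python
import json

# Embedding: related data lives inside the parent document.
post_embedded = {
    "_id": 1,
    "title": "Non-relational databases",
    "comments": [{"author": "ann", "text": "Nice talk"}],
}

# Linking: the parent stores identifiers; the application resolves them.
comments = {101: {"_id": 101, "author": "ann", "text": "Nice talk"}}
post_linked = {"_id": 1, "title": "Non-relational databases",
               "comment_ids": [101]}

def resolve_comments(post, comment_collection):
    """Follow links by looking each referenced id up in another collection."""
    return [comment_collection[cid] for cid in post["comment_ids"]]

def to_json(document):
    """Serialise a document to a JSON string with stable key order."""
    return json.dumps(document, sort_keys=True)
```

Embedding returns everything in one read. Linking avoids duplicating data that is shared between documents, at the cost of an extra lookup per reference.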
OM (object mapper, cf. ORM)
Graph DBs



http://www.flickr.com/photos/thefangmonster/2301364418/
Graph Databases

• Nodes, relationships & properties
• Query by traversing graph
• Natural fit for recommendations, shortest
  paths, social graph
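Querying by traversal can be sketched as a breadth-first search over relationships. This is plain Python rather than any graph database's API, and the KNOWS relationships are made-up sample data.

```python
from collections import deque

def shortest_path(relationships, start, goal):
    """Breadth-first traversal; returns the shortest list of nodes, or None."""
    graph = {}
    for a, b in relationships:           # treat each relationship as undirected
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None

KNOWS = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"),
         ("ann", "eve"), ("eve", "dan")]
```

The traversal touches only the nodes reachable from the start, whereas the equivalent relational query would need one self-join per hop.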
Graph DBs

• FlockDB
• Neo4j
• Apache Hama
• Google Pregel
Neo4j

• Embedded
• Server
• REST
• Components - indexing, management, rdf,
  geospatial
Key-Value & DHT
Key-Value & DHT

• Amazon Dynamo
• Project Voldemort
• Redis
• Tokyo Cabinet
• Amazon SimpleDB
http://redis.io
redis
• By Salvatore Sanfilippo (@antirez)
• Sponsored by VMware
• data-structure server
• strings, hashes, lists
• sets, sorted sets
• All operations in memory, backed by disk
Interactive Documentation
Redis Demo
Other features

• Replication (master/slaves)
• Persistence
 • Snapshotting
 • Append-only log file
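The append-only log approach can be sketched with a toy key-value store that logs every change and replays the log on start-up. This is an illustrative model, not Redis's actual file format or code; TinyStore, its tab-separated log lines and the demo function are hypothetical, and keys or values containing tabs or newlines are not handled.

```python
import os
import tempfile

class TinyStore:
    """Toy in-memory key-value store persisted through an append-only log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        if os.path.exists(log_path):
            self._replay()

    def _replay(self):
        """Rebuild the in-memory dataset by re-applying every logged change."""
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                op, key, *rest = line.rstrip("\n").split("\t")
                if op == "SET":
                    self.data[key] = rest[0]
                elif op == "DEL":
                    self.data.pop(key, None)

    def _append(self, *fields):
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write("\t".join(fields) + "\n")
            log.flush()
            os.fsync(log.fileno())   # the "fsync on every command" policy

    def set(self, key, value):
        self._append("SET", key, value)
        self.data[key] = value

    def delete(self, key):
        self._append("DEL", key)
        self.data.pop(key, None)

def demo():
    """Write some changes, then recover the dataset from the log alone."""
    path = os.path.join(tempfile.mkdtemp(), "demo.aof")
    store = TinyStore(path)
    store.set("a", "1")
    store.set("b", "2")
    store.delete("a")
    return TinyStore(path).data
```

Calling fsync after every command is the safest and slowest choice; syncing less often trades a window of possible data loss for throughput.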
Object Hash Mappers


• cf ORM
• OHM
Other KV Stores

• Berkeley DB
• Memcache
• Microsoft Dynomite
Column-Oriented DBs




http://www.flickr.com/photos/nationalmediamuseum/3588099765/
Column-Oriented
       Databases
• Google Bigtable
• Cassandra
• Hypertable
• HBase
HBase



http://www.flickr.com/photos/negativz/14470756/
• Apache top-level project
• Implementation of Google Bigtable
• Distributed
• High write throughput
• ‘real-time’ read/write
HBase

• Automatic partitioning
• Scale linearly and automatically
• Commodity HW
• Fault tolerant
• MapReduce
Data Model

• Schema-less
• Versioned cells
• key/column family/cell qualifier/timestamp
• Column Families
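The addressing scheme can be sketched as a toy in-memory table. This models the key/column family/qualifier/timestamp idea only and is not the HBase client API; ToyTable and the sample row are hypothetical.

```python
class ToyTable:
    """Toy model of cells addressed by row key, family:qualifier and timestamp."""

    def __init__(self, families):
        self.families = set(families)   # column families are declared up front
        self.cells = {}                 # (row, family, qualifier) -> {timestamp: value}

    def put(self, row, family, qualifier, value, timestamp):
        """Store a new version of a cell; qualifiers need no prior declaration."""
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.cells.setdefault((row, family, qualifier), {})[timestamp] = value

    def get(self, row, family, qualifier, max_timestamp=None):
        """Newest value, or the newest at or before max_timestamp."""
        versions = self.cells.get((row, family, qualifier), {})
        candidates = [ts for ts in versions
                      if max_timestamp is None or ts <= max_timestamp]
        return versions[max(candidates)] if candidates else None
```

Only the column families are fixed in advance. Any row may carry any set of qualifiers within a family, and every cell keeps its older versions by timestamp.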
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html




http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html
Other DBs

• Couchbase
• Kyoto Cabinet
• Many more I’ve omitted
Wrap Up

• RDBMS vs non-relational
• Distributed DBs
• Non-relational families
The End




@gavinheavyside
gavin.heavyside@mydrivesolutions.com

Editor's Notes

  • #7: 13 rules, numbered 0 to 12
    No popular DBMS is actually ‘relational’ by the 12 rules - they all break some of them
    Leading commercial - Oracle, MS, IBM (DB2)
    Leading open-source - MySQL, PostgreSQL, SQLite
  • #9: If one part of a transaction fails, it all fails and the DB is left unchanged.
    Failures: HW, system, DB (disk etc), application (violate constraints on data)
  • #10: The DB will enforce consistency and relationships/constraints that have been specified in the schema - everything else is the responsibility of the application
  • #11: Dirty reads - allow other transactions to read, but not modify, uncommitted data - improves performance
  • #13: DB creates a new version of data for a TX
    Other TXes read the old version until the TX has completed.
    MVCC is also used by some non-relational databases
  • #14: Usually use a transaction log that can be replayed to rebuild data in the event of failure.
  • #17: What most of these companies have in common is scale
    How would an RDBMS handle the size of data they deal with?
    Most of the big companies have built their own solutions.
    Most of them also use RDBMSes - Facebook is a huge MySQL user.
  • #19: Scaling - RDBMSes don’t scale linearly - big box == $$$$
    e.g. Graph relationships don’t map to tables & rows easily
    Semi/Unstructured data, lots of columns, lots of nulls
  • #20: Caching - e.g. memcacheDB, store common queries in memory
    denormalise - add redundant data, grouped data to reduce table joins - reduce load on physical hardware - improve locality of reference
    So... you choose a distributed NoSQL fancy modern DB
  • #22: Not really...
  • #23: C - all nodes see same data at the same time
    A - survivors continue to operate when nodes fail
    P - system continues to operate despite message loss between nodes
    Many systems relax consistency
  • #24: Also by Eric Brewer
    BASE system relaxes the C in CAP
    BA - might lose access to some data if nodes fail
    SS - System state might change over time without input (eventual consistency, propagation)
  • #25: Different ways to consider whether a write has succeeded, whether new value is returned.
  • #27: Consistent Smashing - video from Basho/Riak
  • #28: Lots of overlap between families - esp. column & key-value/DHT
  • #30: Schema-less way of looking at data as documents rather than fields - all related data in document.
    Maps very well to a lot of applications
  • #31: huMONGOus
    10gen
  • #32: Can be ACID if using replication for durability
  • #36: Object mapper - not ORM
  • #39: FlockDB - Twitter, social graph - simpler than Neo4j
    Neo4j - dual open-source/commercial license
    Hama - Apache project
  • #40: ACID transactions
    persistence
    concurrency
    scalable
  • #44: Tokyo Tyrant - network access protocol for Tokyo Cabinet DB
    Voldemort - LinkedIn
  • #46: Can be ACID if the AOF fsyncs on every command
  • #50: replication non-blocking on master. Writes will work even if slave blocked.
    Replication for scaling (read-only slaves) or for redundancy.
    AOF log - everything that changes the dataset.
    If server crashes redis replays the AOF
    BGREWRITEAOF to optimize AOF - minimum steps to rebuild dataset in memory
    configurable fsync options - every command, every second, never
  • #52: Oracle Berkeley DB, Berkeley DB Java, Berkeley DB XML
    Memcache + Berkeley DB = MemcacheDB, a bit like Redis, for KV
  • #53: OSDI 2006 (MapReduce was 2004)
  • #54: Bigtable - column families, distributed, scale
  • #56: Consider a whiteboard overview of Hadoop here.
    Real-time (low-latency) as opposed to Hadoop & MapReduce batch jobs.
    Not ACID - effect of distributed writes on consistency and isolation of views
    Relaxes the A of CAP - consistent & partition tolerant
  • #57: partitioned on row count/size
    Region is basic unit of availability
  • #61: Queries - no support for complex queries
    Compute query in application (MapReduce, etc)
    all necessary data is denormalised in the row - wide table with lots of columns.
    “versioned get” returns older version of row
  • #62: Couchbase - combination of CouchDB, Membase, Memcached
    Kyoto Cabinet - C++ implementation by Tokyo Cabinet author.
  • #63: Impedance Mismatch
    CAP Theorem, Eventual Consistency
    Redis, MongoDB, Neo4j, HBase