SQL or NoSQL, that is the question! October 2011 Andraž Tori, CTO at Zemanta @andraz andraz@zemanta.com
Answering - Why NoSQL? - What is NoSQL? - How does it work?
SQL is awesome! - Structured Query Language - ACID: Atomicity, Consistency, Isolation, Durability - Predictable - Schema - Based on relational algebra - Standardized
No, really, it's awesome! - Hardened - Free and commercial choices - MySQL, PostgreSQL, Oracle, DB2, MS SQL... - Commercial support - Tooling - Everyone knows it - It's mature!
 
So this is the end, right?
Why the heck would someone not want SQL?
Why not to use SQL? - Clueless self-taught programmers who use text files - NIH - Not Invented Here syndrome. And I want to design my own CPU! - Because it's hard! - I can't afford it - “This app was first ported from Clipper to DBase”
Some other perspectives...
Let's say ...
You are a big tech company, located on the West Coast of the USA
 
You are... - a big international web company based in San Francisco - 5 data centers around the world - Petabytes of data behind the service - A day of downtime costs you at least millions - And it's not a question of if, but when
You want to - keep the service up no matter what - have it fast - deal with humongous amounts of data - enable your engineers to make great stuff
You are...
Some interesting constraints - Amazon claims that just an extra one tenth of a second on their response times will cost them 1% in sales.
So... - Some pretty big and important problems - And the brightest engineers in the world - Who loooove to build stuff - Sooner or later even an Oracle RAC cluster is not enough
Numbers everybody should know! (Jeff Dean, at a famous Stanford talk)
L1 cache reference: 0.5 ns
Branch mispredict: 5 ns
L2 cache reference: 7 ns
Mutex lock/unlock: 25 ns
Main memory reference: 100 ns
Compress 1K bytes w/ cheap algorithm: 3,000 ns
Send 2K bytes over 1 Gbps network: 20,000 ns
Read 1 MB sequentially from memory: 250,000 ns
Round trip within same datacenter: 500,000 ns
Disk seek: 10,000,000 ns
Read 1 MB sequentially from disk: 20,000,000 ns
Send packet CA->Netherlands->CA: 150,000,000 ns
Facebook circa 2009 - from 200GB (March 2008) to 4 TB of compressed new data added per day - 135TB of compressed data scanned per day - 7500+ Database jobs on production cluster per day - 80K compute hours per day - And that's just for data warehousing/analysis - plus thousands of MySQL machines acting as Key/Value stores
Big Data - The Internet generates huge amounts of data - First encountered by the big guys: AltaVista, Google, Amazon … - It needs to be handled - Classical storage solutions just don't fit/behave/scale anymore
So smart guys create solutions to these  internal challenges
And then? - Papers: The Google File System  (Google, 2003) MapReduce: Simplified Data Processing on Large Clusters (Google, 2004) Bigtable: A Distributed Storage System for Structured Data (Google, 2006) Amazon Dynamo (Amazon, 2007) - Projects (all open source): Hadoop (coming out of Nutch, Yahoo, 2008) Memcached (LiveJournal, 2003) Voldemort (Linkedin, 2008) Hive (Facebook, 2008) Cassandra (Facebook, 2008) MongoDB (2007)  Redis, Tokyo Cabinet , CouchDB, Riak...
Four papers to rule them all
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, “The Google File System”, 19th ACM Symposium on Operating Systems Principles, Lake George, NY, October, 2003.
Jeffrey Dean and Sanjay Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”, OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December, 2004.
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, “Bigtable: A Distributed Storage System for Structured Data”, OSDI'06: Seventh Symposium on Operating System Design and Implementation, Seattle, WA, November, 2006.
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swami Sivasubramanian, Peter Vosshall and Werner Vogels, “Dynamo: Amazon's Highly Available Key-Value Store”, in the Proceedings of the 21st ACM Symposium on Operating Systems Principles, Stevenson, WA, October 2007.
Solving problems of big guys?
Total Sites Across All Domains August 1995 - October 2011, NetCraft
Yesterday's problem of the biggest guys is today's problem of a garden-variety startup
 
And so we end up with a Cambrian explosion
 
 
These solutions don't have much in common, Except...
They definitely aren't SQL
Not Only SQL
So what are these beasts?
That's a hard question... - There is no standard - This is a new technology - new research - survival of the fittest - experimenting - They obviously fulfill some new needs - but we don't yet know which are real and which superficial - Most are extremely use-case specific
Example use-cases - Shopping cart on Amazon - PageRank calculation at Google - Streams stuff at Twitter - Extreme K/V store at bit.ly - Analytics at Facebook
At the core, it's a different set of trade-offs and operational constraints
Trade-offs and operational constraints - Consistent? Eventually consistent? - Highly available? Distributed across continents? - Fault tolerant? Partition tolerant? Tolerant to consumer grade hardware? - Distributed? Across 10, 100, 1000, 10000 machines?
More possibilities - All in memory? (disk is the new tape) - Batch processing? - tolerant to node failures? - Graph oriented? - No transactions? Programmer deals with inconsistencies? - No schemas? - BASE? (Basically Available, Soft state, Eventually Consistent) - Horizontal scaling, with no downtime? - Self healing?
A consistent topic: CAP Theorem
CAP theorem (Eric Brewer, 2000, Symposium on Principles of Distributed Computing) - CAP = Consistency, Availability, Partition tolerance - Pick any two! - Distributed systems have to sacrifice something to be fast - Usually you drop one of: - consistency – all clients see the same data - availability – the service returns something - Sometimes you can even tune the trade-offs!
“There is no free lunch with distributed data” – HP
Eventual Consistency - Different clients can read and write the data without locking, possibly on partitioned nodes - What we know is that, given enough time, the data is synchronized to the same state across all replicas
But this is horrible!
…  you already are eventually consistent! :) If your database stores how many vases you have in your shop...
Eventual consistency - Conflict resolution: - Read time - Write time - Asynchronous - Possibilities: - client timestamps - vector clocks, when writing say what your original data version was - Conflict resolution can be server or client based
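The vector clock idea on this slide ("when writing, say what your original data version was") can be sketched in a few lines. This is a minimal illustration, not how any particular store implements it; the function names and the node names `n1` and `n2` are hypothetical.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if version `a` has seen every update that version `b` has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify two versions: one supersedes the other, or they conflict."""
    a_covers_b = descends(a, b)
    b_covers_a = descends(b, a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "a is newer"
    if b_covers_a:
        return "b is newer"
    return "conflict"  # concurrent writes: server or client must resolve


# Two clients read the same base version, then write through different nodes.
base = increment({}, "n1")           # {"n1": 1}
write_x = increment(base, "n1")      # {"n1": 2}
write_y = increment(base, "n2")      # {"n1": 1, "n2": 1}

print(compare(write_x, base))     # a is newer
print(compare(write_x, write_y))  # conflict
```

Client timestamps would pick a winner silently; vector clocks instead make the conflict visible, so resolution can happen at read time, at write time, or asynchronously.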
There are different kinds of consistencies - Read-your-writes consistency - Monotonic write / monotonic read consistency - Session consistency - Causal consistency
There's not even a proper taxonomy of features different NoSQL solutions offer
And this presentation is too short to present the whole breadth of possibilities
Usual taxonomy of NoSQL - Key/Value stores - Column stores - Document stores - Graph stores
Other attributes - In-memory / on-disk - Latency / throughput (batch processing) - Consistency / Availability
Key/Value stores - a.k.a. Distributed hashtables! - Amazon Dynamo - Redis, Voldemort, Cassandra, Tokyo Cabinet, Riak
Document databases - Similar to Key/Value, but value is a document - JSON or something similar, flexible schema - CouchDB, MongoDB, SimpleDB... - May support indexing or not - Usually support more complex queries
Column stores - one key, multiple attributes - hybrid row/column - BigTable, HBase, Cassandra, Hypertable
Graph Databases - Neo4J, Maestro OpenLink, InfiniteGraph, HyperGraphDB, AllegroGraph - Whole semantic web shebang!
To make the situation even more confusing... - Fast pace of development - In-memory stores gain on-disk support overnight - Indexing capabilities are added
Two examples - Cassandra - Hadoop - Hive - Mahout
 
Cassandra - BigTable + Dynamo - P2P, horizontally scalable - No SPOF - Eventually consistent - Tunable tradeoffs between consistency and availability - number of replicas, writes, reads
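The "number of replicas, writes, reads" knob follows the rule from the Dynamo paper: with N replicas, W write acknowledgements and R replicas consulted per read, every read overlaps the latest write when R + W > N. The sketch below checks that rule by brute force; the function name is hypothetical and it models only set overlap, not Cassandra's actual read/write path.

```python
from itertools import combinations


def overlap_guaranteed(n, w, r):
    """True if every possible set of w written replicas shares at least
    one replica with every possible set of r replicas that a read contacts."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# N = 3 replicas: quorum writes and quorum reads always overlap ...
print(overlap_guaranteed(3, 2, 2))  # True
# ... while single-replica writes and reads can miss each other,
# which is the eventually consistent (but fastest) setting.
print(overlap_guaranteed(3, 1, 1))  # False
```

Lowering W or R buys latency and availability; raising them buys consistency.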
Cassandra – writes - No reads - No seeks - Log oriented writes - Fast, atomic inside ColumnFamily - Always available for writing
Cassandra - Billions of rows - MySQL: ~ 300ms write ~ 350ms read - Cassandra: ~ 0.12ms write ~ 15ms read
Not enough time to go into data model...
Cassandra - In production at: Facebook, Digg, Rackspace, Reddit, Cloudkick, Twitter - largest production cluster over 150TB and over 150 machines - Other stuff: - pluggable partitioner (Random/OrderPreserving) - rack aware, datacenter aware
Experiences? - Works pretty well at Zemanta - user preferences store - extending to new use-cases - Digg had some problems - Don't necessarily use it as primary store - Not very easy to back up, situation is improving
Cassandra - queries - Column by key - Slices (of columns/supercolumns) - Range queries (efficient only when using OrderPreservingPartitioner)
 
Hadoop - GFS + MapReduce - Fault tolerant - (massively) distributed - massive datasets - batch-processing (non real-time responses) - Written in Java - A whole ecosystem
Hadoop: Why? (Owen O’Malley, Yahoo Inc!, omalley@apache.org) •  Need to process 100TB datasets with multi-day jobs •  On 1 node: –  scanning @ 50MB/s = 23 days –  MTBF = 3 years •  On 1000 node cluster: –  scanning @ 50MB/s = 33 min –  MTBF = 1 day •  Need framework for distribution –  Efficient, reliable, easy to use
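The arithmetic behind this slide is easy to reproduce. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and independent node failures, so the cluster MTBF is simply the per-node MTBF divided by the node count; the function name is hypothetical.

```python
def scan_time_seconds(dataset_bytes, nodes, mb_per_second=50):
    """Time to scan the dataset when it is split evenly across `nodes`,
    each reading at `mb_per_second` megabytes per second."""
    return dataset_bytes / (nodes * mb_per_second * 10**6)


DATASET = 100 * 10**12  # 100 TB

single_node_days = scan_time_seconds(DATASET, nodes=1) / 86_400
cluster_minutes = scan_time_seconds(DATASET, nodes=1000) / 60

# One node fails about every 3 years, so among 1000 nodes
# some node fails roughly once a day.
cluster_mtbf_days = 3 * 365 / 1000

print(f"1 node:     {single_node_days:.1f} days")   # 23.1 days
print(f"1000 nodes: {cluster_minutes:.1f} minutes")  # 33.3 minutes
print(f"cluster MTBF: {cluster_mtbf_days:.1f} days")  # 1.1 days
```

The last number is the point of the slide: at a thousand nodes a failure is a daily event, so the framework has to handle it rather than the job author.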
Hadoop @ Facebook - Use Hadoop to store copies of internal log and dimension data sources and use it as a source for reporting/analytics and machine learning. - Currently 2 major clusters: A 1100-machine cluster with 8800 cores and about 12 PB raw storage. A 300-machine cluster with 2400 cores and about 3 PB raw storage. Each (commodity) node has 8 cores and 12 TB of storage. - Heavy users of both the streaming and the Java APIs. They built a higher-level data warehousing framework on top of these features, called Hive (see http://hadoop.apache.org/hive/).
But also at smaller startups - Zemanta: 2 to 4 node cluster, 7TB - log processing - Hulu 13 nodes - log storage and analysis - GumGum 9 nodes - image and advertising analytics - Universities: Cornell – Generating web graphs (100 nodes) - It's almost everywhere
Hadoop Architecture - HDFS - HDFS provides a single distributed filesystem - Managed by a NameNode (SPOF) - Append-only filesystem - distributed by blocks (for example 64MB) - It's like one big RAID over all the machines - tunable replication - Rack aware, datacenter aware - It just works, really!
 
 
Hadoop Architecture - MapReduce - Based on an old concept from Lisp - Generally it's not just map-reduce, it's: Map -> shuffle (sort) -> merge -> reduce - Jobs can be partitioned - Tasks can be run and restarted independently (parallelization, fault tolerance) - Aware of data-locality of HDFS - Speculative execution (toward the end of a job, tasks on machines that stall are launched again elsewhere)
Infamous word counting example - “One and one is two and one is three” - Two mappers: “One and one is”, “two and one is three” - Pretty “stupid” mappers, just output word and “1”
Output Mapper 1: One 1, And 1, One 1, Is 1
Output Mapper 2: Two 1, And 1, One 1, Is 1, Three 1
Sorter: And 1, And 1, Is 1, Is 1, One 1, One 1, One 1, Two 1, Three 1
Reducer: And 2, Is 2, One 3, Two 1, Three 1
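The same word count can be simulated in a single process to make the three stages concrete. This is only an in-memory sketch of the data flow, not Hadoop's API; `mapper`, `shuffle` and `reducer` are hypothetical names, and words are lower-cased so that “One” and “one” land on the same key.

```python
from collections import defaultdict


def mapper(text):
    """A "stupid" mapper: emit (word, 1) for every word in its input split."""
    for word in text.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group all values by key and sort the keys, as the sorter stage does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reducer(key, values):
    """Receive every value emitted for one key and sum them."""
    return key, sum(values)


splits = ["One and one is", "two and one is three"]  # one split per mapper
mapped = [pair for split in splits for pair in mapper(split)]
counts = dict(reducer(key, values) for key, values in shuffle(mapped).items())

print(counts)  # {'and': 2, 'is': 2, 'one': 3, 'three': 1, 'two': 1}
```

In a real cluster each split is mapped on a different machine and the shuffle moves data over the network, but the contract is the same: a reducer sees all values for its key.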
Important to know - Mappers can output more than one output per input (or none) - Bucketing for reducers happens immediately after mapping output - Every reducer gets all input records for a certain “key” - All parts are highly pluggable – readers, mapping, sorting, reducing … it's Java
Hadoop - You can write your jobs in Java - You get used to thinking inside the constraints - You can use “Hadoop Streaming” to write jobs in any language - It's great not to have to think about the machines, but you can “peep” if you want to see how your job is doing.
Now, this is a bit wonky, right? - Word counting is a really bad example - However it's like “Hello world”, so get used to it - When you get to real problems it gets much more logical
Benchmarks, 2009 (This doesn't help me much, but...)
Bytes             Nodes  Maps    Reduces  Replication  Time
500000000000      1406   8000    2600     1            59 seconds
1000000000000     1460   8000    2700     1            62 seconds
100000000000000   3452   190000  10000    2            173 minutes
1000000000000000  3658   80000   20000    2            975 minutes
Hive
Hive - A system built on top of Hadoop that mimics SQL - Hive Query Language - Built at Facebook, since writing MapReduce jobs in Java is tedious for basic tasks - Every operation is one or more full table scans - Bunch of heuristics, query optimization
Hive – Why we love it at Zemanta - No need to transform your data at “load time” - Just copy your files to HDFS (preferably compressed and chunked) - Write your own deserializer (50 lines in Java) - And use your file as a table - Plus custom User Defined Functions
 
Mahout - Bunch of algorithms implemented: - Collaborative Filtering - User and Item based recommenders - K-Means, Fuzzy K-Means clustering - Mean Shift clustering - Dirichlet process clustering - Latent Dirichlet Allocation - Singular value decomposition - Parallel Frequent Pattern mining - Complementary Naive Bayes classifier - Random forest decision tree based classifier - High performance Java collections (previously Colt collections) - A vibrant community, and much more cool stuff to come by this summer thanks to Google Summer of Code
General notes
Some observations - Non-fixed schemas are a blessing when you have to adapt constantly - that doesn't mean you should not have documentation and be thoughtful! - Denormalization is the way to scale - sorry guys - Clients get to manage things more precisely, but also have to manage things more precisely
Some internals, “fun” tricks - Bloom filter: “Is data on this node?” The answer is either “maybe” or “definitely not”; on “maybe”, go to disk to check - Vector clocks - Consistent hashing
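A Bloom filter answering “maybe” or “definitely not” takes only a bit array and a few hash functions. The sketch below is a toy version under stated assumptions: the class name, the sizes, and the use of salted SHA-256 as the hash family are illustrative choices, not what any specific store does.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        """Derive `num_hashes` bit positions by salting the key with an index."""
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        """False means "definitely not here"; True means "maybe, check disk"."""
        return all(self.bits[pos] for pos in self._positions(key))


node_filter = BloomFilter()
node_filter.add("user:42")
print(node_filter.might_contain("user:42"))  # True: added keys are never missed
```

A key that was never added usually returns False and saves a disk seek; occasionally it returns True (a false positive), which costs one wasted lookup but never a wrong answer.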
Consistent hashing - key -> hash -> “coordinator node” - depending on the replication factor, the key is then stored on the next N nodes in sequence around the ring - When a new node is added to the ring, re-replication is relatively easy
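The ring can be sketched as a sorted list of node positions. This is a minimal illustration assuming one position per node (no virtual nodes) and MD5 as the placement hash; the `Ring` class and the node names are hypothetical.

```python
import bisect
import hashlib


def ring_position(name):
    """Map a node name or a key to a position on the ring."""
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self.ring = sorted((ring_position(node), node) for node in nodes)

    def add_node(self, node):
        bisect.insort(self.ring, (ring_position(node), node))

    def nodes_for(self, key):
        """Coordinator (first node clockwise from the key's hash),
        followed by the next nodes in ring order, up to `replicas` in total."""
        positions = [pos for pos, _ in self.ring]
        start = bisect.bisect_right(positions, ring_position(key)) % len(self.ring)
        count = min(self.replicas, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replicas=3)
print(ring.nodes_for("user:42"))  # three distinct nodes, coordinator first
```

When a node joins, only keys whose hash falls between the new node and its predecessor change coordinator; every other key stays where it was, which is why growing the ring is relatively cheap.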
And if you don't take anything else from this presentation...
 
 
 
But there's more to it
This is the edge today - Tons of interesting research waiting to be done - Ability to leverage these solutions to process terabytes of data cheaply - Ability to seize new opportunities - Innovation is the only thing keeping you/us ahead - Are you preparing yourself for tomorrow's technologies? Tomorrow's research?
Images
http://www.flickr.com/photos/60861613@N00/3526232773/sizes/m/in/photostream/
http://www.zazzle.com/sql_awesome_me_tshirt-235011737217980907
http://geekandpoke.typepad.com/geekandpoke/2011/01/nosql.html
http://hadoop.apache.org/common/docs/current/hdfs_design.html
http://www.flickr.com/photos/unitednationsdevelopmentprogramme/4273890959/

SQL or NoSQL, that is the question!

  • 1. SQL or NoSQL, that is the question! October 2011 Andraž Tori, CTO at Zemanta @andraz [email protected]
  • 2. Answering - Why NoSQL? - What is NoSQL? - How does it work?
  • 3. SQL is awesome! - Structured Query Language - ACID Atomicity, Consistency, Isolation, Durability - Predictable - Schema - Based on rational algebra - Standardized
  • 4. No, really, it's awesome! - Hardened - Free and commercial choices - MySQL, PostgreSQL, Oracle, DB2, MS SQL... - Commercial support - Tooling - Everyone knows it - It's mature!
  • 5.  
  • 6. So this is the end, right?
  • 7. Why the heck would someone not want SQL?
  • 8. Why not to use SQL? - Clueless self-thought programmers who use text files - NIH - Not Invented Here syndrome. And I want to design my own CPU! - Because it's hard! - I can't afford it - “This app was first ported from Clipper to DBase”
  • 11. You are a big tech company, located on west coast of USA
  • 12.  
  • 13. You are... - big international web company based in San Francisco - 5 data centers around the world - Petabytes of data behind the service - A day of downtown costs you at least millions - And it's not question of when, but if
  • 14. You want to - keep the service up no matter what - have it fast - deal with humongous amounts of data - enable your engineers to make great stuff
  • 16. Some interesting constraints Amazon claim that just an extra one tenth of a second on their response times will cost them 1% in sales.
  • 17. So... - Some pretty big and important problems - And brightest engineers in the world - Who loooove to build stuff - Sooner or later even Oracle RAC cluster is not enough
  • 18. Numbers everybody should know! Jeff Dean at famous Stanford talk L1 cache reference 0.5 ns Branch mispredict 5 ns L2 cache reference 7 ns Mutex lock/unlock 25 ns Main memory reference 100 ns Compress 1K bytes w/ cheap algorithm 3,000 ns Send 2K bytes over 1 Gbps network 20,000 ns Read 1 MB sequentially from memory 250,000 ns Round trip within same datacenter 500,000 ns Disk seek 10,000,000 ns Read 1 MB sequentially from disk 20,000,000 ns Send packet CA->Netherlands->CA 150,000,000 ns
  • 19. Facebook circa 2009 - from 200GB (March 2008) to 4 TB of compressed new data added per day - 135TB of compressed data scanned per day - 7500+ Database jobs on production cluster per day - 80K compute hours per day - And that's just for data warehousing/analysis - plus thousands of MySQL machines acting as Key/Value stores
  • 20. Big Data - Internet generates huge amounts of data - First encountered by big guys AltaVista, Google, Amazon … - Need to be handled - Classical storage solutions just don't fit/behave/scale anymore
  • 21. So smart guys create solutions to these internal challenges
  • 22. And then? - Papers: The Google File System (Google, 2003) MapReduce: Simplified Data Processing on Large Clusters (Google, 2004) Bigtable: A Distributed Storage System for Structured Data (Google, 2006) Amazon Dynamo (Amazon, 2007) - Projects (all open source): Hadoop (coming out of Nutch, Yahoo, 2008) Memcached (LiveJournal, 2003) Voldemort (Linkedin, 2008) Hive (Facebook, 2008) Cassandra (Facebook, 2008) MongoDB (2007) Redis, Tokyo Cabinet , CouchDB, Riak...
  • 23. Four papers to rule them all Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, “ The Google File System ”, 19th ACM Symposium on Operating Systems Principles, Lake George, NY, October, 2003. Jeffrey Dean and Sanjay Ghemawat, “ MapReduce: Simplified Data Processing on Large Clusters ”, OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December, 2004. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, “ Bigtable: A Distributed Storage System for Structured Data ”, OSDI'06: Seventh Symposium on Operating System Design and Implementation, Seattle, WA, November, 2006. Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swami Sivasubramanian, Peter Vosshall and Werner Vogels, “ Dynamo: Amazon's Highly Available Key-Value Store ”, in the Proceedings of the 21st ACM Symposium on Operating Systems Principles, Stevenson, WA, October 2007.
  • 24. Solving problems of big guys?
  • 25. Total Sites Across All Domains August 1995 - October 2011, NetCraft
  • 26. Yesterday's problem of biggest guys Is today's problem of garden variety startup
  • 27.  
  • 28. And so we end up with Cambrian explosion
  • 29.  
  • 30.  
  • 31. These solutions don't have much in common, Except...
  • 34. So what are these beasts?
  • 35. That's a hard question... - There is no standard - This is a new technology - new research - survival of the fittest - experimenting - They obviously fulfill some new needs - but we don't yet know which are real and which superficial - Most are extremely use-case specific
  • 36. Example use-cases - Shopping cart on Amazon - PageRank calculation at Google - Streams stuff at Twitter - Extreme K/V store at bit.ly - Analytics at Facebook
  • 37. At the core, it's a different set of trade-offs and operational constraints
  • 38. Trade-offs and operational constraints - Consistent? Eventually consistent? - Highly available? Distributed across continents? - Fault tolerant? Partition tolerant? Tolerant to consumer grade hardware? - Distributed? Across 10, 100, 1000, 10000 machines?
  • 39. More possibilities - All in memory? (disk is the new tape) - Batch processing? - tolerant to node failures? - Graph oriented? - No transactions? Programmer deals with inconsistencies? - No schemas? - BASE? (Basically Available, Soft state, Eventually Consistent) - Horizontal scaling, with no downtime? - Self healing?
  • 40. A consistent topic: CAP Theorem
  • 41. CAP theorem (Eric Brewer, 2000, Symposium on Principles of Distributed Computing) - CAP = Consistency, Availability, Partition tolerance - Pick any two! - Distributed systems have to sacrifice something to be fast - Usually you drop: - consistency – all clients see the same data - availability – the service returns something - Sometimes can even tune the trade-offs!
  • 42. "There is no free lunch with distributed data” – HP
  • 43. Eventual Consistency - Different clients can read the data and write it, no locking or maybe partitioned nodes - What we know is that given enough time data is synchronized to the same state across all replicas
  • 44. But this is horrible!
  • 45. … you already are eventually consistent! :) If your database stores how many vases you have in your shop...
  • 46. Eventual consistency - Conflict resolution: - Read time - Write time - Asynchronous - Possibilities: - client timestamps - vector clocks, when writing say what your original data version was - Conflict resolution can be server or client based
  • 47. There are different kinds of consistencies - Read-your-writes consistency - Monotonic write / monotonic read consistency - Session consistency - Casual consistency
  • 48. There's not even a proper taxonomy of features different NoSQL solutions offer
  • 49. And this presentation is too short to present whole breadth of possibilities
  • 50. Usual taxonomy of NoSQL Usual taxonomy: - Key/Value stores - Column stores - Document stores - Graph stores
  • 51. Other attributes - In-memory / on-disk - Latency / throughput (batch processing) - Consistency / Availability
  • 52. Key/Value stores - a.k.a. Distributed hashtables! - Amazon Dynamo - Redis, Voldemort, Cassandra, Tokyo Cabinet, Riak
  • 53. Document databases - Similar to Key/Value, but value is a document - JSON or something similar, flexible schema - CouchDB, MongoDB, SimpleDB... - May support indexing or not - Usually support more complex queries
  • 54. Column stores - one key, multiple attributes - hybrid row/column - BigTable, Hbase, Cassandra, Hypertable
  • 55. Graph Databases - Neo4J, Maestro OpenLink, InfiniteGraph, HyperGraphDB, AllegroGraph - Whole semantic web shebang!
  • 56. To make the situation even more confusing... - Fast pace of development - In-memory stores gain on-disk support overnight - Indexing capabilities are added
  • 57. Two examples - Cassandra - Hadoop - Hive - Mahout
  • 58.  
  • 59. Cassandra - BigTable + Dynamo - P2P, horizontally scalable - No SPOF - Eventually consistent - Tunable tradeoffs between consistency and availability - number of replicas, writes, reads
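The "number of replicas, writes, reads" trade-off can be illustrated with the usual quorum rule for Dynamo-style stores: a read is guaranteed to overlap the latest acknowledged write when R + W > N. The sketch below only checks that inequality; the parameter names `n`, `w`, `r` and the helper names are assumptions for illustration, not Cassandra API calls.

```python
def reads_see_latest_write(n: int, w: int, r: int) -> bool:
    """True if every read quorum overlaps every write quorum (R + W > N).

    n: number of replicas per key
    w: replicas that must acknowledge a write
    r: replicas that must answer a read
    """
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return r + w > n


def replicas_allowed_down(n: int, needed: int) -> int:
    """How many replicas may be unreachable while the operation still succeeds."""
    return n - needed
```

For example, with three replicas, `w=2, r=2` gives overlapping quorums and tolerates one unreachable replica for both reads and writes, while `w=1, r=1` stays available with two replicas down but may return stale data.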
  • 60. Cassandra – writes - No reads - No seeks - Log oriented writes - Fast, atomic inside ColumnFamily - Always available for writing
  • 61. Cassandra - Billions of rows - MySQL: ~ 300ms write ~ 350ms read - Cassandra: ~ 0.12ms write ~ 15ms read
  • 62. Not enough time to go into data model...
  • 63. Cassandra In production at: Facebook, Digg, Rackspace, Reddit, Cloudkick, Twitter - largest production cluster over 150TB and over 150 machines Other stuff: - pluggable partitioner (Random/OrderPreserving) - rack aware, datacenter aware
  • 64. Experiences? - Works pretty well at Zemanta - user preferences store - extending to new use-cases - Digg had some problems - Don't necessarily use it as the primary store - Not very easy to back up, but the situation is improving
  • 65. Cassandra - queries - Column by key - Slices (of columns/supercolumns) - Range queries (efficient when using OrderPreservingPartitioner)
  • 66.  
  • 67. Hadoop - GFS + MapReduce - Fault tolerant - (massively) distributed - massive datasets - batch-processing (non real-time responses) - Written in Java - A whole ecosystem
  • 68. Hadoop: Why? (Owen O’Malley, Yahoo! Inc.) • Need to process 100TB datasets with multi-day jobs • On 1 node: – scanning @ 50MB/s = 23 days – MTBF = 3 years • On 1000 node cluster: – scanning @ 50MB/s = 33 min – MTBF = 1 day • Need framework for distribution – Efficient, reliable, easy to use
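The figures on the slide above can be reproduced with a few lines of arithmetic. This sketch assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), perfectly parallel scanning, and independent node failures; the function names are made up for the example.

```python
DATASET_BYTES = 100 * 10**12      # 100 TB, decimal units assumed
SCAN_RATE_BYTES_PER_S = 50 * 10**6  # 50 MB/s per node


def scan_seconds(nodes: int) -> float:
    """Time to scan the whole dataset if the work splits evenly across nodes."""
    return DATASET_BYTES / (SCAN_RATE_BYTES_PER_S * nodes)


def cluster_mtbf_days(node_mtbf_years: float, nodes: int) -> float:
    """Expected time between failures anywhere in the cluster.

    Assumes failures are independent, so the cluster-wide rate is the
    per-node rate multiplied by the number of nodes.
    """
    return node_mtbf_years * 365 / nodes


if __name__ == "__main__":
    print(f"1 node:     {scan_seconds(1) / 86400:.1f} days to scan")
    print(f"1000 nodes: {scan_seconds(1000) / 60:.1f} minutes to scan")
    print(f"1000 nodes: a failure about every {cluster_mtbf_days(3, 1000):.1f} days")
```

The point of the slide follows directly: at cluster scale a failure is a daily event, so the framework has to treat it as normal operation.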
  • 69. Hadoop @ Facebook - Use Hadoop to store copies of internal log and dimension data sources and use it as a source for reporting/analytics and machine learning. - Currently 2 major clusters: A 1100-machine cluster with 8800 cores and about 12 PB raw storage. A 300-machine cluster with 2400 cores and about 3 PB raw storage. Each (commodity) node has 8 cores and 12 TB of storage. - Heavy users of both streaming as well as the Java APIs. They built a higher level data warehousing framework using these features called Hive (see http://hadoop.apache.org/hive/).
  • 70. But also at smaller startups - Zemanta: 2 to 4 node cluster, 7TB - log processing - Hulu 13 nodes - log storage and analysis - GumGum 9 nodes - image and advertising analytics - Universities: Cornell – Generating web graphs (100 nodes) - It's almost everywhere
  • 71. Hadoop Architecture - HDFS - HDFS provides a single distributed filesystem - Managed by a NameNode (SPOF) - Append-only filesystem - distributed by blocks (for example 64MB) - It's like one big RAID over all the machines - tunable replication - Rack aware, datacenter aware - It just works, really!
  • 72.  
  • 73.  
  • 74. Hadoop Architecture - MapReduce - Based on an old concept from Lisp - Generally it's not just map-reduce, it's: Map -> shuffle (sort) -> merge -> reduce - Jobs can be partitioned - Tasks can be run and restarted independently (parallelization, fault tolerance) - Aware of data-locality of HDFS - Speculative execution (toward the end of a job, tasks on machines that stall are also started elsewhere)
  • 75. Infamous word counting example - “One and one is two and one is three” - Two mappers: “One and one is”, “two and one is three” - Pretty “stupid” mappers, just output word and “1” - Output Mapper1: One 1, And 1, One 1, Is 1 - Output Mapper2: Two 1, And 1, One 1, Is 1, Three 1 - Sorter: And 1, And 1, Is 1, Is 1, One 1, One 1, One 1, Two 1, Three 1 - Reducer: And 2, Is 2, One 3, Two 1, Three 1
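The word-count walk-through above can be simulated in a single process. This is a sketch of the map -> shuffle/sort -> reduce flow only, not Hadoop code; the function names `mapper`, `shuffle`, `reducer`, and `word_count` are assumptions for the example, and words are lower-cased so that "One" and "one" land in the same bucket.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(text: str) -> Iterator[Tuple[str, int]]:
    """The 'stupid' mapper: emit (word, 1) for every word in its input split."""
    for word in text.split():
        yield word.lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values by key and sort the keys, as the sorter stage does."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the ones for a single word."""
    return key, sum(values)


def word_count(splits: List[str]) -> Dict[str, int]:
    """Run each split through its own mapper, then shuffle and reduce."""
    mapped = chain.from_iterable(mapper(split) for split in splits)
    return dict(reducer(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["One and one is", "two and one is three"]))
```

In a real cluster each mapper and reducer runs on a different machine and the shuffle moves data over the network; the logic per stage stays this small.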
  • 76. Important to know - Mappers can output more than one output per input (or none) - Bucketing for reducers happens immediately after mapping output - Every reducer gets all input records for a certain “key” - All parts are highly pluggable – readers, mapping, sorting, reducing … it's Java
  • 77. Hadoop - You can write your jobs in Java - You get used to thinking inside the constraints - You can use “Hadoop Streaming” to write jobs in any language - It's great not to have to think about the machines, but you can “peep” if you want to see how your job is doing.
  • 78. Now, this is a bit wonky, right? - Word counting is a really bad example - However it's like “Hello world”, so get used to it - When you get to real problems it gets much more logical
  • 79. Benchmarks, 2009 - This doesn't help me much, but...
      Bytes              Nodes  Maps    Reduces  Replication  Time
      500000000000       1406   8000    2600     1            59 seconds
      1000000000000      1460   8000    2700     1            62 seconds
      100000000000000    3452   190000  10000    2            173 minutes
      1000000000000000   3658   80000   20000    2            975 minutes
  • 80. Hive
  • 81. Hive - A system built on top of Hadoop that mimics SQL - Hive Query Language - Built at Facebook, since writing MapReduce jobs in Java is tedious for basic tasks - Every operation is one or multiple full index scans - Bunch of heuristics, query optimization
  • 82. Hive – Why we love it at Zemanta - Don't need to transform your data on “load time” - Just copy your files to HDFS (preferably compressed and chunked) - Write your own deserializer (50 lines in Java) - And use your file as a table - Plus custom User Defined Functions
  • 83.  
  • 84. Mahout - Bunch of algorithms implemented: - Collaborative Filtering - User and Item based recommenders - K-Means, Fuzzy K-Means clustering - Mean Shift clustering - Dirichlet process clustering - Latent Dirichlet Allocation - Singular value decomposition - Parallel Frequent Pattern mining - Complementary Naive Bayes classifier - Random forest decision tree based classifier - High performance Java collections (previously Colt collections) - A vibrant community, and much more cool stuff to come by this summer thanks to Google Summer of Code
  • 86. Some observations - Non-fixed schemas are a blessing when you have to adapt constantly - that doesn't mean you should not have documentation and be thoughtful! - Denormalization is the way to scale - sorry guys - Clients get to manage things more precisely, but also have to manage things more precisely
  • 87. Some internals, “fun” tricks - Bloom filter: Is data on this node? Maybe / Definitely not Maybe -> let's go to disk to check out - Vector clocks - Consistent hashing
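The Bloom filter trick ("Maybe / Definitely not") can be sketched in a few lines. This is a toy version for illustration, not the data structure any particular store ships; the class name, the default sizes, and the use of SHA-256 to derive bit positions are all assumptions.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Answers 'definitely not present' or 'maybe present', never a false 'no'."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple; a real filter packs bits.
        self.bits = bytearray(size_bits)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive several bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for position in self._positions(key):
            self.bits[position] = 1

    def might_contain(self, key: str) -> bool:
        # False means definitely absent; True means go to disk and check.
        return all(self.bits[position] for position in self._positions(key))
```

A node keeps such a filter in memory per on-disk file, so most lookups for keys it does not hold never touch the disk.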
  • 88. Consistent hashing - key -> hash -> “coordinator node” - depending on the replication factor, the key is then stored on N sequential nodes - When a new node gets added to the ring, replication is relatively easy
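A rough sketch of the ring described above: hash the key, walk clockwise to the coordinator node, and store the key on the next N nodes. It is a simplified model (one token per node, no virtual nodes); the class name `HashRing`, its methods, and the choice of SHA-256 are assumptions for illustration.

```python
import bisect
import hashlib
from typing import List, Tuple


def ring_hash(value: str) -> int:
    """Map a node name or key to a position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes: List[str], replicas: int = 2) -> None:
        self.replicas = replicas
        # Sorted list of (token, node); each node owns a single token here.
        self.ring: List[Tuple[int, str]] = sorted((ring_hash(n), n) for n in nodes)

    def add_node(self, node: str) -> None:
        bisect.insort(self.ring, (ring_hash(node), node))

    def nodes_for(self, key: str) -> List[str]:
        """Coordinator first, followed by the next nodes clockwise."""
        tokens = [token for token, _ in self.ring]
        start = bisect.bisect_right(tokens, ring_hash(key)) % len(self.ring)
        count = min(self.replicas, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]
```

The useful property is that adding a node only moves the keys that fall between the new node and its predecessor on the ring; every other key keeps its coordinator.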
  • 89. And if you don't take anything else from this presentation...
  • 90.  
  • 91.  
  • 92.  
  • 94. This is the edge today - Tons of interesting research waiting to be done - Ability to leverage these solutions to process terabytes of data cheaply - Ability to seize new opportunities - Innovation is the only thing keeping you/us ahead - Are you preparing yourself for tomorrow's technologies? Tomorrow's research?
  • 95. Images http://www.flickr.com/photos/60861613@N00/3526232773/sizes/m/in/photostream/ http://www.zazzle.com/sql_awesome_me_tshirt-235011737217980907 http://geekandpoke.typepad.com/geekandpoke/2011/01/nosql.html http://hadoop.apache.org/common/docs/current/hdfs_design.html http://www.flickr.com/photos/unitednationsdevelopmentprogramme/4273890959/

Editor's Notes

  • #2: What is Silicon Valley, anyway? Why are we even asking this? Do politicians mean it literally? I will talk about the positive sides. Actually, I wanted to tell a different story.