NoSQL – Part 2
CAP Theorem & Column Oriented
Mohammad Sadegh Salehi
Dr. Baraani
Winter 2015, Sheikh Bahaie University
NoSQL (part 2) - CAP Theorem & Column Oriented
Winter 2015
Agenda
- Review NoSQL
- Dynamo and BigTable
- NoSQL Classification
- Key-value Stores
- Column Oriented
- Cassandra
- Why Cassandra
- Questions
What is NoSQL (review)
▪ Stands for "Not Only SQL"
▪ A class of non-relational data storage systems
▪ Usually requires no fixed table schema and does not use the concept of joins
▪ NoSQL offerings typically relax one or more of the ACID properties (see the CAP theorem)
Dynamo and BigTable
▪ Three major works were the seeds of the NoSQL movement:
• BigTable (Google)
• Dynamo (Amazon)
- Gossip protocol (node discovery and failure detection)
- Distributed key-value data store
- Eventual consistency
• CAP theorem: a distributed data store cannot guarantee consistency, availability, and partition tolerance all at once
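The gossip idea listed under Dynamo can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not Dynamo's or Cassandra's actual protocol: the `Node` class, the heartbeat counters, the fixed "next neighbor" peer choice (real gossip picks peers at random), and the lag threshold of 3 are all hypothetical.

```python
class Node:
    """Toy cluster member (hypothetical; not a real Dynamo/Cassandra class)."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        # Newest heartbeat counter this node has heard of, per node name.
        self.heartbeats = {name: 0}

    def tick(self):
        # A live node increments its own heartbeat once per round.
        self.heartbeats[self.name] += 1

    def gossip_with(self, peer):
        # Both sides keep the newest heartbeat known for every node.
        for name in set(self.heartbeats) | set(peer.heartbeats):
            newest = max(self.heartbeats.get(name, 0), peer.heartbeats.get(name, 0))
            self.heartbeats[name] = newest
            peer.heartbeats[name] = newest


def run_round(nodes):
    # Assumption: each live node gossips with its next live neighbor,
    # which keeps the run reproducible.
    live = [n for n in nodes if n.alive]
    for i, node in enumerate(live):
        node.tick()
        node.gossip_with(live[(i + 1) % len(live)])


nodes = [Node(f"n{i}") for i in range(5)]
for _ in range(2):
    run_round(nodes)
# Discovery: has every node heard of all five members?
discovered = all(len(n.heartbeats) == 5 for n in nodes)

nodes[4].alive = False  # n4 stops ticking and gossiping
for _ in range(3):
    run_round(nodes)

# Failure detection: peers whose heartbeat lags the observer's own by 3 or more.
view = nodes[0].heartbeats
suspects = [name for name, beat in view.items() if view["n0"] - beat >= 3]
```

Membership spreads without any central coordinator, and a node that stops sending heartbeats shows up as a lagging counter in every other node's view.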
What kinds of NoSQL (review)
▪ NoSQL solutions fall into two major areas:
• Key/value, or "the big hash table":
- Amazon Dynamo
- Voldemort
- Scalaris
• Schema-less, which comes in multiple flavors: column-based, document-based, or graph-based:
- Cassandra (column-based)
- CouchDB (document-based)
- Neo4j (graph-based)
- HBase (column-based)
Key-Value Stores
▪ Extremely simple interface
• Data model: (key, value) pairs
• Operations:
- Insert(key, value)
- Fetch(key)
- Update(key, value)
- Delete(key)
▪ Implementation goals: efficiency, scalability, fault tolerance
• Records are distributed to nodes based on the key
• Replication
• Single-record transactions, "eventual consistency"
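The four operations and the "records distributed to nodes based on key" idea can be sketched in a few lines. This is a minimal illustration, assuming in-memory dictionaries as stand-ins for nodes and a hash-modulo placement rule; the class and method names are hypothetical and replication is left out.

```python
import hashlib


class KeyValueStore:
    """Toy key-value store spread over several in-memory 'nodes'."""

    def __init__(self, node_count=3):
        # Each dict stands in for one storage node.
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # Place the record on a node chosen by hashing the key.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def insert(self, key, value):
        self._node_for(key)[key] = value

    def fetch(self, key):
        return self._node_for(key).get(key)

    def update(self, key, value):
        node = self._node_for(key)
        if key not in node:
            raise KeyError(key)
        node[key] = value

    def delete(self, key):
        self._node_for(key).pop(key, None)


store = KeyValueStore()
store.insert("user:42", "en")
store.update("user:42", "fa")
language = store.fetch("user:42")
store.delete("user:42")
missing = store.fetch("user:42")
```

Every operation touches exactly one record on one node, which is why single-record transactions are cheap and anything spanning several keys is not.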
Key-Value Data Stores: Suitable Use Cases
▪ Storing session information
▪ User profiles and preferences: almost every user has a unique userID as well as preferences such as language, color, time zone, which products the user has access to, and so on.
Key-Value Data Stores: Shopping Cart Data
▪ Shopping carts should be available all the time, across browsers, machines, and sessions, so all the shopping information can be stored as the value under the userID key.
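The shopping-cart case can be sketched as follows. It assumes a plain dictionary in place of a real key-value store and JSON as the value encoding; the function names, the userID, and the cart fields are hypothetical.

```python
import json

# Stand-in for the key-value store: key = userID, value = opaque blob.
carts = {}


def save_cart(user_id, items):
    # The whole cart is serialized into a single value.
    carts[user_id] = json.dumps(items)


def load_cart(user_id):
    blob = carts.get(user_id)
    return json.loads(blob) if blob is not None else []


save_cart("user-17", [{"sku": "book-1", "qty": 2}])
cart = load_cart("user-17")
cart.append({"sku": "pen-9", "qty": 1})
save_cart("user-17", cart)
```

Because the store sees only an opaque value, the cart is read and written as a whole, and the store itself cannot answer a question such as "which carts contain book-1".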
Key-Value Data Stores: When Not to Use
▪ Relationships among data
▪ Multi-operation transactions
▪ Querying by data (by the content of the value)
▪ Operations on sets of keys
Column-oriented: Intro
▪ Store data in column order
▪ Allow key-value pairs to be stored (and retrieved by key) in a massively parallel system
• Data model: families of attributes defined in a schema; new attributes can be added
• Storing principle: big hashed distributed tables
• Properties: partitioning (horizontal and/or vertical), high availability, etc., completely transparent to the application
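"Store data in column order" can be made concrete by turning a few row records into per-column lists. This is a sketch with made-up sample data; `to_columns` is a hypothetical helper and assumes every record carries the same fields.

```python
# Row-oriented layout: one record per entity (sample data is invented).
rows = [
    {"id": 1, "name": "Ada", "state": "TX", "age": 36},
    {"id": 2, "name": "Bob", "state": "CA", "age": 41},
    {"id": 3, "name": "Cy", "state": "TX", "age": 29},
]


def to_columns(records):
    """Regroup row records so that each field's values sit together."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)
# An aggregate over one attribute now reads a single list
# instead of walking every full record.
average_age = sum(columns["age"]) / len(columns["age"])
```

Queries that touch a few attributes of many records benefit from this layout, since the values they need are stored next to each other.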
Cassandra
▪ Apache Cassandra™ is a free,
▪ distributed,
▪ high-performance,
▪ extremely scalable,
▪ fault-tolerant (i.e., no single point of failure),
▪ post-relational database solution.
▪ Cassandra can serve as both a real-time datastore and a read-intensive database.
▪ Thrift compiles client bindings to: C++, Java, PHP, Ruby, Erlang, Perl, ...
Cassandra: Infographic (figure)
Cassandra
▪ Originally developed at Facebook
▪ Follows the BigTable data model: column-oriented
▪ Uses the Dynamo eventual consistency model
▪ Written in Java
▪ Open-sourced and part of the Apache family
▪ Uses Apache Thrift as its API
▪ Some of its myriad users:
Cassandra: Data Model
▪ keyspace: usually named after the application, e.g., 'Twitter', 'Wordpress'.
▪ column family: a structure containing an unlimited number of rows
• Simple
• Super (nested column families)
▪ column: a tuple of name, value, and timestamp
• Each column has:
- Name
- Value
- Timestamp
▪ key: the name of a record (row)
▪ super column: a column that contains more columns
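The nesting described above (keyspace, column family, row key, columns) maps naturally onto nested dictionaries. The sketch below is an in-memory illustration only, not Cassandra's storage format; `write_column`, the "Users" column family, and the sample values are hypothetical. It also shows one common use of the per-column timestamp: when two writes hit the same column, the newer timestamp wins.

```python
import time

# keyspace -> column family -> row key -> column name -> (value, timestamp)
keyspace = {"Users": {}}


def write_column(column_family, row_key, name, value, timestamp=None):
    """Store a column, keeping whichever write carries the newest timestamp."""
    ts = timestamp if timestamp is not None else time.time()
    row = keyspace[column_family].setdefault(row_key, {})
    current = row.get(name)
    if current is None or ts >= current[1]:
        row[name] = (value, ts)


write_column("Users", "sadegh", "email", "old@example.com", timestamp=100)
write_column("Users", "sadegh", "email", "new@example.com", timestamp=200)
# A late-arriving write with an older timestamp does not overwrite newer data.
write_column("Users", "sadegh", "email", "stale@example.com", timestamp=150)
# Rows in the same column family may carry different columns.
write_column("Users", "sadegh", "city", "Isfahan", timestamp=120)
```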
Cassandra – Data Model (diagram): a keyspace (with settings) contains column families (with settings), which contain columns (name, value, timestamp).
Cassandra: Column Family & Super Column Family (figure)
Cassandra: Architecture Overview
▪ Cassandra was designed with the understanding that system and hardware failures can and do occur
▪ Peer-to-peer, distributed system
▪ All nodes are the same
▪ Data is partitioned among all nodes in the cluster
▪ Custom data replication to ensure fault tolerance
▪ Read/write-anywhere design
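Partitioning plus replication among identical peers is often pictured as a hash ring. The following is a sketch under assumptions, not Cassandra's actual partitioner or replication strategy: the `Ring` class, the SHA-256 token function, and the node names are hypothetical. Each key hashes to a point on the ring, and its replicas are the next nodes found clockwise.

```python
import bisect
import hashlib


def token(value):
    # Map a string to a position on the ring (assumed hash function).
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, node_names, replication_factor=2):
        self.replication_factor = replication_factor
        # Each node owns one position on the ring, sorted by token.
        self.tokens = sorted((token(name), name) for name in node_names)

    def replicas_for(self, key):
        # First node clockwise from the key's token, then its successors.
        start = bisect.bisect_right(self.tokens, (token(key), ""))
        count = min(self.replication_factor, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=2)
owners = ring.replicas_for("user:42")
```

Any node can compute `replicas_for` locally, which is one way a read/write-anywhere design can route a request without a master.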
Cassandra: Architecture Overview
▪ Nodes communicate with each other through the Gossip protocol, which exchanges information across the cluster every second
▪ A commit log on each node captures write activity, which assures data durability
▪ Data is also written to an in-memory structure (memtable) and then flushed to disk (as an SSTable) once the memory structure is full
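The write path described above (commit log, memtable, SSTable) can be sketched as a small class. This is an illustration under assumptions: `StorageNode` is hypothetical, "disk" is just a list of sorted lists, the memtable limit counts entries rather than bytes, and the `read` method is an assumed lookup order added only to show that flushed data stays reachable.

```python
class StorageNode:
    """Toy single-node write path: commit log -> memtable -> SSTable."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.commit_log = []  # append-only record of every write
        self.memtable = {}    # in-memory structure
        self.sstables = []    # immutable sorted tables (stand-in for disk files)

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log the write for durability
        self.memtable[key] = value            # 2. apply it in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. memtable full: write it out

    def flush(self):
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def read(self, key):
        # Assumed lookup order: memtable first, then newest SSTable to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for stored_key, value in table:
                if stored_key == key:
                    return value
        return None


node = StorageNode(memtable_limit=2)
node.write("a", 1)
node.write("b", 2)  # memtable reaches its limit and is flushed
node.write("a", 3)  # newer value lives in the fresh memtable
```

Writes only append to the log and update memory, so no on-disk structure is modified in place.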
Why Cassandra?
▪ Gigabyte-to-petabyte scalability
▪ Linear performance gains through adding nodes
▪ No single point of failure
▪ Easy replication and data distribution
▪ Multi-data-center and cloud capable
▪ No need for a separate caching layer
▪ Tunable data consistency
▪ Flexible schema design
▪ Data compression
▪ CQL language (like SQL)
▪ Support for key languages and platforms
▪ No need for special hardware or software
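"Tunable data consistency" is commonly explained with a counting rule from Dynamo-style systems: with N replicas, a read that waits for R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N. The sketch below only does that arithmetic; the helper names are hypothetical, and the level names ONE, QUORUM, and ALL are used as labels for the acknowledgement counts.

```python
def overlaps(replication_factor, write_acks, read_acks):
    """True when every read set shares at least one replica with every write set."""
    return write_acks + read_acks > replication_factor


def quorum(replication_factor):
    # A majority of the replicas.
    return replication_factor // 2 + 1


n = 3  # assumed replication factor
levels = {
    "write ONE / read ONE": overlaps(n, 1, 1),
    "write QUORUM / read QUORUM": overlaps(n, quorum(n), quorum(n)),
    "write ALL / read ONE": overlaps(n, n, 1),
}
```

Lower acknowledgement counts favor latency and availability, while higher counts favor reading the latest write; the application picks the trade-off per operation.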
Why Cassandra? Big Data Scalability
▪ Capable of comfortably scaling to petabytes
▪ New nodes = linear performance increases
▪ New nodes can be added online
(Figure: growing the cluster from 2 to 4 nodes doubles throughput capabilities.)
Why Cassandra? No Single Point of Failure
▪ All nodes are the same
▪ Customized replication affords tunable data redundancy
▪ Read/write from any node
▪ Can replicate data among different physical data center racks
Why Cassandra? No Need for Caching Software
▪ Peer-to-peer architecture removes the need for a special caching layer and the programming that goes with it
▪ The database cluster uses the memory of all participating nodes to cache the data assigned to each node
▪ No inconsistencies arise between a memory cache and the database
(Figure: the setup being replaced, with application servers, Memcached servers, and a database server handling reads and writes.)
Why Cassandra? Data Compression
▪ Uses Google's Snappy data compression algorithm
▪ Compresses data at the column-family level
▪ Internal tests at DataStax show compression of raw data of 80% or more in some cases
▪ No performance penalty (and some increase in overall performance due to less physical I/O)
(Figure: Portfolio keyspace, Customer column family.)
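The effect of compressing a column family's data can be tried out on synthetic rows. Note the substitution: Snappy is not part of Python's standard library, so this sketch uses `zlib` as a stand-in, and the sample rows are invented and highly repetitive. The resulting ratio says nothing about Snappy or about real Cassandra data.

```python
import zlib

# Hypothetical, repetitive "Customer" rows; real data compresses differently.
rows = "\n".join(f"customer-{i},TX,active" for i in range(1000)).encode("utf-8")

compressed = zlib.compress(rows)      # zlib stands in for Snappy here
ratio = 1 - len(compressed) / len(rows)  # fraction of space saved
restored = zlib.decompress(compressed)
```

Fewer bytes on disk also means fewer bytes to read back, which is the reasoning behind the slide's "less physical I/O" remark.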
Why Cassandra? CQL Language
▪ Very similar to RDBMS SQL syntax
▪ Create objects via DDL (e.g., CREATE…)
▪ Core DML commands supported: INSERT, UPDATE, DELETE
▪ Query data with SELECT
(Figure: Portfolio keyspace.)
SELECT *
FROM USERS
WHERE STATE = 'TX';
Comparison with MySQL (> 50 GB of data)
▪ MySQL
Writes average: ~300 ms
Reads average: ~350 ms
▪ Cassandra
Writes average: 0.12 ms
Reads average: 15 ms
▪ Stats provided by the Cassandra authors, using Facebook data.
Cassandra Tools (video demonstration: noSqlCassandra-sadegh.mp4)
Where to get Cassandra?
▪ Go to www.datastax.com
▪ DataStax makes free smart start installers available for Cassandra that include:
• The most up-to-date production-quality Cassandra version
• A version of DataStax OpsCenter, a visual, browser-based tool for managing and monitoring Cassandra
• Drivers and connectors for popular development languages
• Sample database and application
• Automatic configuration assistance for ensuring optimal performance and setup for either stand-alone or cluster implementations
• Getting Started Guide
Where Can I Learn More?
www.datastax.com
▪ Free online documentation
▪ User/customer case studies
▪ Technical white papers
▪ Software downloads
▪ Technical articles
▪ User forums
▪ Videos
▪ Tutorials
▪ FAQs
▪ Blogs
Resources: Sites
▪ Cassandra
• http://cassandra.apache.org
▪ NoSQL news websites
• http://nosql.mypopescu.com
• http://www.nosqldatabases.com
▪ "A Practical Guide to NoSQL", posted by Denise Miura on March 17, 2011, at
• http://blogs.marklogic.com/2011/03/17/a-practical-guide-to-nosql/
Resources: Books
▪ "Cassandra: The Definitive Guide", O'Reilly Media, Nov. 2013
▪ "Cassandra Essential Tutorial", DataStax, 2014
▪ "Professional NoSQL", Wrox, 2011
▪ "NoSQL Distilled", Martin Fowler, 2013
Questions
Mohammad Sadegh Salehi
3adegh.ce@gmail.com
Thank You
