NoSQL: Now and Path Ahead
Shubham Kumar Srivastava
MakeMyTrip
Who am I
Abstract
• What and Why: NoSQL
• Fundamentals
• Use Case
• Challenges
• Path Ahead
What is NoSQL
A database that does not adhere to the traditional relational database management system (RDBMS) structure.
Why NoSQL
• Scalability and Performance
• Cost
• Data Modeling
Why NoSQL: Motives and Drivers
Scalability and Performance
• Horizontal scalability (scaling out) is better than vertical scalability (scaling up)
• Hardware is getting cheaper and processing power is increasing
• Less operational complexity compared with RDBMS solutions
• Most solutions provide automatic sharding etc. by default
Why NoSQL: Motives and Drivers contd..
Cost
• Scaling an RDBMS the way NoSQL scales comes at a hefty cost
• Costs include commodity hardware, software versions, upgrades, and maintenance
• This pushed organizations to look for alternatives and for a cost-effective scale-out option
Why NoSQL: Motives and Drivers contd..
Data Modeling
SQL has been built for:
• Concurrency, consistency, integrity
• Summations, aggregations, groupings
• Schema says: "What can I answer?"
Why NoSQL: Motives and Drivers contd..
Data Modeling
• A plain key-value store is very powerful and fits most NoSQL use cases
• Hierarchical or graph-like data modeling and processing
• Values like maps of maps of maps
• Document databases, which can store arbitrarily complex objects
• Document-based indexing data stores are a huge success
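The "maps of maps" idea can be pictured as nested sorted maps. The sketch below is a conceptual, in-memory model. It assumes a Cassandra-style wide-column store, where a row key maps to a sorted set of columns. The class and method names are hypothetical, and a super column family would add one more map level.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of a column family: row key -> (column name -> value).
// Both levels are sorted maps, i.e. "a map of maps"; a super column family would nest one more level.
class ColumnFamilyModel {
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    // Insert or overwrite a single column; rows need no predeclared schema
    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Columns of one row whose names fall in [from, to), in sorted order
    Map<String, String> slice(String rowKey, String from, String to) {
        NavigableMap<String, String> row = rows.get(rowKey);
        if (row == null) {
            return Map.of();
        }
        return row.subMap(from, true, to, false);
    }
}
```

Reading a contiguous, sorted range of columns from one row is the access pattern that the slice queries in the backup slides rely on.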
Why NoSQL: Motives and Drivers contd..
Software applications are not always bound by these constraints. This led to data models such as:

Key/Value Store:
   Redis, MemcacheDB, Voldemort etc.

Wide Column Store / Column Families:
   Cassandra, HBase (Hadoop), Hypertable, Cloudera etc.

Document-Based Stores:
   Solr/Lucene, MongoDB, CouchDB, Terrastore etc.

Graph Data Store:
   Neo4j, GraphBase, FlockDB etc.
Why NoSQL: Motives and Drivers contd..
• Schema says: "What are the questions?"
• Data modeling is driven by the set of queries
• Exploit denormalization and duplication
• Use aggregates
• Manage joins in the application, together with aggregation, denormalization etc.
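Query-driven modeling means writing the same data once per query instead of joining at read time. This is a minimal sketch with hypothetical names. It assumes two queries ("bookings in a city" and "bookings by a user"), and the application duplicates each booking into both denormalized "tables".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical query-driven model: one denormalized "table" per query, so no join is needed.
class DenormalizedBookings {
    // Query 1: bookings in a city -> rows keyed by city
    final Map<String, List<String>> byCity = new HashMap<>();
    // Query 2: bookings made by a user -> rows keyed by user
    final Map<String, List<String>> byUser = new HashMap<>();

    // The application performs the duplication at write time
    void recordBooking(String userId, String city, String hotelId) {
        String booking = userId + "|" + city + "|" + hotelId; // same payload stored twice
        byCity.computeIfAbsent(city, k -> new ArrayList<>()).add(booking);
        byUser.computeIfAbsent(userId, k -> new ArrayList<>()).add(booking);
    }

    // Each read touches exactly one key in one table
    List<String> bookingsInCity(String city) {
        return byCity.getOrDefault(city, List.of());
    }

    List<String> bookingsByUser(String userId) {
        return byUser.getOrDefault(userId, List.of());
    }
}
```

The trade-off is extra storage and writes in exchange for single-lookup reads. The application, not the database, is responsible for keeping the copies in step.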
Some Fanda-mentals
CAP Theorem
A shared/distributed system can satisfy at most two of these three properties at the same time:
• Consistency
• Availability
• Tolerance to network partitions
CAP : Pictorially
Explanation
Use case: Scaling Web Apps
Critical facts:
• Network outages are common
• Customer shopping carts, email search, and social network queries can tolerate stale data
How:
  Compromise on consistency in order to remain available, rather than disrupting user service during outages.
Explanation
• Rather than requiring consistency after every transaction, it is enough for the database to eventually reach a consistent state.
• Brewer's CAP theorem says you have no choice if you want to scale out: once network partitions must be tolerated, you have to trade consistency against availability.
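Eventual consistency can be illustrated with two replicas. Both keep accepting writes while partitioned and reconcile once they can talk again. This sketch assumes last-write-wins reconciliation by timestamp. Cassandra resolves conflicting column versions the same way, which is why every column in the backup-slide listings carries a timestamp. Other stores use other schemes, such as vector clocks. The names are hypothetical and there is no networking.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal last-write-wins (LWW) replica: the higher timestamp wins on conflict.
class LwwReplica {
    // A value tagged with its write timestamp
    static final class Versioned {
        final String value;
        final long timestamp;

        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Versioned> data = new HashMap<>();

    // Accept a write locally, even while cut off from other replicas (stays available)
    void write(String key, String value, long timestamp) {
        Versioned current = data.get(key);
        if (current == null || timestamp > current.timestamp) {
            data.put(key, new Versioned(value, timestamp));
        }
    }

    // May return stale data until the replicas have exchanged updates
    String read(String key) {
        Versioned v = data.get(key);
        return v == null ? null : v.value;
    }

    // Anti-entropy step: pull every entry from a peer; newer timestamps win
    void mergeFrom(LwwReplica peer) {
        for (Map.Entry<String, Versioned> e : peer.data.entrySet()) {
            write(e.getKey(), e.getValue().value, e.getValue().timestamp);
        }
    }
}
```

Before the merge, a reader on one side sees the stale cart. After both sides merge, they converge on the newer write.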
Explanation contd..
Sharp Contrast: High-Speed Financial Application
• Highly transactional
• Consistent
• Automated
• Cannot live with eventual consistency
ACID vs BASE
ACID
• Atomic: Everything in a transaction succeeds, or the entire transaction is rolled back.
• Consistent: A transaction cannot leave the database in an inconsistent state.
• Isolated: Transactions cannot interfere with each other.
• Durable: Completed transactions persist, even when servers restart etc.
Some Fanda-mentals contd..
BASE
• Basically Available
• Soft state
• Eventual consistency
Consistent Hashing
A common (naive) way to load-balance: the machine chosen to cache object o is
      hash(o) mod n
where n is the total number of machines.
Consistent Hashing contd..
      Adding a machine to the cache means
                      hash(o) mod (n + 1)
      Removing a machine from the cache means
                      hash(o) mod (n - 1)
      Result of either: disaster.
      Almost every object maps to a different machine, so the machines are swamped by the redistribution.
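The disaster is easy to measure. The sketch below uses Java's `String.hashCode()` as a stand-in for hash(o); this is an assumption, and any well-mixed hash behaves the same way. It counts how many objects change machine when n changes. The class name is hypothetical.

```java
// Naive hash(o) mod n placement, and the share of objects that move when n changes.
class ModNRemap {
    static int machineFor(String objectKey, int n) {
        // floorMod keeps the machine index in [0, n) even for negative hash codes
        return Math.floorMod(objectKey.hashCode(), n);
    }

    // Fraction of sample keys whose machine differs between n and newN machines
    static double fractionMoved(int keyCount, int n, int newN) {
        int moved = 0;
        for (int i = 0; i < keyCount; i++) {
            String key = "object-" + i;
            if (machineFor(key, n) != machineFor(key, newN)) {
                moved++;
            }
        }
        return (double) moved / keyCount;
    }
}
```

With a well-mixed hash, going from n to n + 1 machines leaves only about 1/(n + 1) of the objects in place. When 4 machines become 5, roughly 80% of the objects move.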
Consistent Hashing contd..
Commonly, a hash function (e.g. MD5) maps a value to a 128-bit key, i.e. 0 to 2^128 - 1 (or to a 32-bit key, as in the slides that follow).
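Such keys can be derived with the JDK's `MessageDigest`. Reading the 16-byte MD5 digest as an unsigned integer gives the 128-bit key, and its top 32 bits give a 32-bit ring position. The class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5Key {
    // MD5 digest (16 bytes) read as a non-negative integer in [0, 2^128 - 1]
    static BigInteger key128(String value) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(value.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // signum 1: treat the bytes as unsigned
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    // Top 32 bits of the same digest (its first 4 bytes), a position in [0, 2^32 - 1]
    static long key32(String value) {
        return key128(value).shiftRight(96).longValue();
    }
}
```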
Consistent Hashing contd..
         Both keys and machines are hashed with the same function
Consistent Hashing contd..
               Adding a Node
Consistent Hashing contd..
               Removing a Node
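Here is a minimal ring that follows these slides. Keys and nodes go through the same hash (the first 4 bytes of MD5, a 32-bit position), and a key belongs to the first node clockwise from it. The sketch assumes no virtual nodes, no replication, and no colliding node positions. The names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

class HashRing {
    // Ring position -> node name, sorted so "next node clockwise" is a ceiling lookup
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Same hash for keys and nodes: first 4 MD5 bytes as an unsigned 32-bit position
    static long position(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            return ((d[0] & 0xFFL) << 24) | ((d[1] & 0xFFL) << 16) | ((d[2] & 0xFFL) << 8) | (d[3] & 0xFFL);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    void addNode(String node) {
        ring.put(position(node), node);
    }

    void removeNode(String node) {
        ring.remove(position(node));
    }

    // Owner of a key: first node at or after the key's position, wrapping to the start
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> owner = ring.ceilingEntry(position(key));
        return (owner != null ? owner : ring.firstEntry()).getValue();
    }

    // Independent copy, so a ring can be compared before and after a change
    HashRing copy() {
        HashRing c = new HashRing();
        c.ring.putAll(ring);
        return c;
    }
}
```

Unlike mod-n placement, adding a node only takes over keys from one arc of its clockwise successor. Removing a node only hands that node's keys to the next one, and every other key stays put. Real systems usually add virtual nodes so that these arcs are balanced.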
Use Case and NoSQL Solution
Problem:
• Need to store bookings per day for all hotels.
• Queries are centered around cities and regions.
      Hotel count: 1 million
      Date range: now to the next 365 × 2 days
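As a quick sizing check, 1 million hotels × 730 days gives 730 million hotel-day entries. The sketch computes that figure and shows one possible row-key layout for city-centred queries. The layout and names are hypothetical, not necessarily the design presented in the talk.

```java
// Back-of-the-envelope sizing for the hotel bookings use case, plus a hypothetical row-key layout.
class BookingGridSizing {
    static final long HOTELS = 1_000_000L;
    static final long DAYS = 365L * 2; // "now to next 365 * 2 days"

    // One booking counter per (hotel, day) pair
    static long cells() {
        return HOTELS * DAYS;
    }

    // Hypothetical key: one row per city per day, one column per hotel,
    // so "bookings in city C on day D" is a single-row read
    static String rowKey(String cityCode, String isoDate) {
        return cityCode + ":" + isoDate;
    }
}
```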
NoSQL: Path Ahead
• ACID equivalence (Neo4j, CouchDB etc.)
• Transaction support
• Atomicity
• MVCC
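MVCC (multi-version concurrency control) lets readers see a consistent snapshot while writers keep going. It does this by keeping several versions of each value instead of overwriting in place; CouchDB, listed above, is built around this idea. The sketch below is minimal and single-threaded, with hypothetical names and no cleanup of old versions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class MvccStore {
    // key -> (version -> value); old versions are kept, never overwritten
    private final Map<String, TreeMap<Long, String>> versions = new HashMap<>();
    private long clock = 0;

    // Commit a write as a new version and return its version number
    long put(String key, String value) {
        long version = ++clock;
        versions.computeIfAbsent(key, k -> new TreeMap<>()).put(version, value);
        return version;
    }

    // A reader's snapshot is the latest committed version when it starts
    long snapshot() {
        return clock;
    }

    // Newest value whose version is <= the snapshot; null if the key did not exist yet
    String get(String key, long snapshot) {
        TreeMap<Long, String> history = versions.get(key);
        if (history == null) {
            return null;
        }
        Map.Entry<Long, String> e = history.floorEntry(snapshot);
        return e == null ? null : e.getValue();
    }
}
```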
NoSQL: Path Ahead contd..
Possible Solution
• Use the SQL database for creation/updates etc.
• Archive the data in NoSQL for querying/analysis etc.
NoSQL: Path Ahead contd..
Enterprise Adoption and Challenges
• NoSQL looks good largely for unstructured data
• SQL is the best choice for a broad range of traditional workloads
NoSQL: Path Ahead contd..
               Shout out loud
                    Hybrid
                 ACID + BASE
They are not alternatives but complements.
NoSQL: Path Ahead contd..
• Maturity
• Support
• Skillset and administration/operation
• Analytics and BI support
Q&A
References
• Nancy Lynch and Seth Gilbert, "Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services", ACM SIGACT News, Volume 33, Issue 2 (2002), pp. 51-59.
• "Brewer's CAP Theorem", julianbrowne.com, retrieved 02-Mar-2010.
• "Brewer's CAP theorem on distributed systems", royans.net.
• "CAP Twelve Years Later: How the 'Rules' Have Changed", on-line resource.
• E. Brewer, "Towards Robust Distributed Systems", Proc. 19th Ann. ACM Symp. Principles of Distributed Computing (PODC 00), ACM, 2000, pp. 7-10; on-line resource.
• D. Abadi, "Problems with CAP, and Yahoo's Little Known NoSQL System", DBMS Musings, blog, 23 Apr. 2010; on-line resource.
• C. Hale, "You Can't Sacrifice Partition Tolerance", 7 Oct. 2010; on-line resource.
• Facebook: Scaling Out, on-line resource.
• Gemstone: The Hardest Problems in Data Management, on-line resource.
• The Log-Structured Merge-Tree (research paper).
• CodeProject: Consistent Hashing, on-line resource.
• HighlyScalable: NoSQL Data Modeling Techniques, on-line resource.
• eBay Tech Blog: Cassandra Data Modeling Best Practices, on-line resource.
• John D. Cook: ACID vs BASE, on-line resource.
• Merkle Trees.
• Phi Accrual Failure Detection (research paper).
Indic threads pune12-nosql now and path ahead
Backup Slides
         Better than the Original 1
Document Based DataStore
{
    _id : ObjectId("4e77bb3b8a3e000000004f7a"),
    when : Date("2011-09-19T02:10:11.3Z"),
    author : "alex",
    title : "No Free Lunch",
    text : "This is the text of the post. It could be very long.",
    tags : [ "business", "ramblings" ],
    votes : 5,
    voters : [ "jane", "joe", "spencer", "phyllis", "li" ],
    comments : [
        { who : "jane", when : Date("2011-09-19T04:00:10.112Z"),
         comment : "I agree." },
        { who : "meghan", when : Date("2011-09-20T14:36:06.958Z"),
         comment : "You must be joking. etc etc ..." }
    ]
}
User and Items
User and Items : Option 1
User and Items : Option 2
User and Items : Option 3
User and Items : Option 4
Cassandra CF
Cassandra SuperCF
Use Case 1
Ecommerce Site
• Problem: Record user preferences, e.g. location, IP, currency selected, source of traffic, and multiple other dynamic values
• Solution: In a CF-based structure, keep it simple:

UserId_Key:
 Pref1_Name:Value1, Pref2_Name:Value2,
 …, PrefN_Name:ValueN
Use Case 1
RowKey: 1350136093705_6501082438199894
=> (column=1350136093764, value=-3242432#911167901131523, timestamp=1350136093766000)
=> (column=1350283322499, value=GOI#200701231712126570, timestamp=1350283322502001)
=> (column=1350283566051, value=GOI#200703221605283033, timestamp=1350283566054001)
=> (column=1350749595676, value=GOI#200805261514037199, timestamp=1350749595677001)
=> (column=1350785230322, value=BOM#200701251747233158, timestamp=1350785230324001)
-------------------
RowKey: 1354499614310_10861558002828044
=> (column=1354499614368, value=TRV#201104071059204768, timestamp=1354499614370000, ttl=1728000)
-------------------
RowKey: 1349760150553_6114662943774777
=> (column=1349760152066, value=BLR#200802111324575807, timestamp=1349760152068001)
-------------------
RowKey: 1349805109805_6167423558533191
=> (column=1349805111833, value=TRV#312254274337517, timestamp=1349805111835001)
-------------------
RowKey: 1354435656227_7908056941568359
=> (column=1354435656367, value=IDR#200701211254519381, timestamp=1354435656369000, ttl=1728000)
-------------------
RowKey: 1347648097261_15570089270962881
=> (column=1347648097304, value=DEL#201101192008115545, timestamp=1347648097307000)
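The listing above can be decoded in code. The sketch rests on two assumptions not stated on the slide. First, each column name is an epoch-millisecond timestamp. Second, each value has the form `<code>#<id>`, split on the first `#` the same way the Pig script in the backup slides splits values into citycode and hotelid. The class and field names are hypothetical. Cassandra TTLs are given in seconds, so `ttl=1728000` is 20 days.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical decoder for one column of the listing above.
class PreferenceColumn {
    final Instant when;     // assumption: column name = epoch milliseconds
    final String cityCode;  // text before the first '#'
    final String hotelId;   // text after the first '#'

    PreferenceColumn(long columnName, String value) {
        this.when = Instant.ofEpochMilli(columnName);
        String[] parts = value.split("#", 2); // at most two fields, like Split(value,'#',2)
        this.cityCode = parts[0];
        this.hotelId = parts.length > 1 ? parts[1] : "";
    }

    // Cassandra TTLs are in seconds; convert to whole days
    static long ttlDays(int ttlSeconds) {
        return Duration.ofSeconds(ttlSeconds).toDays();
    }
}
```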
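Use Case 1
                                                                 Get

import java.util.LinkedHashMap;
import java.util.Map;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.QueryResult;
import me.prettyprint.hector.api.query.SliceQuery;

// Reads the requested preference columns from one user's row.
// USER_PREFERENCE (the column family name) is defined elsewhere in the class.
private Map<String, String> getPreferences(Keyspace keySpace, String userId, String... preferenceNames) {
    // Row key, column name and column value are all strings
    SliceQuery<String, String, String> rsq = HFactory.createSliceQuery(keySpace,
        StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
    rsq.setColumnFamily(USER_PREFERENCE);
    rsq.setKey(userId);
    rsq.setColumnNames(preferenceNames);

    QueryResult<ColumnSlice<String, String>> orows = rsq.execute();
    // LinkedHashMap keeps the column order returned by Cassandra
    Map<String, String> preferenceMap = new LinkedHashMap<String, String>();
    for (HColumn<String, String> column : orows.get().getColumns()) {
        preferenceMap.put(column.getName(), column.getValue());
    }
    return preferenceMap;
}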
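Use Case 1
                                            Save

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

// keySpace, rowkey, colkey, colvalue and ttlUserPrefrences (seconds) come from the caller;
// USER_PREFERENCE is the column family name defined elsewhere.
Mutator<String> m = HFactory.createMutator(keySpace, StringSerializer.get());

// One preference = one column (name colkey, value colvalue) in the user's row
HColumn<String, String> userPreference = HFactory.createColumn(colkey, colvalue,
    StringSerializer.get(), StringSerializer.get());

// The column expires automatically after the TTL
userPreference.setTtl(ttlUserPrefrences);

m.addInsertion(rowkey, USER_PREFERENCE, userPreference);

// Sends the queued insertion to Cassandra
m.execute();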
Use Case 2
Online Travel Site

Problem: Need to know different metrics for a city's hotels, e.g.:
             Hotels booked in the last X time
             Hotels last viewed within the last Y time
             Hotels left with Z inventory
Use Case 2
RowKey: 2d323436353731
=> (super_column=911167901297486,
     (column=6c6173747669657765646d657373616765, value=VIEWED#Last viewed 23 hour(s) ago., timestamp=1354962852610000)
     (column=6c6173747669657765646d657373616762, value=Inventory#20, timestamp=1354962852610000)
     (column=6c6173747669657765646d657373616769, value=Bookings#8, timestamp=135496282610000))
-------------------
RowKey: 58524f
=> (super_column=200903041759196196,
     (column=6c617374626f6f6b65646d657373616765, value=Booked#Last booked 1 day(s) ago., timestamp=1347781187842000)
     (column=6c6173747669657765646d657373616765, value=VIEWED#Last viewed 2 hours ago., timestamp=1347707080147000))
=> (super_column=200903041848352230,
     (column=6c6173747669657765646d657373616765, value=VIEWED#Last viewed 1 day(s) ago., timestamp=1347266107708000))
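The values in this listing follow a `TYPE#text` pattern, e.g. `VIEWED#Last viewed 23 hour(s) ago.`. Below is a hypothetical formatter that produces values in that shape. The switch from hours to days at 24 hours is an assumption, since the `LongToHours` UDF used by the Pig script is not shown. One sample value above also reads "2 hours" rather than "hour(s)".

```java
import java.time.Duration;

// Hypothetical builder/parser for "TYPE#message" metric values.
class MetricMessage {
    // Assumed cut-over: under 24 hours report hours, otherwise whole days
    static String ago(Duration elapsed) {
        long hours = elapsed.toHours();
        return hours < 24 ? hours + " hour(s)" : elapsed.toDays() + " day(s)";
    }

    static String viewed(Duration elapsed) {
        return "VIEWED#Last viewed " + ago(elapsed) + " ago.";
    }

    static String booked(Duration elapsed) {
        return "Booked#Last booked " + ago(elapsed) + " ago.";
    }

    // Split a stored value into [type, text], as Split(value,'#',2) does in the Pig script
    static String[] parse(String value) {
        return value.split("#", 2);
    }
}
```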
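Use Case 2
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.beans.HSuperColumn;
import me.prettyprint.hector.api.beans.SuperSlice;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.QueryResult;
import me.prettyprint.hector.api.query.SuperSliceQuery;

// getKeySpace(), SUPER_SOCIAL_MESSAGE, cityCode, document and documents are defined elsewhere.
SuperSliceQuery<String, String, String, String> superQuery = HFactory.createSuperSliceQuery(getKeySpace(),
    StringSerializer.get(), StringSerializer.get(),
    StringSerializer.get(), StringSerializer.get());
// One row per city; each super column groups one hotel's messages
superQuery.setColumnFamily(SUPER_SOCIAL_MESSAGE).setKey(cityCode);

QueryResult<SuperSlice<String, String, String>> result = superQuery.execute();
List<HSuperColumn<String, String, String>> superColumns = result.get().getSuperColumns();

if (superColumns != null) {
    for (HSuperColumn<String, String, String> superColumn : superColumns) {
        // Collect this super column's (message name -> message value) pairs
        Map<String, String> messages = new HashMap<String, String>();
        List<HColumn<String, String>> columns = superColumn.getColumns();
        if (columns != null) {
            for (HColumn<String, String> column : columns) {
                messages.put(column.getName(), column.getValue());
            }
        }
        // Add the messages to the equivalent (search) document
        document.addField(superColumn.getName(), messages);
        documents.add(document);
    }
}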
Pig Script : MR
<document>

  <pigscript start="-16" end="-43200" start1="-1441" end1="-10080" start2="0" end2="-15" start3="0" end3="-1440">

     <comment>Delete All Messages</comment>

      <query><![CDATA[rows0 = LOAD 'cassandra://LH/HotelMessage' USING com.mmt.solr.hotels.cassandra.CassandraStorage() as (key:chararray, cols:bag{T:tuple(name:chararray, value:chararray) } );]]></query>

       <query><![CDATA[cols0 = FOREACH rows0 GENERATE key as key,flatten($1) as (name:chararray, value:chararray);]]></query>


      <query><![CDATA[userhotel0 = FOREACH cols0 GENERATE key as key,com.mmt.solr.hotels.cassandra.ByteBufferToString($1) as name,com.mmt.solr.hotels.cassandra.ByteBufferToString($2) as value;]]></query>

      <query><![CDATA[uriCounts0 = FOREACH userhotel0 GENERATE key as citycode,com.mmt.solr.hotels.cassandra.ToBag(TOTUPLE(name,null));]]></query>




      <comment>Last Viewed start 15 minutes to 30 days ago</comment>

      <query><![CDATA[rows = LOAD 'cassandra://LH/LastViewedHotels?slice_start=#start&slice_end=#end&limit=1024&reversed=true' USING com.mmt.solr.hotels.cassandra.CassandraStorage() as (key:chararray, cols:bag{T:tuple(name:long, value:chararray) } );]]></query>

      <query><![CDATA[cols = FOREACH rows GENERATE key as key,flatten($1) as (name:long, value:chararray);]]></query>

      <query><![CDATA[userhotel = FOREACH cols GENERATE key as key,com.mmt.solr.hotels.cassandra.LongToHours($1) as name,com.mmt.solr.hotels.cassandra.ByteBufferToString($2) as value;]]></query>

      <query><![CDATA[userhotelByCity = FOREACH userhotel GENERATE key as key,flatten($1) as name,flatten(org.apache.pig.piggybank.evaluation.string.Split(value,'#',2)) as (citycode:chararray,hotelid:chararray);]]></query>

      <query><![CDATA[groupByhotels = GROUP userhotelByCity BY hotelid;]]></query>

      <query><![CDATA[uriCounts = FOREACH groupByhotels { D = LIMIT userhotelByCity 1;
                       GENERATE flatten(D.citycode) as citycode,com.mmt.solr.hotels.cassandra.ToBag(
                       TOTUPLE(group,com.mmt.solr.hotels.cassandra.StringAppend('VIEWED#Last viewed ',D.name,' ago.')));
                     };]]></query>

      <comment>Last Booked 1 to 8 days ago</comment>

      <query><![CDATA[rows1 = LOAD 'cassandra://LH/BookedHotels?slice_start=#startA&slice_end=#endA&limit=1024&reversed=true' USING com.mmt.solr.hotels.cassandra.CassandraStorage() as (key:chararray, cols:bag{T:tuple(name:long, value:chararray) } );]]></query>

      <query><![CDATA[cols1 = FOREACH rows1 GENERATE key as key,flatten($1) as (name:long, value:chararray);]]></query>

      <query><![CDATA[userhotel1 = FOREACH cols1 GENERATE key as key,com.mmt.solr.hotels.cassandra.LongToHours($1) as name,com.mmt.solr.hotels.cassandra.ByteBufferToString($2) as value;]]></query>

      <query><![CDATA[userhotelByCity1 = FOREACH userhotel1 GENERATE key as key,flatten($1) as name,flatten(org.apache.pig.piggybank.evaluation.string.Split(value,'#',2)) as (citycode:chararray,hotelid:chararray);]]></query>

      <query><![CDATA[groupByhotels1 = GROUP userhotelByCity1 BY hotelid;]]></query>

      <query><![CDATA[uriCounts1 = FOREACH groupByhotels1 { D = LIMIT userhotelByCity1 1;
                       GENERATE flatten(D.citycode) as citycode,com.mmt.solr.hotels.cassandra.ToBag(
                       TOTUPLE(group,com.mmt.solr.hotels.cassandra.StringAppend('Booked#Last booked ',D.name,' ago.')));
                     };]]></query>
Criteria to Evaluate NoSQL Solutions

Internal partitioning

Automated, flexible data distribution

Hot-swappable nodes

Replication style

Automated failover strategy
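Automated failover depends on detecting failed nodes. The Phi Accrual failure detector, cited in the references, outputs a suspicion level phi instead of a yes/no answer. The sketch below is simplified and assumes heartbeat inter-arrival times are exponentially distributed, so phi = -log10(e^(-t/mean)) = t / (mean · ln 10). The original paper uses a normal distribution instead; Cassandra's detector, to our knowledge, uses a similar exponential approximation. The names and threshold are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified phi accrual failure detector (exponential inter-arrival assumption).
class PhiAccrualDetector {
    private final Deque<Long> intervals = new ArrayDeque<>(); // recent heartbeat gaps (ms)
    private final int windowSize;
    private long lastHeartbeatMillis = -1;

    PhiAccrualDetector(int windowSize) {
        this.windowSize = windowSize;
    }

    // Record a heartbeat arrival and keep a sliding window of inter-arrival times
    void heartbeat(long nowMillis) {
        if (lastHeartbeatMillis >= 0) {
            intervals.addLast(nowMillis - lastHeartbeatMillis);
            if (intervals.size() > windowSize) {
                intervals.removeFirst();
            }
        }
        lastHeartbeatMillis = nowMillis;
    }

    // phi = elapsed / (mean * ln 10): grows the longer a heartbeat is overdue
    double phi(long nowMillis) {
        if (intervals.isEmpty()) {
            return 0.0; // no arrival history yet, so no basis for suspicion
        }
        double mean = intervals.stream().mapToLong(Long::longValue).average().orElse(0.0);
        if (mean <= 0.0) {
            return 0.0;
        }
        double elapsed = nowMillis - lastHeartbeatMillis;
        return elapsed / (mean * Math.log(10));
    }

    // Fail over once phi crosses a configured threshold
    boolean isSuspected(long nowMillis, double threshold) {
        return phi(nowMillis) > threshold;
    }
}
```

With heartbeats every second, a 1-second gap gives phi of about 0.43. A 20-second silence gives about 8.7, which crosses a threshold of 8.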

More Related Content

What's hot (19)

PPTX
Navigating NoSQL in cloudy skies
shnkr_rmchndrn
 
ODP
Nonrelational Databases
Udi Bauman
 
PDF
[db tech showcase Tokyo 2017] C34: Replacing Oracle Database at DBS Bank ~Ora...
Insight Technology, Inc.
 
PPTX
NOSQL
akbarashaikh
 
PDF
Quantitative Performance Evaluation of Cloud-Based MySQL (Relational) Vs. Mon...
Darshan Gorasiya
 
PPTX
Non relational databases-no sql
Ram kumar
 
PDF
Non Relational Databases
Chris Baglieri
 
KEY
NoSQL databases and managing big data
Steven Francia
 
PDF
Performance analysis of MongoDB and HBase
SindhujanDhayalan
 
PPTX
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
Felix Gessert
 
PPT
Webinar: High Performance MongoDB Applications with IBM POWER8
MongoDB
 
PPTX
Nosql seminar
Shreyashkumar Nangnurwar
 
PDF
Beyond Relational Databases
Gregory Boissinot
 
PPTX
Nosql databases
ateeq ateeq
 
PPTX
Relational and non relational database 7
abdulrahmanhelan
 
PPTX
Introduction To MongoDB
ElieHannouch
 
PDF
Webinar: Faster Big Data Analytics with MongoDB
MongoDB
 
PPT
Schemaless Databases
Dan Gunter
 
PDF
NoSQL-Database-Concepts
Bhaskar Gunda
 
Navigating NoSQL in cloudy skies
shnkr_rmchndrn
 
Nonrelational Databases
Udi Bauman
 
[db tech showcase Tokyo 2017] C34: Replacing Oracle Database at DBS Bank ~Ora...
Insight Technology, Inc.
 
Quantitative Performance Evaluation of Cloud-Based MySQL (Relational) Vs. Mon...
Darshan Gorasiya
 
Non relational databases-no sql
Ram kumar
 
Non Relational Databases
Chris Baglieri
 
NoSQL databases and managing big data
Steven Francia
 
Performance analysis of MongoDB and HBase
SindhujanDhayalan
 
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
Felix Gessert
 
Webinar: High Performance MongoDB Applications with IBM POWER8
MongoDB
 
Beyond Relational Databases
Gregory Boissinot
 
Nosql databases
ateeq ateeq
 
Relational and non relational database 7
abdulrahmanhelan
 
Introduction To MongoDB
ElieHannouch
 
Webinar: Faster Big Data Analytics with MongoDB
MongoDB
 
Schemaless Databases
Dan Gunter
 
NoSQL-Database-Concepts
Bhaskar Gunda
 

Similar to Indic threads pune12-nosql now and path ahead (20)

PPTX
Introduction to NoSQL Database
Mohammad Alghanem
 
PDF
If NoSQL is your answer, you are probably asking the wrong question.
Lukas Smith
 
PPT
Ops Jumpstart: MongoDB Administration 101
MongoDB
 
PPT
NoSql Databases
Nimat Khattak
 
ODP
Databases benoitg 2009-03-10
benoitg
 
PPTX
MinneBar 2013 - Scaling with Cassandra
Jeff Smoley
 
PPTX
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Arseny Chernov
 
PPTX
SPL_ALL_EN.pptx
政宏 张
 
PDF
NoSQL Basics - A Quick Tour
Bikram Sinha. MBA, PMP
 
PPTX
At the core you will have KUSTO
Riccardo Zamana
 
PPTX
http://www.hfadeel.com/Blog/?p=151
xlight
 
PPTX
Modern databases and its challenges (SQL ,NoSQL, NewSQL)
Mohamed Galal
 
PPTX
Relational databases vs Non-relational databases
James Serra
 
PDF
NoSQL Solutions - a comparative study
Guillaume Lefranc
 
PPT
Enterprise NoSQL: Silver Bullet or Poison Pill
Billy Newport
 
PPTX
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
SnapLogic
 
PPTX
SQL to NoSQL: Top 6 Questions
Mike Broberg
 
PDF
Avoiding big data antipatterns
grepalex
 
PPTX
NoSQLDatabases
Adi Challa
 
PPTX
مقدمة عن NoSQL بالعربي
Mohamed Galal
 
Introduction to NoSQL Database
Mohammad Alghanem
 
If NoSQL is your answer, you are probably asking the wrong question.
Lukas Smith
 
Ops Jumpstart: MongoDB Administration 101
MongoDB
 
NoSql Databases
Nimat Khattak
 
Databases benoitg 2009-03-10
benoitg
 
MinneBar 2013 - Scaling with Cassandra
Jeff Smoley
 
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Arseny Chernov
 
SPL_ALL_EN.pptx
政宏 张
 
NoSQL Basics - A Quick Tour
Bikram Sinha. MBA, PMP
 
At the core you will have KUSTO
Riccardo Zamana
 
http://www.hfadeel.com/Blog/?p=151
xlight
 
Modern databases and its challenges (SQL ,NoSQL, NewSQL)
Mohamed Galal
 
Relational databases vs Non-relational databases
James Serra
 
NoSQL Solutions - a comparative study
Guillaume Lefranc
 
Enterprise NoSQL: Silver Bullet or Poison Pill
Billy Newport
 
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
SnapLogic
 
SQL to NoSQL: Top 6 Questions
Mike Broberg
 
Avoiding big data antipatterns
grepalex
 
NoSQLDatabases
Adi Challa
 
مقدمة عن NoSQL بالعربي
Mohamed Galal
 
Ad

More from IndicThreads (20)

PPTX
Http2 is here! And why the web needs it
IndicThreads
 
ODP
Understanding Bitcoin (Blockchain) and its Potential for Disruptive Applications
IndicThreads
 
PPT
Go Programming Language - Learning The Go Lang way
IndicThreads
 
PPT
Building Resilient Microservices
IndicThreads
 
PPT
App using golang indicthreads
IndicThreads
 
PDF
Building on quicksand microservices indicthreads
IndicThreads
 
PDF
How to Think in RxJava Before Reacting
IndicThreads
 
PPT
Iot secure connected devices indicthreads
IndicThreads
 
PDF
Real world IoT for enterprises
IndicThreads
 
PPT
IoT testing and quality assurance indicthreads
IndicThreads
 
PPT
Functional Programming Past Present Future
IndicThreads
 
PDF
Harnessing the Power of Java 8 Streams
IndicThreads
 
PDF
Building & scaling a live streaming mobile platform - Gr8 road to fame
IndicThreads
 
PPTX
Internet of things architecture perspective - IndicThreads Conference
IndicThreads
 
PDF
Cars and Computers: Building a Java Carputer
IndicThreads
 
PPTX
Scrap Your MapReduce - Apache Spark
IndicThreads
 
PPT
Continuous Integration (CI) and Continuous Delivery (CD) using Jenkins & Docker
IndicThreads
 
PPTX
Speed up your build pipeline for faster feedback
IndicThreads
 
PPT
Unraveling OpenStack Clouds
IndicThreads
 
PPTX
Digital Transformation of the Enterprise. What IT leaders need to know!
IndicThreads
 
Http2 is here! And why the web needs it
IndicThreads
 
Understanding Bitcoin (Blockchain) and its Potential for Disruptive Applications
IndicThreads
 
Go Programming Language - Learning The Go Lang way
IndicThreads
 
Building Resilient Microservices
IndicThreads
 
App using golang indicthreads
IndicThreads
 
Building on quicksand microservices indicthreads
IndicThreads
 
How to Think in RxJava Before Reacting
IndicThreads
 
Iot secure connected devices indicthreads
IndicThreads
 
Real world IoT for enterprises
IndicThreads
 
IoT testing and quality assurance indicthreads
IndicThreads
 
Functional Programming Past Present Future
IndicThreads
 
Harnessing the Power of Java 8 Streams
IndicThreads
 
Building & scaling a live streaming mobile platform - Gr8 road to fame
IndicThreads
 
Internet of things architecture perspective - IndicThreads Conference
IndicThreads
 
Cars and Computers: Building a Java Carputer
IndicThreads
 
Scrap Your MapReduce - Apache Spark
IndicThreads
 
Continuous Integration (CI) and Continuous Delivery (CD) using Jenkins & Docker
IndicThreads
 
Speed up your build pipeline for faster feedback
IndicThreads
 
Unraveling OpenStack Clouds
IndicThreads
 
Digital Transformation of the Enterprise. What IT leaders need to know!
IndicThreads
 
Ad

Recently uploaded (20)

PPTX
Applied-Statistics-Mastering-Data-Driven-Decisions.pptx
parmaryashparmaryash
 
PDF
Responsible AI and AI Ethics - By Sylvester Ebhonu
Sylvester Ebhonu
 
PDF
How ETL Control Logic Keeps Your Pipelines Safe and Reliable.pdf
Stryv Solutions Pvt. Ltd.
 
PPTX
cloud computing vai.pptx for the project
vaibhavdobariyal79
 
PDF
Peak of Data & AI Encore - Real-Time Insights & Scalable Editing with ArcGIS
Safe Software
 
PDF
The Future of Artificial Intelligence (AI)
Mukul
 
PDF
Structs to JSON: How Go Powers REST APIs
Emily Achieng
 
PPTX
AVL ( audio, visuals or led ), technology.
Rajeshwri Panchal
 
PPTX
The Future of AI & Machine Learning.pptx
pritsen4700
 
PDF
MASTERDECK GRAPHSUMMIT SYDNEY (Public).pdf
Neo4j
 
PDF
AI Unleashed - Shaping the Future -Starting Today - AIOUG Yatra 2025 - For Co...
Sandesh Rao
 
PDF
OFFOFFBOX™ – A New Era for African Film | Startup Presentation
ambaicciwalkerbrian
 
PDF
Economic Impact of Data Centres to the Malaysian Economy
flintglobalapac
 
PDF
How Open Source Changed My Career by abdelrahman ismail
a0m0rajab1
 
PDF
TrustArc Webinar - Navigating Data Privacy in LATAM: Laws, Trends, and Compli...
TrustArc
 
PPTX
AI in Daily Life: How Artificial Intelligence Helps Us Every Day
vanshrpatil7
 
PDF
Generative AI vs Predictive AI-The Ultimate Comparison Guide
Lily Clark
 
PPTX
Agile Chennai 18-19 July 2025 Ideathon | AI Powered Microfinance Literacy Gui...
AgileNetwork
 
PPTX
What-is-the-World-Wide-Web -- Introduction
tonifi9488
 
PDF
Presentation about Hardware and Software in Computer
snehamodhawadiya
 
Applied-Statistics-Mastering-Data-Driven-Decisions.pptx
parmaryashparmaryash
 
Responsible AI and AI Ethics - By Sylvester Ebhonu
Sylvester Ebhonu
 
How ETL Control Logic Keeps Your Pipelines Safe and Reliable.pdf
Stryv Solutions Pvt. Ltd.
 
cloud computing vai.pptx for the project
vaibhavdobariyal79
 
Peak of Data & AI Encore - Real-Time Insights & Scalable Editing with ArcGIS
Safe Software
 
The Future of Artificial Intelligence (AI)
Mukul
 
Structs to JSON: How Go Powers REST APIs
Emily Achieng
 
AVL ( audio, visuals or led ), technology.
Rajeshwri Panchal
 
The Future of AI & Machine Learning.pptx
pritsen4700
 
MASTERDECK GRAPHSUMMIT SYDNEY (Public).pdf
Neo4j
 
AI Unleashed - Shaping the Future -Starting Today - AIOUG Yatra 2025 - For Co...
Sandesh Rao
 
OFFOFFBOX™ – A New Era for African Film | Startup Presentation
ambaicciwalkerbrian
 
Economic Impact of Data Centres to the Malaysian Economy
flintglobalapac
 
How Open Source Changed My Career by abdelrahman ismail
a0m0rajab1
 
TrustArc Webinar - Navigating Data Privacy in LATAM: Laws, Trends, and Compli...
TrustArc
 
AI in Daily Life: How Artificial Intelligence Helps Us Every Day
vanshrpatil7
 
Generative AI vs Predictive AI-The Ultimate Comparison Guide
Lily Clark
 
Agile Chennai 18-19 July 2025 Ideathon | AI Powered Microfinance Literacy Gui...
AgileNetwork
 
What-is-the-World-Wide-Web -- Introduction
tonifi9488
 
Presentation about Hardware and Software in Computer
snehamodhawadiya
 

Indic threads pune12-nosql now and path ahead

  • 1. NoSQL: Now and Path Ahead Shubham Kumar Srivastava MakeMyTrip
  • 3. Abstract What and Why : NoSql Fundamentals Use Case Challenges Path Ahead 3 .
  • 4. What is NoSql Database which does not adhere to the traditional relational database management system (RDMS) structure .
  • 5. Why NoSql  Scalability and Performance  Cost  Data Modeling
  • 6. Why NoSql : Motives and Drivers Scalability and Performance  Horizontal scalability better than Vertical  Hardware getting cheaper and processing power increasing  Less Operational complexity as against RDBMS solutions.  In most of the solutions you get automatic sharding etc as default .
  • 7. Why NoSql : Motives and Drivers contd..
  • 8. Why NoSql : Motives and Drivers contd..
  • 9. Why NoSql : Motives and Drivers contd.. Cost  Scale(as with NoSql) with Hefty Cost  Commodity hardware, software versions, upgrades, maintenance.  This brought organizations look out for alternatives and the need for a cost effective scale out option.
  • 10. Why NoSql : Motives and Drivers contd.. Data Modeling SQL has been for  Concurreny,Consistency,Integrity  For Summations,Aggregations,Grouping’s  Schema Says: What all Do I answer ??
  • 11. Why NoSql : Motives and Drivers contd.. Data Modeling  A plain key-value store is very powerful and fit the max use cases for a NoSQL solution  Hierarchical or graph-like data modelling and processing.  Values like maps of maps of maps.  Document Databases which even store arbitrary complex objects.  Document based indexing data store’s are a huge success.
  • 12. Why NoSql : Motives and Drivers contd.. At times SW apps are not limited to these constraints . This lead to data models like Key/Value Store : Redis,MemcacheDb/Voldemort etc. Wide Column Store / Column Families : Cassandra/Hadoop(Hbase)/Hypertable/Cloudera etc. Document Based Store’s : Solr/Lucene/MongoDb/CouchDb/TerraStore etc. Graph Data Store : Neo4J/GraphBase/FlockDb etc.
  • 13. Why NoSql : Motives and Drivers contd..
  • 14. Why NoSql : Motives and Drivers contd..  Schema Says: What are the questions  Data modeling is based on the set of Queries  Exploit De-normalization Duplication  Use Aggregates  Manage Joins with App + Aggregation + DeNormalization etc.
  • 15. Some Fanda-mentals CAP Theorem At the most only two properties of the three in a shared/distributed system can be satisfied.  Consistency  Availability  Tolerance to Network Partitions
  • 17. Explanation Use case: Scaling Web Apps Critical fact’s : • Network outages are common • Customer shopping carts, email search, social network queries—can tolerate stale data How: Compromise on Consistency in-order to remain available vs disrupt user service at outages.
  • 18. Explanation  Rather than requiring consistency after every transaction, it is enough for the database to eventually be in a consistent state.  Brewer’s CAP theorem says you have no choice if you want to scale up.
  • 19. Explanation contd.. Sharp Contrast : High Speed Financial Application  Highly Transactional  Consistent  Automated  Can’t live with Eventual consistency
  • 20. ACID vs BASE ACID  Atomic: Everything in a transaction succeeds or the entire transaction is rolled back.  Consistent: A transaction cannot leave the database in an inconsistent state.  Isolated: Transactions cannot interfere with each other.  Durable: Completed transactions persist, even when servers restart etc.
  • 21. Some Fanda-mentals cont.. BASE Basic Availability Soft-state Eventual consistency
  • 22. Consistent Hashing Common way to load balance . The machine chosen to cache object o will be: hash(o) mod n n:total number of machines
  • 23. Consistent Hashing contd..  Adding a machine to the cache means hash(o) mod (n + 1)  Removing a machine to the cache means hash(o) mod (n - 1)  Result on any above: Disaster  Swamped machines with redistribution
  • 24. Consistent Hashing contd.. Commonly, a hash function(e.g MD5 hash) will map a value into a 128-bit key, 0~2^127-1(or 32 bit even as given next) .
  • 26. Consistent Hashing contd.. Both Key and Machine hashed with the same function
  • 27. Consistent Hashing contd.. Adding a Node
  • 28. Consistent Hashing contd.. Removing a Node
  • 29. Use Case and NoSQL Solution Problem: Need to store bookings per day of all hotels . Queries centered around city and regions. Hotel count : 1 Million Date Range : Now to next 365 *2 Days
  • 30. NoSQL: Path Ahead  ACID equivalence(Neo4J,CouchDb etc)  Transaction Support  Atomicity  MVCC
  • 31. NoSQL: Path Ahead contd.. Possible Solution Work with SQL Db w.r.t Creation/Updation etc. Archive the data in NoSQL for query/analysis etc.
  • 32. NoSQL: Path Ahead contd.. Enterprise Adoption and Challenges  NoSQL looks good for Unstructured data largely  SQL is the best choice for a broad range of traditional workloads.
  • 33. NoSQL: Path Ahead contd..
  • 34. NoSQL: Path Ahead contd.. Shout out loud Hybrid ACID + BASE They are not alternatives but supplements
  • 35. NoSQL: Path Ahead contd..  Maturity  Support  Skillset and Administration/Operation  Analytics and BI support
  • 36. NoSQL: Path Ahead contd..
  • 37. Q&A
  • 38. References  Nancy Lynch and Seth Gilbert, “Brewer's conjecture and the feasibility of consistent, available, partition- tolerant web services”, ACM SIGACT News, Volume 33 Issue 2 (2002), pg. 51-59.  Brewer's CAP Theorem", julianbrowne.com, Retrieved 02-Mar-2010  Brewers CAP theorem on distributed systems", royans.net  CAP Twelve Years Later: How the "Rules" Have Changed on-line resource  E. Brewer, "Towards Robust Distributed Systems," Proc. 19th Ann. ACM Symp.Principles of Distributed Computing (PODC 00), ACM, 2000, pp. 7-10; on-line resource  D. Abadi, "Problems with CAP, and Yahoo’s Little Known NoSQL System," DBMS Musings, blog, 23 Apr. 2010; on-line resource.  C. Hale, "You Can’t Sacrifice Partition Tolerance," 7 Oct. 2010; on-line resource.  Facebook: Scaling Out on-line resource.  Gemstone : The Hardest Problems In Data Management on-line resource  The Log-Structured Merge-Tree (Research Paper)  CodeProject : Consistent Hashing on-line resource  HighlyScalable : NoSQL Data Modeling Techniques on-line resource  eBay Tech Blog :Cassandra Data Modeling Best Practices on-line resource  John D Cook : Acid Vs Base on-line resource  Merkle Trees  Phy-Accural Faliover Detaection (Research Paper)
  • 40. Backup Slides Better than the Original 1 
  • 41. Document-Based Data Store
    {
      _id : ObjectId("4e77bb3b8a3e000000004f7a"),
      when : Date("2011-09-19T02:10:11.3Z"),
      author : "alex",
      title : "No Free Lunch",
      text : "This is the text of the post. It could be very long.",
      tags : [ "business", "ramblings" ],
      votes : 5,
      voters : [ "jane", "joe", "spencer", "phyllis", "li" ],
      comments : [
        { who : "jane", when : Date("2011-09-19T04:00:10.112Z"), comment : "I agree." },
        { who : "meghan", when : Date("2011-09-20T14:36:06.958Z"), comment : "You must be joking. etc etc ..." }
      ]
    }
  • 43. User and Items : Option 1
  • 44. User and Items : Option 2
  • 45. User and Items : Option 3
  • 46. User and Items : Option 4
  • 49. Use Case 1: E-commerce Site. Problem: Record user preferences, e.g. location, IP, currency selected, source of traffic and multiple other dynamic values. Solution: In a column-family (CF) based structure, keep it simple: UserId_Key: Pref1_Name:Value1, Pref2_Name:Value2, …, PrefN_Name:ValueN
  • 50. Use Case 1
    RowKey: 1350136093705_6501082438199894
    => (column=1350136093764, value=-3242432#911167901131523, timestamp=1350136093766000)
    => (column=1350283322499, value=GOI#200701231712126570, timestamp=1350283322502001)
    => (column=1350283566051, value=GOI#200703221605283033, timestamp=1350283566054001)
    => (column=1350749595676, value=GOI#200805261514037199, timestamp=1350749595677001)
    => (column=1350785230322, value=BOM#200701251747233158, timestamp=1350785230324001)
    -------------------
    RowKey: 1354499614310_10861558002828044
    => (column=1354499614368, value=TRV#201104071059204768, timestamp=1354499614370000, ttl=1728000)
    -------------------
    RowKey: 1349760150553_6114662943774777
    => (column=1349760152066, value=BLR#200802111324575807, timestamp=1349760152068001)
    -------------------
    RowKey: 1349805109805_6167423558533191
    => (column=1349805111833, value=TRV#312254274337517, timestamp=1349805111835001)
    -------------------
    RowKey: 1354435656227_7908056941568359
    => (column=1354435656367, value=IDR#200701211254519381, timestamp=1354435656369000, ttl=1728000)
    -------------------
    RowKey: 1347648097261_15570089270962881
    => (column=1347648097304, value=DEL#201101192008115545, timestamp=1347648097307000)
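In this dump the column names look like epoch timestamps in milliseconds (1350136093764 falls in October 2012), and the values have the form `cityCode#hotelId`, which is exactly how the later Pig script splits them with `Split(value,'#',2)`. `ttl=1728000` is in seconds, i.e. 20 days. The Java sketch below reads one such column back; treating the column name as an event time is an inference from the data, and the class name is hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

// Reads one column from the dump: name = epoch millis, value = "cityCode#hotelId" (assumed layout).
final class DumpColumn {
    final Instant at;
    final String cityCode;
    final String hotelId;

    private DumpColumn(Instant at, String cityCode, String hotelId) {
        this.at = at;
        this.cityCode = cityCode;
        this.hotelId = hotelId;
    }

    static DumpColumn parse(long columnName, String value) {
        // Split on the first '#' only, as Split(value,'#',2) does in the Pig script.
        int sep = value.indexOf('#');
        if (sep < 0) {
            throw new IllegalArgumentException("expected cityCode#hotelId but got: " + value);
        }
        return new DumpColumn(Instant.ofEpochMilli(columnName),
                value.substring(0, sep), value.substring(sep + 1));
    }

    // Cassandra column TTLs are expressed in seconds.
    static Duration ttl(int ttlSeconds) {
        return Duration.ofSeconds(ttlSeconds);
    }
}
```

Splitting on the first `#` only also copes with city codes such as `-3242432` in the first row.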
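  • 51. Use Case 1: Get
    import java.io.IOException;
    import java.nio.charset.CharacterCodingException;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.ColumnSlice;
    import me.prettyprint.hector.api.beans.HColumn;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.QueryResult;
    import me.prettyprint.hector.api.query.SliceQuery;

    // USER_PREFERENCE is the column family name, defined elsewhere in the class.
    private Map<String, String> getPreferences(Keyspace keySpace, String userId, String... preferenceNames)
            throws IOException, CharacterCodingException {
        // Row key, column name and column value are all Strings.
        SliceQuery<String, String, String> rsq = HFactory.createSliceQuery(keySpace,
                StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
        rsq.setColumnFamily(USER_PREFERENCE);
        rsq.setKey(userId);                   // one row per user
        rsq.setColumnNames(preferenceNames);  // fetch only the requested preferences
        QueryResult<ColumnSlice<String, String>> orows = rsq.execute();
        Map<String, String> preferenceMap = new LinkedHashMap<String, String>();
        for (HColumn<String, String> column : orows.get().getColumns()) {
            preferenceMap.put(column.getName(), column.getValue());
        }
        return preferenceMap;
    }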
  • 52. Use Case 1: Save
    // rowkey, colkey, colvalue and ttlUserPreferences come from the calling code.
    Mutator<String> m = HFactory.createMutator(keySpace, StringSerializer.get());
    HColumn<String, String> userPreferences = HFactory.createColumn(colkey, colvalue,
            StringSerializer.get(), StringSerializer.get());
    userPreferences.setTtl(ttlUserPreferences);  // TTL in seconds; the column expires afterwards
    m.addInsertion(rowkey, USER_PREFERENCE, userPreferences);
    m.execute();                                 // sends the batched mutation
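To make the row layout and TTL behaviour of the Get and Save slides concrete without a Cassandra cluster, here is a hypothetical in-memory stand-in for the `USER_PREFERENCE` column family: one row per user, columns sorted by name, each column with an optional expiry. Cassandra enforces expiry on the server; passing `nowMillis` explicitly is an assumption made here to keep the sketch deterministic. The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in for the USER_PREFERENCE CF: user id -> columns sorted by name.
final class PreferenceStoreSketch {

    private static final class Cell {
        final String value;
        final long expiresAtMillis; // Long.MAX_VALUE means "no TTL"

        Cell(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, SortedMap<String, Cell>> rows = new HashMap<>();

    // Like the save path: insert or overwrite one column; ttlSeconds <= 0 means no expiry.
    void save(String userId, String name, String value, int ttlSeconds, long nowMillis) {
        long expiresAt = ttlSeconds > 0 ? nowMillis + ttlSeconds * 1000L : Long.MAX_VALUE;
        rows.computeIfAbsent(userId, k -> new TreeMap<>()).put(name, new Cell(value, expiresAt));
    }

    // Like the get path: only the named, unexpired columns of one row, in column-name order.
    Map<String, String> getPreferences(String userId, long nowMillis, String... names) {
        Map<String, String> result = new LinkedHashMap<>();
        SortedMap<String, Cell> row = rows.get(userId);
        if (row == null) {
            return result;
        }
        Set<String> wanted = new HashSet<>(Arrays.asList(names));
        for (Map.Entry<String, Cell> entry : row.entrySet()) {
            if (wanted.contains(entry.getKey()) && nowMillis < entry.getValue().expiresAtMillis) {
                result.put(entry.getKey(), entry.getValue().value);
            }
        }
        return result;
    }
}
```

A preference saved with a TTL silently disappears from reads once it expires, so short-lived values such as the traffic source need no cleanup job.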
  • 53. Use Case 2: Online Travel Site. Problem: Need to know different metrics for a city's hotels, e.g.: hotels booked in the last X time; hotels last viewed within the last Y time; hotels left with Z inventory.
  • 54. Use Case 2
    RowKey: 2d323436353731
    => (super_column=911167901297486,
         (column=6c6173747669657765646d657373616765, value=VIEWED#Last viewed 23 hour(s) ago., timestamp=1354962852610000)
         (column=6c6173747669657765646d657373616762, value=Inventory#20 , timestamp=1354962852610000)
         (column=6c6173747669657765646d657373616769, value=Bookings#8 , timestamp=135496282610000))
    -------------------
    RowKey: 58524f
    => (super_column=200903041759196196,
         (column=6c617374626f6f6b65646d657373616765, value=Booked#Last booked 1 day(s) ago., timestamp=1347781187842000)
         (column=6c6173747669657765646d657373616765, value=VIEWED#Last viewed 2 hours ago., timestamp=1347707080147000))
    => (super_column=200903041848352230,
         (column=6c6173747669657765646d657373616765, value=VIEWED#Last viewed 1 day(s) ago., timestamp=1347266107708000))
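The row keys and column names in this dump are printed as hex-encoded bytes, presumably because the column family stores them as raw bytes rather than as a text type. Decoding them as UTF-8 makes the layout readable: `2d323436353731` is the city code `-246571`, `58524f` is `XRO`, and `6c6173747669657765646d657373616765` / `6c617374626f6f6b65646d657373616765` are `lastviewedmessage` / `lastbookedmessage`. A small Java helper (the class name is illustrative) that does the decoding:

```java
import java.nio.charset.StandardCharsets;

// Decodes hex-printed row keys and column names back into UTF-8 text.
final class HexNames {
    static String decode(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd-length hex string: " + hex);
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            // Each pair of hex digits is one byte.
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```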
  • 55. Use Case 2
    // getKeySpace(), SUPER_SOCIAL_MESSAGE, cityCode, document and documents are defined elsewhere.
    // Read the city's row: one super column per hotel, holding that hotel's messages.
    SuperSliceQuery<String, String, String, String> superQuery = HFactory.createSuperSliceQuery(getKeySpace(),
            StringSerializer.get(), StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
    superQuery.setColumnFamily(SUPER_SOCIAL_MESSAGE).setKey(cityCode);
    QueryResult<SuperSlice<String, String, String>> result = superQuery.execute();
    List<HSuperColumn<String, String, String>> superColumns = result.get().getSuperColumns();
    if (superColumns != null) {
        for (HSuperColumn<String, String, String> superColumn : superColumns) {
            // Collect this hotel's messages: column name -> message text.
            Map<String, String> messages = new HashMap<String, String>();
            List<HColumn<String, String>> columns = superColumn.getColumns();
            if (columns != null) {
                for (HColumn<String, String> column : columns) {
                    messages.put(column.getName(), column.getValue());
                }
            }
            /* The equivalent doc */
            document.addField(superColumn.getName(), messages);
            documents.add(document);
        }
    }
  • 56. Pig Script: MR
<document>
<pigscript start="-16" end="-43200" start1="-1441" end1="-10080" start2="0" end2="-15" start3="0" end3="-1440">
<comment>Delete All Messages</comment>
<query><![CDATA[rows0 = LOAD 'cassandra://LH/HotelMessage' USING com.mmt.solr.hotels.cassandra.CassandraStorage() as (key:chararray, cols:bag{T:tuple(name:chararray, value:chararray) } );]]></query>
<query><![CDATA[cols0 = FOREACH rows0 GENERATE key as key,flatten($1) as (name:chararray, value:chararray);]]></query>
<query><![CDATA[userhotel0 = FOREACH cols0 GENERATE key as key,com.mmt.solr.hotels.cassandra.ByteBufferToString($1) as name,com.mmt.solr.hotels.cassandra.ByteBufferToString($2) as value;]]></query>
<query><![CDATA[uriCounts0 = FOREACH userhotel0 GENERATE key as citycode,com.mmt.solr.hotels.cassandra.ToBag(TOTUPLE(name,null));]]></query>
<comment>Last viewed 15 minutes to 30 days ago</comment>
<query><![CDATA[rows = LOAD 'cassandra://LH/LastViewedHotels?slice_start=#start&slice_end=#end&limit=1024&reversed=true' USING com.mmt.solr.hotels.cassandra.CassandraStorage() as (key:chararray, cols:bag{T:tuple(name:long, value:chararray) } );]]></query>
<query><![CDATA[cols = FOREACH rows GENERATE key as key,flatten($1) as (name:long, value:chararray);]]></query>
<query><![CDATA[userhotel = FOREACH cols GENERATE key as key,com.mmt.solr.hotels.cassandra.LongToHours($1) as name,com.mmt.solr.hotels.cassandra.ByteBufferToString($2) as value;]]></query>
<query><![CDATA[userhotelByCity = FOREACH userhotel GENERATE key as key,flatten($1) as name,flatten(org.apache.pig.piggybank.evaluation.string.Split(value,'#',2)) as (citycode:chararray,hotelid:chararray);]]></query>
<query><![CDATA[groupByhotels = GROUP userhotelByCity BY hotelid;]]></query>
<query><![CDATA[uriCounts = FOREACH groupByhotels { D = LIMIT userhotelByCity 1; GENERATE flatten(D.citycode) as citycode,com.mmt.solr.hotels.cassandra.ToBag( TOTUPLE(group,com.mmt.solr.hotels.cassandra.StringAppend('VIEWED#Last viewed ',D.name,' ago.'))); };]]></query>
<comment>Last Booked 1 to 8 days ago</comment>
<query><![CDATA[rows1 = LOAD 'cassandra://LH/BookedHotels?slice_start=#startA&slice_end=#endA&limit=1024&reversed=true' USING com.mmt.solr.hotels.cassandra.CassandraStorage() as (key:chararray, cols:bag{T:tuple(name:long, value:chararray) } );]]></query>
<query><![CDATA[cols1 = FOREACH rows1 GENERATE key as key,flatten($1) as (name:long, value:chararray);]]></query>
<query><![CDATA[userhotel1 = FOREACH cols1 GENERATE key as key,com.mmt.solr.hotels.cassandra.LongToHours($1) as name,com.mmt.solr.hotels.cassandra.ByteBufferToString($2) as value;]]></query>
<query><![CDATA[userhotelByCity1 = FOREACH userhotel1 GENERATE key as key,flatten($1) as name,flatten(org.apache.pig.piggybank.evaluation.string.Split(value,'#',2)) as (citycode:chararray,hotelid:chararray);]]></query>
<query><![CDATA[groupByhotels1 = GROUP userhotelByCity1 BY hotelid;]]></query>
<query><![CDATA[uriCounts1 = FOREACH groupByhotels1 { D = LIMIT userhotelByCity1 1; GENERATE flatten(D.citycode) as citycode,com.mmt.solr.hotels.cassandra.ToBag( TOTUPLE(group,com.mmt.solr.hotels.cassandra.StringAppend('Booked#Last booked ',D.name,' ago.'))); };]]></query>
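The "last viewed" part of the Pig pipeline flattens each row into (name, value) pairs, turns the timestamp column name into an age with the custom `LongToHours` UDF, splits the value on `#` into city code and hotel id, keeps one entry per hotel, and emits a `VIEWED#Last viewed <age> ago.` message per city. Below is a rough single-process Java equivalent, assuming the columns arrive newest first (as `reversed=true` suggests). The `LongToHours` rounding rule is a guess, since the UDF's source is not shown, and the sketch makes the "newest view wins" choice explicit instead of relying on which tuple a nested `LIMIT 1` keeps.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Rough Java equivalent of the "last viewed" Pig steps (names are illustrative).
final class LastViewedMessages {

    // One flattened column: name = view time in epoch millis, value = "cityCode#hotelId".
    static final class Column {
        final long viewedAtMillis;
        final String value;

        Column(long viewedAtMillis, String value) {
            this.viewedAtMillis = viewedAtMillis;
            this.value = value;
        }
    }

    // Hypothetical stand-in for LongToHours: whole hours below one day, whole days above.
    static String ageLabel(long viewedAtMillis, long nowMillis) {
        long hours = Math.max(0L, (nowMillis - viewedAtMillis) / 3_600_000L);
        return hours < 24 ? hours + " hour(s)" : (hours / 24) + " day(s)";
    }

    // Input must be sorted newest first; the first column seen for a hotel wins.
    static Map<String, Map<String, String>> build(List<Column> newestFirst, long nowMillis) {
        Map<String, Map<String, String>> messagesByCity = new LinkedHashMap<>();
        Set<String> seenHotels = new HashSet<>();
        for (Column column : newestFirst) {
            String[] parts = column.value.split("#", 2); // citycode, hotelid
            if (parts.length < 2 || !seenHotels.add(parts[1])) {
                continue; // malformed value, or this hotel already has its latest view
            }
            String message = "VIEWED#Last viewed " + ageLabel(column.viewedAtMillis, nowMillis) + " ago.";
            messagesByCity.computeIfAbsent(parts[0], c -> new LinkedHashMap<>()).put(parts[1], message);
        }
        return messagesByCity;
    }
}
```

The "last booked" branch would follow the same steps with a `Booked#Last booked` prefix.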
  • 57. Criteria to Evaluate NoSQL Solutions: internal partitioning; automated, flexible data distribution; hot-swappable nodes; replication style; automated failover strategy.
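Internal partitioning, automated data distribution and hot-swappable nodes are commonly built on consistent hashing (the CodeProject reference in the list above covers it, and Cassandra's token ring follows the same principle). The Java sketch below is a minimal ring with virtual nodes. The node names, the virtual-node count and the MD5-based ring position are choices made for illustration, not any product's implementation. Its key property is that removing a node only moves the keys that node owned.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Minimal consistent-hash ring with virtual nodes.
final class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // ring position -> node
    private final int virtualNodesPerNode;

    ConsistentHashRing(int virtualNodesPerNode) {
        if (virtualNodesPerNode <= 0) {
            throw new IllegalArgumentException("need at least one virtual node");
        }
        this.virtualNodesPerNode = virtualNodesPerNode;
    }

    // Places several virtual nodes per physical node to spread load evenly.
    void addNode(String node) {
        for (int i = 0; i < virtualNodesPerNode; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodesPerNode; i++) {
            ring.remove(hash(node + "#" + i), node); // only removes entries this node owns
        }
    }

    // Owner = first virtual node clockwise from the key's position, wrapping around.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> owner = ring.ceilingEntry(hash(key));
        return owner != null ? owner.getValue() : ring.firstEntry().getValue();
    }

    // First 8 bytes of an MD5 digest, used as a 64-bit ring position.
    private static long hash(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (digest[i] & 0xFFL);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Swapping a node out or back in therefore touches only its own share of the keys, which is what makes hot-swapping practical without a full reshuffle.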