MongoDB and Apache HBase: products comparison
Technical overview

                                         MongoDB                                   HBase
Programming language                     C++                                       Java
Language bindings & clients              C, C++, Erlang, Haskell, Java,            Java, Jython, Groovy DSL, Scala, REST
                                         JavaScript, .NET (C#, F#, PowerShell,
                                         etc.), Perl, PHP, Python, Ruby, Scala
Protocols                                Mongo Wire Protocol                       Apache Avro, Thrift, REST
First public release                     Feb 2009                                  Jul 2010
Current state (last release)             2.0.2, 14th December 2011                 0.92.0, 23rd January 2012
Technical overview

                      MongoDB                          HBase
Querying              Mongo Query Language             Filter Language
Atomicity             Conditional                      +
Consistency           +                                +
Isolation             -                                +
Durability            +                                -
Secondary indexes     Indexing of embedded elements,   Periodic-Update Secondary Index,
                      compound keys                    Filter Query,
                                                       Dual-Write Secondary Index,
                                                       Summary Tables
Map/Reduce            Supports
Sharding              +                                +
Replication           +                                +
Revision control      -                                +
MongoDB features

Document-oriented
Capped Collections
GridFS
Indexing
Map Reduce
Query language
JSON/BSON
Eventual consistency
HBase features
•   Column-oriented (modeled after Google Bigtable)
•   Bloom filters on per column basis
•   MapReduce
•   Secondary Indexes
•   HDFS based
•   Revision control
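As a rough illustration of the Bloom filter item above: a Bloom filter lets a read skip a file that definitely does not contain a key, at the cost of occasional false positives. The sketch below is a generic, minimal Bloom filter and not HBase's implementation; the class name, the sizes, and the salted SHA-256 hashing are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: answers 'definitely absent' or 'possibly present'."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are illustrative assumptions, not HBase defaults.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a SHA-256 hash of the key.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means the key was never added; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(key))


# Hypothetical usage: one filter per stored file, fed with the keys it holds.
store_file_filter = BloomFilter()
for row_key in ("user1", "user2", "user3"):
    store_file_filter.add(row_key)
```

A reader would call `store_file_filter.might_contain(key)` before opening the file and skip the file whenever the answer is False.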
MongoDB configuration example
MongoDB and Apache HBase: Benchmarking
HBase configuration
MongoDB use cases
GitHub: the social coding site uses MongoDB for an internal reporting application.
РосГос затраты (RosSpending): the first Russian public spending monitoring project.
Disney: a common set of tools and APIs for all games within the Interactive Media Group, using MongoDB as a common object repository to persist state information.
Over 300 companies have production deployments of MongoDB.
HBase use cases
Facebook: real-time messaging, over 152 billion messages monthly.
Adobe: 30 nodes; social services, data and processing for internal use.
Explorys: over a billion anonymized clinical records.
Mozilla Socorro: crash reporting system.
About 40 companies are powered by HBase.
Benchmarking
• Environment: Amazon Elastic Compute Cloud (EC2).
• Testing tool: Yahoo! Cloud Serving Benchmark (YCSB).
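YCSB drives a data store with a stream of operations drawn in fixed proportions. The sketch below imitates that idea for the mixes named on the benchmark slides of this deck (50% reads 50% updates, 95% reads 5% updates, read only). It is not YCSB code: the dictionary, the function name, and the fixed seed are assumptions made for illustration, and the read/insert mix is left out because the slides do not state its proportions.

```python
import random
from collections import Counter

# Operation proportions taken from the slide titles; the structure is hypothetical.
WORKLOADS = {
    "50% reads 50% updates": {"read": 0.50, "update": 0.50},
    "95% reads 5% updates": {"read": 0.95, "update": 0.05},
    "read only": {"read": 1.00},
}


def generate_operations(mix, count, seed=0):
    """Return `count` operation names drawn according to the proportions in `mix`."""
    rng = random.Random(seed)  # fixed seed so a run is repeatable
    names = list(mix)
    weights = [mix[name] for name in names]
    return rng.choices(names, weights=weights, k=count)


# Draw 10,000 operations for the read-heavy mix and tally them.
counts = Counter(generate_operations(WORKLOADS["95% reads 5% updates"], 10_000))
```

In a real run each drawn operation would be issued against the database and its latency recorded; here the tally in `counts` only shows that the drawn mix tracks the requested proportions.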
MongoDB Benchmarking.
[Chart: "2 shards loading" and "4 shards loading" series over 12 hours of loading; axis ticks run from 0 to 44400 and from 0.00 to 12000.00. Totals: 167,600,000 records for 4 shards; 95,000,000 records for 2 shards.]
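The loading totals can be turned into rough average rates. The sketch below assumes that both runs lasted the full 12 hours mentioned on the slide and that the two totals are record counts for that whole window; neither assumption is confirmed by the deck, so the results are indicative only.

```python
# Assumption: both loads ran for the full "12 hours of loading" on the slide.
LOAD_SECONDS = 12 * 60 * 60

# Totals as printed on the slide.
records_loaded = {
    "2 shards": 95_000_000,
    "4 shards": 167_600_000,
}

# Average records per second over the assumed load window.
avg_rate = {name: total / LOAD_SECONDS for name, total in records_loaded.items()}
speedup = avg_rate["4 shards"] / avg_rate["2 shards"]

for name, rate in avg_rate.items():
    print(f"{name}: about {rate:,.0f} records/s on average")
print(f"4 shards vs 2 shards: about {speedup:.2f}x")
```

Under these assumptions the averages come to roughly 2,199 records/s for 2 shards and 3,880 records/s for 4 shards, so doubling the shard count gave about 1.76 times the load rate rather than a full 2 times.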
50% reads 50% updates
[Chart: series 2 shards update, 2 shards read, 4 shards update, 4 shards read; vertical axis 0 to 12000; horizontal axis 500, 1000, 2000, 3000, 3300, 5000.]
95% reads 5% updates
[Chart: series 2 shards update, 2 shards read, 4 shards update, 4 shards read; vertical axis 0 to 16000; horizontal axis 500, 1000, 2000, 3000, 4000, 5000.]
Read only performance
[Chart: series 2 shards, 4 shards; vertical axis 0 to 10000; horizontal axis 500, 1000, 2000, 3000, 4000, 5000.]
Read Insert performance
[Chart: series 2 shards insert, 2 shards read, 4 shards insert, 4 shards read; vertical axis 0 to 250000; horizontal axis 200, 300, 400.]
Questions



skype: google_mic
mailto: mikhail.hul@gmail.com
mailto: mikhail.hul@altoros.com

Editor's Notes

  • #13: DataNodes constantly report to the NameNode. Blocks are stored on the DataNodes.