Methods of benchmarking NoSQL database systems

                              Ilya Bakulin
                webmaster@kibab.com, kibab@FreeBSD.org
                               SMS Traffic


                              LVEE 2011




Ilya Bakulin (SMS Traffic)     NoSQL benchmarking          July 2, 2011   1 / 16
1   Introduction


2   YCSB benchmarking framework


3   YCSB practical usage


4   Results


5   Where to find further information




Why benchmarking NoSQL is necessary




    Performance guides and FAQs are generally unavailable or outdated
    NoSQL systems are under active development
    Nobody wants to end up with a crashed DB in production right before a
    2-week vacation




Why benchmarking NoSQL is complex




    RDBMSs provide access to their data through SQL, while NoSQL
    systems don't
    Each NoSQL system uses a different protocol (Thrift, Memcached-style,
    its own protocol)
    Existing benchmarks require SQL to talk to the database under test




What is YCSB?




    YCSB stands for Yahoo Cloud Serving Benchmark
    Developed by the Yahoo! Research group
    Open-source project hosted on GitHub (178 watchers, 42 forks)




Architecture
    Java application
    Shipped with ready-to-use adapters for several popular open-source
    database systems

[Figure: YCSB client architecture. A workload parameter file (R/W mix,
record size, data set, ...) and command-line parameters (DB to use,
target throughput, number of threads, ...) configure the YCSB client.
Inside the client, a workload executor drives client threads, which send
requests to the cloud DB through a DB client layer and collect stats.
Many systems have Java APIs; others are reached via HTTP/REST, JNI or
some other solution. Extensible: define new workloads; plug in new
clients.]
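The executor / client threads / stats split in the diagram can be illustrated with a toy Python model. This is a sketch under stated assumptions, not YCSB code: the "database" is an in-process dict guarded by a lock, the R/W mix is a single read proportion, and the target throughput is divided evenly across threads, each of which sleeps until its next scheduled operation. All names (`Stats`, `client_thread`, `run_benchmark`) are hypothetical.

```python
import random
import threading
import time
from typing import Dict, List, Tuple


class Stats:
    """Thread-safe collector of per-operation latency samples, in milliseconds."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.latencies_ms: Dict[str, List[float]] = {}

    def record(self, op: str, latency_ms: float) -> None:
        with self._lock:
            self.latencies_ms.setdefault(op, []).append(latency_ms)

    def average_ms(self, op: str) -> float:
        samples = self.latencies_ms.get(op, [])
        return sum(samples) / len(samples) if samples else 0.0


def client_thread(store: Dict[str, str], store_lock: threading.Lock, stats: Stats,
                  keys: List[str], ops: int, read_proportion: float,
                  target_per_thread: float, seed: int) -> None:
    """Issue `ops` reads/updates against `store`, throttled to `target_per_thread` ops/sec."""
    rng = random.Random(seed)
    interval = 1.0 / target_per_thread if target_per_thread > 0 else 0.0
    start = time.perf_counter()
    for i in range(ops):
        key = rng.choice(keys)
        op = "READ" if rng.random() < read_proportion else "UPDATE"
        t0 = time.perf_counter()
        with store_lock:  # stands in for a network round trip to the DB
            if op == "READ":
                _ = store[key]
            else:
                store[key] = f"value-{seed}-{i}"
        stats.record(op, (time.perf_counter() - t0) * 1000.0)
        # Throttle: wait until the scheduled start time of the next operation.
        delay = start + (i + 1) * interval - time.perf_counter()
        if delay > 0:
            time.sleep(delay)


def run_benchmark(record_count: int = 1000, thread_count: int = 4,
                  ops_per_thread: int = 250, read_proportion: float = 0.5,
                  target_throughput: float = 2000.0) -> Tuple[Stats, float]:
    """Load phase, then a throttled multi-threaded run phase; returns (stats, seconds)."""
    keys = [f"user{i}" for i in range(record_count)]
    store = {key: "initial" for key in keys}  # load phase
    store_lock = threading.Lock()
    stats = Stats()
    per_thread_target = target_throughput / thread_count
    threads = [
        threading.Thread(target=client_thread,
                         args=(store, store_lock, stats, keys, ops_per_thread,
                               read_proportion, per_thread_target, seed))
        for seed in range(thread_count)
    ]
    began = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats, time.perf_counter() - began
```

Calling `stats, elapsed = run_benchmark()` and then `stats.average_ms("READ")` gives one (throughput, latency) point; repeating with a higher `target_throughput` traces a latency curve like the ones in the results slides.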
More on DB interface
    Simple operations: INSERT, UPDATE, REPLACE, DELETE, SCAN
    Does not use SQL, but SQL support is available through a contributed
    JDBC driver
    Other configurations are possible, even sharding

[Figure: YCSB architecture diagram (benchmark tool, workload parameter
file, command-line parameters, YCSB client, cloud DB)]
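A narrow, SQL-free operation set is what lets one benchmark drive many different stores: each adapter only has to implement a handful of key-based calls. The Python sketch below models such an interface as an abstract base class with an in-memory, sorted-key backend. It is not YCSB's Java API; the names `KeyValueDB` and `InMemoryDB` are hypothetical, records are assumed to be dicts of field name to value, and a single-key READ is included because the read/update workloads below need it.

```python
from abc import ABC, abstractmethod
from bisect import bisect_left, insort
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]


class KeyValueDB(ABC):
    """Hypothetical minimal DB adapter interface: key-based operations only, no SQL."""

    @abstractmethod
    def read(self, table: str, key: str) -> Optional[Record]: ...

    @abstractmethod
    def insert(self, table: str, key: str, values: Record) -> bool: ...

    @abstractmethod
    def update(self, table: str, key: str, values: Record) -> bool: ...

    @abstractmethod
    def delete(self, table: str, key: str) -> bool: ...

    @abstractmethod
    def scan(self, table: str, start_key: str, count: int) -> List[Tuple[str, Record]]: ...


class InMemoryDB(KeyValueDB):
    """Toy adapter: one dict per table plus a sorted key list for range scans."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Record]] = {}
        self._keys: Dict[str, List[str]] = {}

    def _table(self, table: str) -> Tuple[Dict[str, Record], List[str]]:
        return self._rows.setdefault(table, {}), self._keys.setdefault(table, [])

    def read(self, table, key):
        rows, _ = self._table(table)
        row = rows.get(key)
        return dict(row) if row is not None else None

    def insert(self, table, key, values):
        rows, keys = self._table(table)
        if key in rows:
            return False
        rows[key] = dict(values)
        insort(keys, key)  # keep keys ordered for scan()
        return True

    def update(self, table, key, values):
        rows, _ = self._table(table)
        if key not in rows:
            return False
        rows[key].update(values)  # only the given fields change
        return True

    def delete(self, table, key):
        rows, keys = self._table(table)
        if key not in rows:
            return False
        del rows[key]
        keys.pop(bisect_left(keys, key))
        return True

    def scan(self, table, start_key, count):
        rows, keys = self._table(table)
        start = bisect_left(keys, start_key)  # first key >= start_key
        return [(k, dict(rows[k])) for k in keys[start:start + count]]
```

Plugging in a new store then means writing one more subclass; the workload side never changes.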
More on workload
    Specifies what DB operations are used by the application
    Also defines the request distribution
    It is possible to specify the record size
    It's possible to specify ... meaningful operations

[Figure: request distributions, three panels: Uniform, Zipfian, Latest]
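The three request distributions can be made concrete with a small Python sketch of key choosers. Assumptions: keys are numbered 0..N-1 in insertion order, the Zipfian exponent `theta = 0.99` is a chosen default, and "latest" is modelled as a Zipfian over recency, so the newest record is the most popular. This is a simplified stand-in, not YCSB's generator code; `ZipfianGenerator` and `choose_key` are hypothetical names, and precomputing the full CDF costs O(N) memory.

```python
import bisect
import itertools
import random
from typing import Optional


class ZipfianGenerator:
    """Draws ranks 0..n-1 with P(rank i) proportional to 1 / (i + 1) ** theta."""

    def __init__(self, n: int, theta: float = 0.99,
                 rng: Optional[random.Random] = None) -> None:
        self.n = n
        self.rng = rng if rng is not None else random.Random()
        # Cumulative weights; sampling is a binary search over this table.
        self.cdf = list(itertools.accumulate(1.0 / (i + 1) ** theta for i in range(n)))

    def next_rank(self) -> int:
        u = self.rng.random() * self.cdf[-1]
        # min() guards against u rounding up to the table total.
        return min(bisect.bisect_right(self.cdf, u), self.n - 1)


def choose_key(distribution: str, record_count: int, rng: random.Random,
               zipf: ZipfianGenerator) -> int:
    """Pick a key index in [0, record_count); assumes zipf.n == record_count."""
    if distribution == "uniform":
        return rng.randrange(record_count)           # every record equally likely
    if distribution == "zipfian":
        return zipf.next_rank()                      # a few low-numbered records are hot
    if distribution == "latest":
        return record_count - 1 - zipf.next_rank()   # most recently inserted are hot
    raise ValueError(f"unknown distribution: {distribution}")
```

Under "latest", keys near N-1 dominate, which matches a workload where the newest data is the most requested.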
SMS Traffic: workload construction




SMS service provider with several gateways and big clients (such as banks)
    15% inserts, 65% updates, 15% reads
    Request distribution: the latest SMS messages are the "hottest" ones
    Evaluated Cassandra and sharded MySQL as the DB storage for the
    next-generation SMS sending platform
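Such a workload maps onto a YCSB core-workload parameter file. The Python sketch below builds one using the CoreWorkload property names `readproportion`, `updateproportion`, `insertproportion`, `scanproportion`, `requestdistribution`, `recordcount` and `operationcount` (the class name is the 2011-era `com.yahoo.ycsb.workloads.CoreWorkload`). The mix on the slide sums to 95%, and the slide does not say what the remaining 5% is, so the sketch rescales the three figures to sum to 1; that is an assumption. Record and operation counts are left to the caller, and the helper names are hypothetical.

```python
from typing import Dict


def sms_workload_properties(record_count: int, operation_count: int) -> Dict[str, str]:
    """Hypothetical helper: CoreWorkload-style properties for the SMS Traffic mix."""
    # Mix from the slide, in percent. It sums to 95, so rescale to proportions summing to 1.
    mix = {"readproportion": 15, "updateproportion": 65, "insertproportion": 15}
    total = sum(mix.values())
    props = {
        "workload": "com.yahoo.ycsb.workloads.CoreWorkload",
        "recordcount": str(record_count),
        "operationcount": str(operation_count),
        "scanproportion": "0",
        "requestdistribution": "latest",  # the newest SMS messages are the hottest
    }
    for name, percent in mix.items():
        props[name] = f"{percent / total:.4f}"
    return props


def to_properties_text(props: Dict[str, str]) -> str:
    """Render as a Java-style .properties file, one key=value per line."""
    return "".join(f"{key}={value}\n" for key, value in sorted(props.items()))
```

The resulting text can be saved as a workload file and passed to the YCSB client with `-P`.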




Testing process




    3 DBMS instances on one server (Core 2 Quad Q9400, 4GB RAM,
    SATA-II HDD, FreeBSD 8.2-amd64)
    Cassandra 0.7.4 (1GB Java heap per instance)
    MySQL 5.1 + InnoDB engine (1GB InnoDB buffer pool per instance)
    Client: separate machine, 1Gb/s connection
The configuration should avoid swapping and disk I/O saturation
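The memory side of "avoid swapping" is simple arithmetic: 3 instances at 1GB each on a 4GB server leave roughly 1GB for the OS, page cache and any per-process memory beyond the configured heap or buffer pool. The Python sketch below makes that budget explicit. The `overhead_factor` (real process size relative to the configured heap or buffer pool) and the `os_reserve_gb` default are hypothetical parameters to be measured on the real system, not known values.

```python
def memory_headroom_gb(total_ram_gb: float, instances: int, per_instance_gb: float,
                       overhead_factor: float = 1.0, os_reserve_gb: float = 0.5) -> float:
    """RAM left over (GB) after all DB instances and an OS reserve.

    overhead_factor scales the configured heap / buffer pool to approximate the
    actual process size; its value is an assumption, not a known constant.
    """
    used = instances * per_instance_gb * overhead_factor
    return total_ram_gb - used - os_reserve_gb


def likely_to_swap(total_ram_gb: float, instances: int, per_instance_gb: float,
                   overhead_factor: float = 1.0, os_reserve_gb: float = 0.5) -> bool:
    """True when the estimated budget is negative, i.e. the box is overcommitted."""
    return memory_headroom_gb(total_ram_gb, instances, per_instance_gb,
                              overhead_factor, os_reserve_gb) < 0
```

For the setup above, `memory_headroom_gb(4, 3, 1)` gives 0.5GB of slack, while an assumed 30% per-process overhead (`overhead_factor=1.3`) would already push the budget negative.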




Some results: Workload "A": 50% read / 50% write
    Workload A - Update heavy: 50/50 read/update

[Figure: Workload A - Read latency. Average read latency (ms) vs.
throughput (ops/sec) for Cassandra, HBase, Sherpa and MySQL. A Workload A
- Update latency panel is partly visible.]

Comment: Cassandra is optimized for writes, and achieves higher throughput
and lower ...
Some results: Workload "A": 50% read / 50% write
    Workload A - Update heavy

[Figure: Workload A - Update latency. Update latency (ms) vs. throughput
(ops/sec) for Cassandra, HBase, Sherpa and MySQL.]

Comment: Cassandra is optimized for writes, and achieves higher throughput
and lower ...
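Each point on these curves comes from one benchmark run: the overall throughput it achieved and the average latency of one operation type. The Python sketch below parses the comma-separated `[SECTION], metric, value` summary lines YCSB prints at the end of a run and extracts such a point. It assumes metric names of the form `Throughput(ops/sec)` in the `OVERALL` section and `AverageLatency(...)` in the per-operation sections; the unit in parentheses may differ between YCSB versions, so the sketch matches on the prefix only. Helper names are hypothetical.

```python
from typing import Dict, Tuple


def parse_ycsb_summary(text: str) -> Dict[str, Dict[str, float]]:
    """Collect '[SECTION], metric, value' lines into {section: {metric: value}}."""
    results: Dict[str, Dict[str, float]] = {}
    for line in text.splitlines():
        parts = [part.strip() for part in line.strip().split(",")]
        if len(parts) != 3 or not (parts[0].startswith("[") and parts[0].endswith("]")):
            continue  # status messages, blank lines, etc.
        try:
            value = float(parts[2])
        except ValueError:
            continue
        results.setdefault(parts[0][1:-1], {})[parts[1]] = value
    return results


def latency_point(results: Dict[str, Dict[str, float]], op: str) -> Tuple[float, float]:
    """Return (overall throughput, average latency of `op`) for one run."""
    throughput = results["OVERALL"]["Throughput(ops/sec)"]
    averages = [v for k, v in results[op].items() if k.startswith("AverageLatency")]
    if not averages:
        raise KeyError(f"no AverageLatency metric for {op}")
    return throughput, averages[0]
```

Histogram bucket lines, if present, land under their bucket number as the metric name and can simply be ignored.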
Some results: Workload "B": 95% read / 5% write
    Workload B - Read heavy: 95/5 read/update

[Figure: Workload B - Read latency. Average read latency (ms) vs.
throughput (operations/sec) for Cassandra, HBase, Sherpa and MySQL. A
Workload B - Update latency panel is partly visible.]
Some results: Workload "B": 95% read / 5% write
    Workload B - Read heavy

[Figure: Workload B - Update latency. Average update latency (ms) vs.
throughput (operations/sec) for Cassandra, HBase, Sherpa and MySQL.]
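Latency-versus-throughput curves are typically produced by repeating the run phase with increasing target throughput. The Python sketch below only builds the command lines for such a sweep; it does not run anything. It uses the YCSB client options `-t` (run phase), `-db`, `-P`, `-threads`, `-target` and `-s` with the 2011-era main class `com.yahoo.ycsb.Client`; the classpath `build/ycsb.jar` and the helper name `sweep_commands` are assumptions to adapt to the local build.

```python
from typing import List


def sweep_commands(db_class: str, workload_file: str, targets: List[int],
                   threads: int, classpath: str = "build/ycsb.jar") -> List[List[str]]:
    """Build one YCSB run-phase command per target throughput (ops/sec)."""
    commands = []
    for target in targets:
        commands.append([
            "java", "-cp", classpath, "com.yahoo.ycsb.Client",
            "-t",                     # transaction (run) phase; the load phase uses -load
            "-db", db_class,          # adapter class for the DB under test
            "-P", workload_file,      # workload parameter file
            "-threads", str(threads),
            "-target", str(target),   # requested throughput; the client throttles to it
            "-s",                     # print status while running
        ])
    return commands
```

For example, `sweep_commands("com.yahoo.ycsb.BasicDB", "workloads/workloadb", [1000, 2000, 4000], threads=10)` yields three commands that differ only in `-target`; running each and parsing its summary gives one point per target.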
Links




    Yahoo Cloud Serving Benchmark:
    https://github.com/brianfrankcooper/YCSB
    google://
    webmaster@kibab.com, kibab@FreeBSD.org




  Ilya Bakulin (SMS Traffic)   NoSQL benchmarking   July 2, 2011   16 / 16

More Related Content

PDF
Couchbase Performance Benchmarking
Renat Khasanshyn
 
PDF
Evaluating NoSQL Performance: Time for Benchmarking
Sergey Bushik
 
ODP
Benchmarking MongoDB and CouchBase
Christopher Choi
 
PDF
Cosbench apac
OpenCity Community
 
PDF
cosbench-openstack.pdf
OpenStack Foundation
 
PDF
Database backed coherence cache
aragozin
 
PDF
HDFS Futures: NameNode Federation for Improved Efficiency and Scalability
Hortonworks
 
PDF
Gluster Webinar: Introduction to GlusterFS
GlusterFS
 
Couchbase Performance Benchmarking
Renat Khasanshyn
 
Evaluating NoSQL Performance: Time for Benchmarking
Sergey Bushik
 
Benchmarking MongoDB and CouchBase
Christopher Choi
 
Cosbench apac
OpenCity Community
 
cosbench-openstack.pdf
OpenStack Foundation
 
Database backed coherence cache
aragozin
 
HDFS Futures: NameNode Federation for Improved Efficiency and Scalability
Hortonworks
 
Gluster Webinar: Introduction to GlusterFS
GlusterFS
 

What's hot (20)

PDF
Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ...
Odinot Stanislas
 
PDF
Webinar Sept 22: Gluster Partners with Redapt to Deliver Scale-Out NAS Storage
GlusterFS
 
PDF
Distributed Caching Essential Lessons (Ts 1402)
Yury Kaliaha
 
PPT
Alfresco Large Scale Enterprise Deployments
Alfresco Software
 
PPTX
Hadoop on Virtual Machines
Richard McDougall
 
ODP
Hug Hbase Presentation.
Jack Levin
 
PDF
Intro to GlusterFS Webinar - August 2011
GlusterFS
 
PDF
Scaling Out Tier Based Applications
Yury Kaliaha
 
PDF
Apache Hadoop on Virtual Machines
DataWorks Summit
 
PDF
Cloud Storage Adoption, Practice, and Deployment
GlusterFS
 
PPTX
HBase and HDFS: Understanding FileSystem Usage in HBase
enissoz
 
PDF
Codemotion 2015 Infinispan Tech lab
Ugo Landini
 
PDF
Postgres: The NoSQL Cake You Can Eat
EDB
 
PDF
How to Increase Performance of Your Hadoop Cluster
Altoros
 
PDF
Hbase: an introduction
Jean-Baptiste Poullet
 
PPTX
Hadoop Summit 2012 | HBase Consistency and Performance Improvements
Cloudera, Inc.
 
PPTX
Award winning scale-up and scale-out storage for Xen
GlusterFS
 
PPTX
001 hbase introduction
Scott Miao
 
PDF
004 architecture andadvanceduse
Scott Miao
 
PPTX
In-memory Caching in HDFS: Lower Latency, Same Great Taste
DataWorks Summit
 
Scale-out Storage on Intel® Architecture Based Platforms: Characterizing and ...
Odinot Stanislas
 
Webinar Sept 22: Gluster Partners with Redapt to Deliver Scale-Out NAS Storage
GlusterFS
 
Distributed Caching Essential Lessons (Ts 1402)
Yury Kaliaha
 
Alfresco Large Scale Enterprise Deployments
Alfresco Software
 
Hadoop on Virtual Machines
Richard McDougall
 
Hug Hbase Presentation.
Jack Levin
 
Intro to GlusterFS Webinar - August 2011
GlusterFS
 
Scaling Out Tier Based Applications
Yury Kaliaha
 
Apache Hadoop on Virtual Machines
DataWorks Summit
 
Cloud Storage Adoption, Practice, and Deployment
GlusterFS
 
HBase and HDFS: Understanding FileSystem Usage in HBase
enissoz
 
Codemotion 2015 Infinispan Tech lab
Ugo Landini
 
Postgres: The NoSQL Cake You Can Eat
EDB
 
How to Increase Performance of Your Hadoop Cluster
Altoros
 
Hbase: an introduction
Jean-Baptiste Poullet
 
Hadoop Summit 2012 | HBase Consistency and Performance Improvements
Cloudera, Inc.
 
Award winning scale-up and scale-out storage for Xen
GlusterFS
 
001 hbase introduction
Scott Miao
 
004 architecture andadvanceduse
Scott Miao
 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
DataWorks Summit
 
Ad

Viewers also liked (17)

PPTX
Introduction to Couchbase Server 2.0
Dipti Borkar
 
PDF
Introduction to Database Benchmarking with Benchmark Factory
Michael Micalizzi
 
PDF
Database Hardware Benchmarking
Command Prompt., Inc
 
PPTX
Presentation: mongo db & elasticsearch & membase
Ardak Shalkarbayuli
 
KEY
MongoDB In Production At Sailthru
ibwhite
 
PDF
Cignex mongodb-sharding-mongodbdays
MongoDB APAC
 
PDF
Rpsonmongodb
MongoDB APAC
 
PDF
Pelicamigrator
MongoDB APAC
 
PPTX
What's new in MongoDB 2.6
Matias Cascallares
 
PDF
247 overviewmongodbevening-bangalore
MongoDB APAC
 
PPTX
An afternoon with mongo db new delhi
Rajnish Verma
 
PPTX
MMS - Monitoring, backup and management at a single click
Matias Cascallares
 
PDF
Mongo db eveningschemadesign
MongoDB APAC
 
PDF
Big Data Benchmarking Tutorial
Tilmann Rabl
 
PPTX
Lightning talk: elasticsearch at Cogenta
Yann Cluchey
 
PPTX
An Introduction to MongoDB Ops Manager
MongoDB
 
PPTX
Internet of things for Smart Home
Khwaja Aamer
 
Introduction to Couchbase Server 2.0
Dipti Borkar
 
Introduction to Database Benchmarking with Benchmark Factory
Michael Micalizzi
 
Database Hardware Benchmarking
Command Prompt., Inc
 
Presentation: mongo db & elasticsearch & membase
Ardak Shalkarbayuli
 
MongoDB In Production At Sailthru
ibwhite
 
Cignex mongodb-sharding-mongodbdays
MongoDB APAC
 
Rpsonmongodb
MongoDB APAC
 
Pelicamigrator
MongoDB APAC
 
What's new in MongoDB 2.6
Matias Cascallares
 
247 overviewmongodbevening-bangalore
MongoDB APAC
 
An afternoon with mongo db new delhi
Rajnish Verma
 
MMS - Monitoring, backup and management at a single click
Matias Cascallares
 
Mongo db eveningschemadesign
MongoDB APAC
 
Big Data Benchmarking Tutorial
Tilmann Rabl
 
Lightning talk: elasticsearch at Cogenta
Yann Cluchey
 
An Introduction to MongoDB Ops Manager
MongoDB
 
Internet of things for Smart Home
Khwaja Aamer
 
Ad

Similar to Methods of NoSQL database systems benchmarking (20)

PDF
SQL Server 2008 Fast Track Data Warehouse
Mark Ginnebaugh
 
KEY
NOSQL, CouchDB, and the Cloud
boorad
 
PDF
Yahoo Cloud Serving Benchmark
kevin han
 
KEY
Processing Big Data
cwensel
 
PDF
Introduction to Hadoop
Ovidiu Dimulescu
 
PDF
Top 6 Reasons to Use a Distributed Data Grid
ScaleOut Software
 
KEY
DevNation Atlanta
boorad
 
PDF
2011 04-dsi-javaee-in-the-cloud-andreadis
dandre
 
PPTX
MEW22 22nd Machine Evaluation Workshop Microsoft
Lee Stott
 
PDF
Windows Azure Platform Technical Deep Dive - Chris Auld (Intergen)
Spiffy
 
PDF
Building Distributed Systems With Riak and Riak Core
Andy Gross
 
PDF
MySQL高可用
thinkinlamp
 
PPTX
Clustrix Database Percona Ruby on Rails benchmark
Clustrix
 
PDF
Windows Sql Azure Cloud Computing Platform
Eduardo Castro
 
PPTX
Apache Drill
Ted Dunning
 
PPT
Large-scale projects development (scaling LAMP)
Alexey Rybak
 
PPTX
Microsoft Openness Mongo DB
Heriyadi Janwar
 
PPTX
SQL Explore 2012 - Meir Dudai: DAC
sqlserver.co.il
 
PPTX
Microsoft Cloud BI Update 2012 for SQL Saturday Philly
Mark Kromer
 
PDF
SSD Performance Benchmarking
Shirish Jamthe
 
SQL Server 2008 Fast Track Data Warehouse
Mark Ginnebaugh
 
NOSQL, CouchDB, and the Cloud
boorad
 
Yahoo Cloud Serving Benchmark
kevin han
 
Processing Big Data
cwensel
 
Introduction to Hadoop
Ovidiu Dimulescu
 
Top 6 Reasons to Use a Distributed Data Grid
ScaleOut Software
 
DevNation Atlanta
boorad
 
2011 04-dsi-javaee-in-the-cloud-andreadis
dandre
 
MEW22 22nd Machine Evaluation Workshop Microsoft
Lee Stott
 
Windows Azure Platform Technical Deep Dive - Chris Auld (Intergen)
Spiffy
 
Building Distributed Systems With Riak and Riak Core
Andy Gross
 
MySQL高可用
thinkinlamp
 
Clustrix Database Percona Ruby on Rails benchmark
Clustrix
 
Windows Sql Azure Cloud Computing Platform
Eduardo Castro
 
Apache Drill
Ted Dunning
 
Large-scale projects development (scaling LAMP)
Alexey Rybak
 
Microsoft Openness Mongo DB
Heriyadi Janwar
 
SQL Explore 2012 - Meir Dudai: DAC
sqlserver.co.il
 
Microsoft Cloud BI Update 2012 for SQL Saturday Philly
Mark Kromer
 
SSD Performance Benchmarking
Shirish Jamthe
 

More from Транслируем.бел (20)

PDF
Медицинские трансляции
Транслируем.бел
 
PDF
Руководство по видео, трансляциям и премьерам (Youtube 2020)
Транслируем.бел
 
PDF
Корпоративный новый год онлайн
Транслируем.бел
 
PDF
Unofficial guide to vmix by streamgeeks
Транслируем.бел
 
PDF
Руководство для малого и среднего бизнеса по использованию цифровых решений
Транслируем.бел
 
PDF
Sennheiser ew100 g2
Транслируем.бел
 
PPT
Сравнение поколений Y и Z
Транслируем.бел
 
PPTX
Онлайн-трансляции в соцсетях
Транслируем.бел
 
PDF
Как организовать трансляцию в Facebook
Транслируем.бел
 
PDF
The ultimate guide to facebook live for your event
Транслируем.бел
 
PDF
Guide to facebook live
Транслируем.бел
 
PPTX
Что сделать, чтобы сто раз все не переделывать
Транслируем.бел
 
PDF
Когда сказать нет. Арсений Кравченко
Транслируем.бел
 
PDF
Marketing Essentials for Startup Teams
Транслируем.бел
 
PDF
SMM учебник. Как продвигать банк в социальных сетях. Наглядное пособие
Транслируем.бел
 
PPTX
методы монетизации интернет проектов
Транслируем.бел
 
PDF
Belarus internet users discovery
Транслируем.бел
 
Медицинские трансляции
Транслируем.бел
 
Руководство по видео, трансляциям и премьерам (Youtube 2020)
Транслируем.бел
 
Корпоративный новый год онлайн
Транслируем.бел
 
Unofficial guide to vmix by streamgeeks
Транслируем.бел
 
Руководство для малого и среднего бизнеса по использованию цифровых решений
Транслируем.бел
 
Sennheiser ew100 g2
Транслируем.бел
 
Сравнение поколений Y и Z
Транслируем.бел
 
Онлайн-трансляции в соцсетях
Транслируем.бел
 
Как организовать трансляцию в Facebook
Транслируем.бел
 
The ultimate guide to facebook live for your event
Транслируем.бел
 
Guide to facebook live
Транслируем.бел
 
Что сделать, чтобы сто раз все не переделывать
Транслируем.бел
 
Когда сказать нет. Арсений Кравченко
Транслируем.бел
 
Marketing Essentials for Startup Teams
Транслируем.бел
 
SMM учебник. Как продвигать банк в социальных сетях. Наглядное пособие
Транслируем.бел
 
методы монетизации интернет проектов
Транслируем.бел
 
Belarus internet users discovery
Транслируем.бел
 

Recently uploaded (20)

PDF
Presentation about Hardware and Software in Computer
snehamodhawadiya
 
PDF
The Future of Artificial Intelligence (AI)
Mukul
 
PDF
Using Anchore and DefectDojo to Stand Up Your DevSecOps Function
Anchore
 
PDF
AI-Cloud-Business-Management-Platforms-The-Key-to-Efficiency-Growth.pdf
Artjoker Software Development Company
 
PDF
Accelerating Oracle Database 23ai Troubleshooting with Oracle AHF Fleet Insig...
Sandesh Rao
 
PDF
How ETL Control Logic Keeps Your Pipelines Safe and Reliable.pdf
Stryv Solutions Pvt. Ltd.
 
PPTX
The Future of AI & Machine Learning.pptx
pritsen4700
 
PDF
Data_Analytics_vs_Data_Science_vs_BI_by_CA_Suvidha_Chaplot.pdf
CA Suvidha Chaplot
 
PPTX
Simple and concise overview about Quantum computing..pptx
mughal641
 
PDF
Software Development Methodologies in 2025
KodekX
 
PDF
Doc9.....................................
SofiaCollazos
 
PPTX
OA presentation.pptx OA presentation.pptx
pateldhruv002338
 
PDF
AI Unleashed - Shaping the Future -Starting Today - AIOUG Yatra 2025 - For Co...
Sandesh Rao
 
PPTX
New ThousandEyes Product Innovations: Cisco Live June 2025
ThousandEyes
 
PDF
SparkLabs Primer on Artificial Intelligence 2025
SparkLabs Group
 
PDF
Orbitly Pitch Deck|A Mission-Driven Platform for Side Project Collaboration (...
zz41354899
 
PDF
A Strategic Analysis of the MVNO Wave in Emerging Markets.pdf
IPLOOK Networks
 
PDF
NewMind AI Weekly Chronicles - July'25 - Week IV
NewMind AI
 
PDF
Oracle AI Vector Search- Getting Started and what's new in 2025- AIOUG Yatra ...
Sandesh Rao
 
PPTX
AI in Daily Life: How Artificial Intelligence Helps Us Every Day
vanshrpatil7
 
Presentation about Hardware and Software in Computer
snehamodhawadiya
 
The Future of Artificial Intelligence (AI)
Mukul
 
Using Anchore and DefectDojo to Stand Up Your DevSecOps Function
Anchore
 
AI-Cloud-Business-Management-Platforms-The-Key-to-Efficiency-Growth.pdf
Artjoker Software Development Company
 
Accelerating Oracle Database 23ai Troubleshooting with Oracle AHF Fleet Insig...
Sandesh Rao
 
How ETL Control Logic Keeps Your Pipelines Safe and Reliable.pdf
Stryv Solutions Pvt. Ltd.
 
The Future of AI & Machine Learning.pptx
pritsen4700
 
Data_Analytics_vs_Data_Science_vs_BI_by_CA_Suvidha_Chaplot.pdf
CA Suvidha Chaplot
 
Simple and concise overview about Quantum computing..pptx
mughal641
 
Software Development Methodologies in 2025
KodekX
 
Doc9.....................................
SofiaCollazos
 
OA presentation.pptx OA presentation.pptx
pateldhruv002338
 
AI Unleashed - Shaping the Future -Starting Today - AIOUG Yatra 2025 - For Co...
Sandesh Rao
 
New ThousandEyes Product Innovations: Cisco Live June 2025
ThousandEyes
 
SparkLabs Primer on Artificial Intelligence 2025
SparkLabs Group
 
Orbitly Pitch Deck|A Mission-Driven Platform for Side Project Collaboration (...
zz41354899
 
A Strategic Analysis of the MVNO Wave in Emerging Markets.pdf
IPLOOK Networks
 
NewMind AI Weekly Chronicles - July'25 - Week IV
NewMind AI
 
Oracle AI Vector Search- Getting Started and what's new in 2025- AIOUG Yatra ...
Sandesh Rao
 
AI in Daily Life: How Artificial Intelligence Helps Us Every Day
vanshrpatil7
 

Methods of NoSQL database systems benchmarking

  • 1. Methods of benchmarking NoSQL database systems Ilya Bakulin [email protected], [email protected] SMS Traffic LVEE 2011 Ilya Bakulin (SMS Traffic) NoSQL benchmarking July 2, 2011 1 / 16
  • 2. 1 Introduction 2 YCSB benchmarking framework 3 YCSB practical usage 4 Results 5 Where to find further information Ilya Bakulin (SMS Traffic) NoSQL benchmarking July 2, 2011 2 / 16
  • 3. Why benchmarking NoSQL is nessesary No guides / FAQs about performance are generally available, or are outdated NoSQL systems are actively developed Nobody wants to end up with crashed DB in production right before 2-week vacation Ilya Bakulin (SMS Traffic) NoSQL benchmarking July 2, 2011 3 / 16
  • 4. Why benchmarking NoSQL is complex RDBMS use SQL to provide access to data stored in them, while NOSQL systems don’t Each NoSQL uses different protocol (Thrift, Memcached-style, own protocols) Existing benchmarks require SQL to work with database under inspection. Ilya Bakulin (SMS Traffic) NoSQL benchmarking July 2, 2011 4 / 16
  • 5. What is YCSB? YCSB stands for Yahoo Cloud Serving Benchmark Developed by Yahoo! Research group Open Source project, hosted on GitHub (178 watchers, 42 forks) Ilya Bakulin (SMS Traffic) NoSQL benchmarking July 2, 2011 5 / 16
  • 6. Architecture: The benchmark tool is a Java application: many systems have Java APIs, and other systems can be reached via HTTP/REST, JNI or some other solution. It ships with ready-to-use adapters for several popular open-source database systems. Command-line parameters: DB to use, target throughput, number of threads, … Workload parameter file: R/W mix, record size, data set, … The YCSB client consists of a workload executor, client threads, a DB client and a stats module, and talks to the cloud DB. Extensible: define new workloads. Extensible: plug in new clients.
  • 7. More on DB interface: Simple operations: INSERT, UPDATE, REPLACE, DELETE, SCAN. Does not use SQL, but SQL support is available through a contributed JDBC driver. Other system configurations are possible, even sharding.
  • 10. More on workload: Specifies which DB operations are used by the application. Also defines the request distribution (Uniform, Zipfian or Latest). It is possible to specify the record size. It is possible to specify the number of records and operations. [Figure: popularity vs. insertion order of records under the Uniform, Zipfian and Latest request distributions.]
  • 12. SMS Traffic: workload construction. SMS service provider with several gateways and big clients (such as banks). Operation mix: 15% inserts, 65% updates, 15% reads. Request distribution: the latest SMS messages are the "hottest" ones. Evaluated Cassandra and sharded MySQL as DB storage for the next generation of the SMS sending platform.
  • 13. Testing process: 3 instances of the DBMS on one server (Core 2 Quad Q9400, 4 GB RAM, SATA-II HDD, FreeBSD 8.2-amd64). Cassandra 0.7.4 (1 GB Java heap per instance). MySQL 5.1 with the InnoDB engine (1 GB InnoDB buffer pool per instance). Client: a separate machine with a 1 Gb/s connection. Swapping and disk I/O saturation should be avoided.
  • 14. Some results: Workload "A": 50% read / 50% write (update heavy). [Figure: Workload A – Read latency: average read latency (ms) vs. throughput (ops/sec) for Cassandra, HBase, Sherpa and MySQL.] Comment: Cassandra is optimized for writes, and achieves higher throughput and lower …
  • 15. Some results: Workload "A": 50% read / 50% write (update heavy). [Figure: Workload A – Update latency: update latency (ms) vs. throughput (ops/sec) for Cassandra, HBase, Sherpa and MySQL.]
  • 16. Some results: Workload "B": 95% read / 5% write (read heavy). [Figure: Workload B – Read latency: average read latency (ms) vs. throughput (operations/sec) for Cassandra, HBase, Sherpa and MySQL.]
  • 17. Some results: Workload "B": 95% read / 5% write (read heavy). [Figure: Workload B – Update latency: average update latency (ms) vs. throughput (operations/sec) for Cassandra, HBase, Sherpa and MySQL.]
  • 18. Links: Yahoo Cloud Serving Benchmark: https://github.com/brianfrankcooper/YCSB google:// webmaster@kibab.com, kibab@FreeBSD.org
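The "Architecture" slide lists target throughput and number of threads as command-line parameters of the YCSB client. One way such throttling can work is sketched below in Java, assuming each thread gets an equal share of the target rate and spaces its operations evenly. `ThrottledExecutor` and its methods are hypothetical names, not YCSB's classes, and the `Runnable` stands in for a real DB call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Hypothetical sketch: client threads that share a target throughput evenly.
final class ThrottledExecutor {

    /** Gap each thread leaves between operations so all threads together hit the target. */
    static long intervalNanos(double targetOpsPerSec, int threads) {
        if (targetOpsPerSec <= 0) {
            return 0; // no target: run as fast as possible
        }
        return (long) (threads * 1_000_000_000L / targetOpsPerSec);
    }

    /** Runs opsPerThread operations on each thread and returns how many were executed. */
    static long run(int threads, int opsPerThread, double targetOpsPerSec, Runnable operation) {
        long interval = intervalNanos(targetOpsPerSec, threads);
        AtomicLong done = new AtomicLong();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                long next = System.nanoTime();
                for (int i = 0; i < opsPerThread; i++) {
                    operation.run(); // stand-in for one DB call
                    done.incrementAndGet();
                    next += interval;
                    long remaining;
                    // parkNanos may wake early, so keep waiting until the slot arrives
                    while ((remaining = next - System.nanoTime()) > 0) {
                        LockSupport.parkNanos(remaining);
                    }
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag and stop waiting
                break;
            }
        }
        return done.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long ops = run(4, 50, 1000.0, () -> { });
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d ops in %.2f s (about %.0f ops/sec)%n", ops, secs, ops / secs);
    }
}
```

With 4 threads and a target of 1000 ops/sec, each thread waits about 4 ms between operations; a target of 0 means "as fast as possible".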
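The "More on DB interface" slide describes a small set of non-SQL operations that each database adapter provides. A minimal Java sketch of such an interface follows, assuming records are maps of field names to string values. `SimpleStore` and `InMemoryStore` are hypothetical; a `read` operation is added because the workloads on the results slides issue reads, and the distinction between `update` (merge fields) and `replace` (overwrite the record) is an assumption about the slide's terms.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical key/value interface: the slide's operations plus read. */
interface SimpleStore {
    Map<String, String> read(String key);
    boolean insert(String key, Map<String, String> fields);
    boolean update(String key, Map<String, String> fields);  // assumption: merge given fields
    boolean replace(String key, Map<String, String> fields); // assumption: overwrite record
    boolean delete(String key);
    List<Map<String, String>> scan(String startKey, int count);
}

/** In-memory stand-in; a real adapter would translate each call into the DB's protocol. */
final class InMemoryStore implements SimpleStore {
    // TreeMap keeps keys sorted, which scan() relies on
    private final TreeMap<String, Map<String, String>> data = new TreeMap<>();

    public synchronized Map<String, String> read(String key) {
        Map<String, String> rec = data.get(key);
        return rec == null ? null : new HashMap<>(rec); // copy so callers can't mutate the store
    }

    public synchronized boolean insert(String key, Map<String, String> fields) {
        if (data.containsKey(key)) return false; // insert fails on an existing key
        data.put(key, new HashMap<>(fields));
        return true;
    }

    public synchronized boolean update(String key, Map<String, String> fields) {
        Map<String, String> rec = data.get(key);
        if (rec == null) return false;
        rec.putAll(fields); // untouched fields keep their old values
        return true;
    }

    public synchronized boolean replace(String key, Map<String, String> fields) {
        if (!data.containsKey(key)) return false;
        data.put(key, new HashMap<>(fields)); // old fields are dropped
        return true;
    }

    public synchronized boolean delete(String key) {
        return data.remove(key) != null;
    }

    public synchronized List<Map<String, String>> scan(String startKey, int count) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> rec : data.tailMap(startKey, true).values()) {
            if (out.size() == count) break; // stop after `count` records in key order
            out.add(new HashMap<>(rec));
        }
        return out;
    }
}
```

A real adapter would implement the same methods by translating them into Thrift calls, Memcached-style commands or JDBC statements.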
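The "More on workload" slide names three request distributions: Uniform, Zipfian and Latest. The Java sketch below shows how a client might pick record keys under each, assuming keys are numbered 0..n-1 in insertion order and a Zipfian exponent of 0.99 (an assumed value). The Zipfian draw here uses a precomputed cumulative table and binary search, which is simple but needs O(n) memory; it is not YCSB's own generator, and `KeyChooser` is a hypothetical name. For Latest, the Zipfian rank is counted back from the newest key, so recently inserted records are the hottest, which matches the SMS workload on the "SMS Traffic: workload construction" slide.

```java
import java.util.Random;

/** Hypothetical sketch of three request distributions over keys 0..n-1 (insertion order). */
final class KeyChooser {
    private final int n;
    private final double[] zipfCdf; // cumulative probability for popularity ranks 0..n-1
    private final Random rnd;

    KeyChooser(int n, double theta, long seed) {
        this.n = n;
        this.rnd = new Random(seed);
        this.zipfCdf = new double[n];
        double sum = 0;
        for (int rank = 0; rank < n; rank++) {
            sum += 1.0 / Math.pow(rank + 1, theta); // weight of rank r is 1/(r+1)^theta
            zipfCdf[rank] = sum;
        }
        for (int rank = 0; rank < n; rank++) {
            zipfCdf[rank] /= sum; // normalise so the last entry is 1.0
        }
    }

    /** Uniform: every key is equally likely. */
    int uniform() {
        return rnd.nextInt(n);
    }

    /**
     * Zipfian: rank 0 is the most popular. Here key == rank, so low keys are hot;
     * a real generator may additionally scramble ranks across the key space.
     */
    int zipfian() {
        double u = rnd.nextDouble();
        int lo = 0, hi = n - 1;
        while (lo < hi) { // binary search for the first rank with cdf >= u
            int mid = (lo + hi) >>> 1;
            if (zipfCdf[mid] < u) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** Latest: the most recently inserted key (n - 1) is the most popular. */
    int latest() {
        return n - 1 - zipfian();
    }
}
```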
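The "SMS Traffic: workload construction" slide gives the operation mix as 15% inserts, 65% updates and 15% reads. These figures add up to 95%, and the slides do not say what the remaining 5% is. In YCSB's core workload such a mix is normally expressed through properties like `readproportion`, `updateproportion` and `insertproportion`, with `requestdistribution=latest` for the "latest messages are hottest" pattern. The hypothetical `OperationMix` class below shows the underlying idea in Java: it normalises whatever weights it is given and picks each operation by sampling a cumulative distribution.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch: picks the next operation type according to weighted proportions. */
final class OperationMix {
    enum Op { INSERT, UPDATE, READ }

    private final Op[] ops = Op.values();
    private final double[] cumulative = new double[ops.length];
    private final Random rnd;

    OperationMix(Map<Op, Double> weights, long seed) {
        double total = 0;
        for (int i = 0; i < ops.length; i++) {
            total += weights.getOrDefault(ops[i], 0.0);
            cumulative[i] = total;
        }
        for (int i = 0; i < ops.length; i++) {
            cumulative[i] /= total; // weights need not sum to 1 (or 100); normalise here
        }
        this.rnd = new Random(seed);
    }

    Op next() {
        double u = rnd.nextDouble();
        for (int i = 0; i < ops.length; i++) {
            if (u < cumulative[i]) return ops[i];
        }
        return ops[ops.length - 1]; // guard against rounding at the top end
    }

    public static void main(String[] args) {
        // Proportions as given on the slide; normalised to 15/95, 65/95 and 15/95.
        Map<Op, Double> w = new EnumMap<>(Op.class);
        w.put(Op.INSERT, 15.0);
        w.put(Op.UPDATE, 65.0);
        w.put(Op.READ, 15.0);
        OperationMix mix = new OperationMix(w, 1L);
        Map<Op, Integer> counts = new EnumMap<>(Op.class);
        for (int i = 0; i < 10_000; i++) counts.merge(mix.next(), 1, Integer::sum);
        System.out.println(counts);
    }
}
```

Because of the normalisation, the sketch silently scales a 95% total up to 100%; the real mix should be confirmed before relying on it.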
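The "Testing process" slide runs three DBMS instances on one server, and a sharded MySQL setup needs some rule for deciding which instance holds a given record. The slides do not say how that was done; a common simple approach, hash-modulo routing on the client side, is sketched below in Java. `ShardRouter` and the instance names used with it are hypothetical.

```java
import java.util.List;

/** Hypothetical sketch: routes each record key to one of several DB instances. */
final class ShardRouter<T> {
    private final List<T> shards;

    ShardRouter(List<T> shards) {
        if (shards.isEmpty()) throw new IllegalArgumentException("need at least one shard");
        this.shards = List.copyOf(shards); // immutable copy of the instance list
    }

    /** Shard index for a key; floorMod keeps the result non-negative for negative hashes. */
    int shardIndex(String key) {
        return Math.floorMod(key.hashCode(), shards.size());
    }

    /** The instance responsible for a key; the same key always maps to the same instance. */
    T shardFor(String key) {
        return shards.get(shardIndex(key));
    }

    public static void main(String[] args) {
        ShardRouter<String> router = new ShardRouter<>(List.of("mysql-1", "mysql-2", "mysql-3"));
        System.out.println(router.shardFor("sms:12345"));
    }
}
```

Modulo routing keeps the client logic trivial, but changing the number of shards remaps most keys; consistent hashing is the usual alternative when instances are added or removed.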
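Each curve on the "Some results" slides plots average latency against throughput, so every point typically summarises one benchmark run at a given load. A hypothetical `LatencyStats` collector in Java shows how such a point (plus a nearest-rank percentile) can be computed from per-operation latencies recorded in microseconds. It keeps every sample in memory, which is fine for a sketch but not for very long runs.

```java
import java.util.Arrays;

/** Hypothetical sketch: collects per-operation latencies and summarises one run. */
final class LatencyStats {
    private long[] samples = new long[1024];
    private int count;

    /** Records one operation's latency in microseconds (thread-safe). */
    synchronized void record(long latencyMicros) {
        if (count == samples.length) samples = Arrays.copyOf(samples, count * 2); // grow
        samples[count++] = latencyMicros;
    }

    /** Average latency in milliseconds. */
    synchronized double averageMillis() {
        if (count == 0) return 0.0;
        long sum = 0;
        for (int i = 0; i < count; i++) sum += samples[i];
        return sum / (double) count / 1000.0;
    }

    /** Nearest-rank percentile in milliseconds, e.g. p = 95 for the 95th percentile. */
    synchronized double percentileMillis(double p) {
        if (count == 0) return 0.0;
        long[] sorted = Arrays.copyOf(samples, count);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * count / 100.0); // 1-based nearest rank
        return sorted[Math.max(rank, 1) - 1] / 1000.0;
    }

    /** Achieved throughput: recorded operations divided by the run's wall-clock time. */
    synchronized double throughputOpsPerSec(double runSeconds) {
        return count / runSeconds;
    }
}
```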