MongoDB Performance Tuning
       MongoSV 2012
       Kenny Gorman
    Founder, ObjectRocket
       @objectrocket @kennygorman
MongoDB performance tuning
Obsession
● Performance planning
● Order matters:
  1. Schema design
  2. Statement tuning
  3. Instance tuning

●   Single server performance
●   It's not a single thing you do; it's an obsession
●   Rinse and repeat
●   Understand your database workload
Statement Tuning
●   Profiler
     ○ Captures the statements run against the database into a collection
     ○ Use regular queries to mine and prioritize tuning opportunities
     ○ Sometimes this output alone shows what to tune; sometimes you need to
         explain() the statement.

●   Explain
     ○ Take statement from profiler, explain it
     ○ Gives detailed execution data on the query or statement
     ○ Interpret output, make changes
     ○ Rinse/Repeat
The MongoDB Profiler
●   Data is saved in capped collections: one per database, on each shard
     ○ db.system.profile
●   Turn it on, gather data, later analyze for tuning opportunities
     ○ db.setProfilingLevel(1,20)   // level 1: record ops slower than 20 ms
     ○ db.getProfilingStatus()
     ○ 1 document per statement
     ○ show profile
     ○ db.system.profile.find()
     ○ Leave it on; don't be scared.
●   Use the new Aggregation Framework (2.2+)
     ○ Summarizes large volumes of profiler data
     ○ Examples: https://gist.github.com/995a3aa5b35e92e5ab57
Example
// simple profiler queries
// slower than 20 ms, slowest first
> db.system.profile.find({"millis":{$gt:20}}).sort({"millis":-1})

// in order they happened, last 20
> db.system.profile.find().sort({$natural:-1}).limit(20)

// only queries (a filter, not a sort)
> db.system.profile.find({"op":"query"})




● problem: lots of data!
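To make the three queries above concrete, here is a minimal sketch in plain JavaScript (runs in Node) that applies the same selections to an in-memory array of profile-like documents. The helper names are hypothetical, and only the ts, op and millis fields are assumed. Note that restricting results to one op type is a filter, not a sort.

```javascript
// Hypothetical helpers mirroring the profiler queries above, applied to an
// in-memory array of profile-like docs: { ts: Date, op: String, millis: Number }.

// Ops slower than thresholdMs, slowest first
// (like find({"millis":{$gt:20}}).sort({"millis":-1})).
function slowerThan(profile, thresholdMs) {
  return profile
    .filter(function (doc) { return doc.millis > thresholdMs; })
    .sort(function (a, b) { return b.millis - a.millis; });
}

// Most recent n ops, newest first. In the shell, $natural:-1 works because a
// capped collection keeps insertion order; here we sort on ts instead.
function lastN(profile, n) {
  return profile
    .slice()
    .sort(function (a, b) { return b.ts - a.ts; })
    .slice(0, n);
}

// Only one op type (like find({"op":"query"})).
function onlyOp(profile, op) {
  return profile.filter(function (doc) { return doc.op === op; });
}
```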
Example
// use aggregation to differentiate ops
> db.system.profile.aggregate({ $group : { _id :"$op",
      count:{$sum:1},
      "max response time":{$max:"$millis"},
      "avg response time":{$avg:"$millis"}
}});
{
      "result" : [
             { "_id" : "command", "count" : 1, "max response time" : 0, "avg response time" : 0 },
             { "_id" : "query", "count" : 12, "max response time" : 571, "avg response time" : 5 },
             { "_id" : "update", "count" : 842, "max response time" : 111, "avg response time" : 40 },
             { "_id" : "insert", "count" : 1633, "max response time" : 2, "avg response time" : 1 }
      ],
      "ok" : 1
}




●    compare the count of each op type against its response time
●    compare average vs. max response time
●    prioritize op type
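The $group stage above computes three values per op type. A minimal sketch of the same computation in plain JavaScript, assuming profile-like documents with an op (or ns) field and a millis field; groupResponseTimes is a hypothetical name, and it returns groups in first-seen order rather than any server-defined order.

```javascript
// Hypothetical plain-JavaScript equivalent of the $group stage above:
// group profile-like docs by `field` and compute count, max and avg of millis.
function groupResponseTimes(profile, field) {
  var groups = {};   // key -> running totals
  var order = [];    // keys in first-seen order
  profile.forEach(function (doc) {
    var key = doc[field];
    if (!(key in groups)) {
      groups[key] = { count: 0, max: -Infinity, total: 0 };
      order.push(key);
    }
    var g = groups[key];
    g.count += 1;                          // like count:{$sum:1}
    g.max = Math.max(g.max, doc.millis);   // like {$max:"$millis"}
    g.total += doc.millis;                 // summed for the average
  });
  return order.map(function (key) {
    var g = groups[key];
    return {
      _id: key,
      "count": g.count,
      "max response time": g.max,
      "avg response time": g.total / g.count   // like {$avg:"$millis"}
    };
  });
}
```

Called with "ns" instead of "op", it gives the per-collection view on the next slide. In the sample output, query (max 571 ms, avg 5 ms) points to a few outliers, while update (max 111 ms, avg 40 ms over 842 calls) is slow and frequent, which is the kind of contrast the bullets ask for.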
Example
// use aggregation to differentiate collections
>db.system.profile.aggregate(
   {$group : { _id :"$ns", count:{$sum:1}, "max response time":{$max:"$millis"},
                  "avg response time":{$avg:"$millis"} }},
    {$sort: { "max response time":-1}}
 );
{
    "result" : [
       { "_id" : "game.players","count" : 787, "max response time" : 111, "avg response time" : 0},
       {"_id" : "game.games","count" : 1681,"max response time" : 71, "avg response time" : 60},
       {"_id" : "game.events","count" : 841,"max response time" : 1,"avg response time" : 0},
       ....
],
        "ok" : 1
}




●   keep this data over time!
●   compare the count of each item against its response time
●   compare average vs. max response time
●   more examples: https://gist.github.com/995a3aa5b35e92e5ab57
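One way to act on "keep this data over time" is to save each summary (for example, with a timestamp into another collection) and diff successive snapshots. A sketch, assuming summaries shaped like the per-collection output above; the function name regressions and the growth factor are hypothetical choices, not anything MongoDB provides.

```javascript
// Hypothetical helper: compare two saved per-namespace summaries (arrays of
// { _id, "avg response time" } as produced by the $group above) and report
// namespaces whose average response time grew by more than `factor`.
function regressions(before, after, factor) {
  var oldAvg = {};
  before.forEach(function (row) { oldAvg[row._id] = row["avg response time"]; });
  return after
    .filter(function (row) {
      var prev = oldAvg[row._id];
      // Skip namespaces with no earlier baseline, or a zero baseline
      // where a growth ratio is meaningless.
      return prev !== undefined && prev > 0 &&
             row["avg response time"] > prev * factor;
    })
    .map(function (row) {
      return { ns: row._id, before: oldAvg[row._id], after: row["avg response time"] };
    });
}
```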
Profiler Attributes
●   fastMod
     ○ Good! Fastest possible update. In-place atomic operator ($inc,$set)
●   nreturned vs nscanned
     ○ If nscanned is much larger than nreturned, you may have an opportunity to tune.
     ○ Add an index
●   keyUpdates
     ○ Secondary indexes. Minimize them
     ○ Rule of thumb: roughly 10% lower write performance per secondary index
●   moved
     ○ Document grew beyond its padding and had to be relocated
     ○ The only fix is to pad documents yourself, manually
     ○ Has to update indexes too!
     ○ db.collection.stats() shows paddingFactor
     ○ https://jira.mongodb.org/browse/SERVER-1810 <-- vote for it!
     ○ ^---- 2.3.1+: usePowerOf2Sizes
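A sketch of turning these attributes into a checklist, assuming profile documents shaped like the examples that follow (nscanned, nreturned, nupdated, moved, keyUpdates). tuningHints and its maxScanRatio threshold are hypothetical; the threshold is an arbitrary choice, not a MongoDB setting.

```javascript
// Hypothetical checker: list tuning hints for one profiler document using the
// attributes above. maxScanRatio is an arbitrary threshold, not a MongoDB setting.
function tuningHints(doc, maxScanRatio) {
  var hints = [];
  if (doc.nscanned !== undefined) {
    // Updates report nupdated rather than nreturned.
    var returned = doc.op === "update" ? (doc.nupdated || 0) : (doc.nreturned || 0);
    // Treat "returned nothing" as 1 so the ratio stays finite.
    if (doc.nscanned / Math.max(returned, 1) > maxScanRatio) {
      hints.push("nscanned " + doc.nscanned + " vs " + returned +
                 " returned: consider an index");
    }
  }
  if (doc.moved) {
    hints.push("document moved: it outgrew its padding");
  }
  if (doc.keyUpdates > 0) {
    hints.push(doc.keyUpdates + " index key update(s): review secondary indexes");
  }
  return hints;
}
```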
Example
{
    "ts" : ISODate("2012-09-14T16:34:00.010Z"),       // date it occurred
    "op" : "query",                                     // the operation type
    "ns" : "game.players",                              // the db and collection
    "query" : { "total_games" : 1000 },                 // query document
    "ntoreturn" : 0,                                    // # docs requested (0 = no limit)
    "ntoskip" : 0,                                      // # docs skipped
    "nscanned" : 959967,                                // # entries scanned (likely a full collection scan)
    "keyUpdates" : 0,
    "numYield" : 1,
    "lockStats" : { ... },
    "nreturned" : 0,                                    // # docs actually returned
    "responseLength" : 20,                              // size of the response in bytes
    "millis" : 859,                                     // how long it took, in ms
    "client" : "127.0.0.1",                             // client that issued it
    "user" : ""                                         // authenticated user who issued it
}
Example
{   "ts" : ISODate("2012-09-12T18:13:25.508Z"),
      "op" : "update",                                           // this is an update
      "ns" : "game.players",
      "query" : {"_id" : { "$in" : [ 37013, 13355 ] } },         // the query for the update
      "updateobj" : { "$inc" : { "games_started" : 1 }},         // the update being performed
      "nscanned" : 1,
      "moved" : true,                                            // document had to be moved
      "nmoved" : 1,
      "nupdated" : 1,
      "keyUpdates" : 0,                                  // at least no index keys to update
      "numYield" : 0,
      "lockStats" : { "timeLockedMicros" : { "r" : NumberLong(0),"w" : NumberLong(206)},
             "timeAcquiringMicros" : {"r" : NumberLong(0),"w" : NumberLong(163)}},
      "millis" : 0,
      "client" : "127.0.0.1",
      "user" : ""
}
Example
{
    "ts" : ISODate("2012-09-12T18:13:26.562Z"),
    "op" : "update",
    "ns" : "game.players",
    "query" : {"_id" : { "$in" : [ 27258, 4904 ] } },
    "updateobj" : { "$inc" : { "games_started" : 1}},
    "nscanned" : 40002,                                     // opportunity
    "moved" : true,                                         // opportunity
    "nmoved" : 1,
    "nupdated" : 1,
    "keyUpdates" : 2,                                 // opportunity
    "numYield" : 0,
    ....
Statement Tuning
●   Explain every query as you build your app, before you commit it!
●   Take profiler data, use explain() to tune queries.
     ○ Use prioritized list you built from profiler
     ○ Copy/paste into explain()
●   explain() actually runs the query, then reports the plan it used to fulfill the statement
     ○ Use limit(x) if the result set is really huge
●   Attributes of interest:
     ○ nscanned vs nscannedObjects
     ○ nYields
     ○ covered indexes; what is this?
     ○ data locality ( + covered indexes FTW )
●   Sharding adds extra data to explain() output
     ○ shards attribute
          ■ How many shards did you visit?
          ■ Look at each shard; they can differ! Some get hot.
          ■ Pick good shard keys or you will pay
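A sketch of reading the attributes of interest out of an explain() result, assuming the legacy (2.x-era) format used in this deck's examples (cursor, n, nscanned, indexOnly, scanAndOrder, nYields); later MongoDB releases report plans in a different queryPlanner/executionStats layout. summarizeExplain is a hypothetical name, and the sharded "shards" section is not handled.

```javascript
// Hypothetical reader for legacy (2.x-era) explain() output, using only the
// fields shown in this deck's single-server examples.
function summarizeExplain(plan) {
  return {
    // "BtreeCursor <index>" = index used; "BasicCursor" = collection scan
    usesIndex: plan.cursor.indexOf("BtreeCursor") === 0,
    covered: plan.indexOnly === true,            // answered from the index alone
    inMemorySort: plan.scanAndOrder === true,    // sort could not use an index
    // entries examined per doc returned (1 is ideal); n of 0 counts as 1
    scanRatio: plan.nscanned / Math.max(plan.n, 1),
    yielded: plan.nYields > 0                    // had to give up the lock
  };
}
```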
Example
> db.games.find({ "players" : 32071 }).explain()
{
      "cursor" : "BtreeCursor players_1",
      "isMultiKey" : true,                                   // index on an array field (multikey)
      "n" : 1,                                               // 1 doc
      "nscannedObjects" : 1,
      "nscanned" : 1,                                        // 1 index entry scanned
      "nscannedObjectsAllPlans" : 1,
      "nscannedAllPlans" : 1,
      "scanAndOrder" : false,
      "indexOnly" : false,
      "nYields" : 0,                                         // didn't have to yield
      "nChunkSkips" : 0,
      "millis" : 2,                                          // fast
      "indexBounds" : {"players" : [ [ 32071, 32071 ] ] },   // good, used index
}
Example
// index-only (covered) query
>db.events.find({ "user_id":35891},{"_id":0,"user_id":1}).explain()
{
       "cursor" : "BtreeCursor user_id_1",
       "isMultiKey" : false,
       "n" : 2,                                                   // number of docs
       "nscannedObjects" : 2,
       "nscanned" : 2,
       "nscannedObjectsAllPlans" : 2,
       "nscannedAllPlans" : 2,
       "scanAndOrder" : false,                                    // false: no in-memory sort needed
       "indexOnly" : true,                                        // covered: answered from the index alone
       "nYields" : 0,
       "nChunkSkips" : 0,
       "millis" : 0,
       "indexBounds" : { "user_id" : [ [ 35891, 35891 ] ] },
}
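A covered query like the one above is possible only when the index holds every field the query touches. A minimal sketch of that rule, assuming an index described as a plain list of key names: every filtered field and every projected field must be an index key, and _id must be excluded unless it is in the index. canBeCovered is a hypothetical name, and other restrictions (for example, multikey indexes) are deliberately not modeled.

```javascript
// Hypothetical check for whether a query *could* be covered by an index.
// indexKeys: array of field names in the index, e.g. ["user_id"].
// Not modeled: multikey indexes and other server-side restrictions.
function canBeCovered(indexKeys, filter, projection) {
  var inIndex = function (f) { return indexKeys.indexOf(f) !== -1; };
  // Every field in the query filter must be an index key.
  var filterOk = Object.keys(filter).every(inIndex);
  // Fields the projection includes (truthy values).
  var projected = Object.keys(projection).filter(function (f) { return projection[f]; });
  // Empty or exclusion-only projections return whole documents: not coverable.
  if (projected.length === 0) return false;
  var projectionOk = projected.every(inIndex);
  // _id is returned by default, so it must be excluded or be in the index.
  var idOk = projection._id === 0 || projection._id === false || inIndex("_id");
  return filterOk && projectionOk && idOk;
}
```

With the events example, canBeCovered(["user_id"], {"user_id":35891}, {"_id":0, "user_id":1}) is true; dropping "_id":0 from the projection makes it false.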
Data locality
[Figure: data blocks holding the three documents matched by db.mytest.find({"user_id":10}).count() = 3; spread across blocks (bad!) vs. in one data block (good!)]
 ●   No index-organized collections... so...
 ●   Control the process that inserts the data (queue, etc.)
 ●   Perform reorgs (.sort()) on slaves, then promote
 ●   Schema design
 ●   Bad data locality plus a cache miss is asking for trouble
 ●   Update+move very likely reduces data locality
 ●   Indexes naturally have good data locality!
Example;
Data Locality
> var arr=db.events.find(
     {"user_id":35891},
     {'$diskLoc':1, 'user_id':1}).limit(20).showDiskLoc()
> for(var i=0; i<arr.length(); i++) {
     // approximate 512-byte block of the data file holding this document
     var b=Math.round(arr[i].$diskLoc.offset/512);
     printjson(arr[i].user_id+" "+b);
     }

"35891 354"
"35891 55674"                    // same user, ~55,000 blocks apart: poor locality



examples at:
https://gist.github.com/977336
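The shell loop above prints one block number per document; summarizing them makes locality comparable across queries. A sketch, assuming $diskLoc values of the form { file, offset } as returned by showDiskLoc(); the 512-byte block size just follows the shell example rather than any storage-engine constant, and localityReport is a hypothetical name.

```javascript
// Hypothetical helper: count the distinct 512-byte blocks occupied by the
// documents one query matched, given their $diskLoc values ({ file, offset }).
// For the same number of documents, fewer blocks means better data locality.
var BLOCK_SIZE = 512; // the block size the shell example divides by

function localityReport(diskLocs) {
  var blocks = {};
  diskLocs.forEach(function (loc) {
    // Math.floor maps offsets 0..511 to block 0, 512..1023 to block 1, etc.
    // The file number is part of the key: equal offsets in different
    // data files are different blocks.
    var key = loc.file + ":" + Math.floor(loc.offset / BLOCK_SIZE);
    blocks[key] = true;
  });
  return { docs: diskLocs.length, blocks: Object.keys(blocks).length };
}
```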
Instance tuning;
Write performance
●   Overall system performance is a function of write performance
●   Partition systems, functional split first. Group by common workloads.
●   Writes
     ○ Tune your writes!
          ■ fastMods where we can
          ■ Turn updates into inserts?
          ■ Secondary indexes checked?
     ○ Single writer lock in MongoDB
          ■ 2.0+: yields the lock on page faults
          ■ 2.2+: lock scope is per database
          ■ All databases mutex; get over it.
          ■ Minimize the time writes take and you win
     ○ Watch lock % and write queues
     ○ Use bench.py to test your write performance (https://github.com/memsql/bench)
     ○ Write-tuned I/O: caches, SSDs, etc.
     ○ Sharding? Split first, then shard
          ■ Balancer induces I/O and writes!
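Because writes serialize on the lock, the time each write holds it caps throughput. A back-of-the-envelope sketch, assuming an idealized single writer lock with no yielding, no per-database split and no reads: with the 206 µs write lock time (lockStats.timeLockedMicros.w) from the update example earlier, the ceiling is 1,000,000 / 206, or about 4,854 writes per second. Lock % is simply time locked over elapsed time. Both function names are hypothetical.

```javascript
// Idealized model of a single writer lock (ignores yields, per-DB locks
// and reads): if each write holds the lock for lockedMicros, at most
// 1e6 / lockedMicros writes can complete per second.
function maxWritesPerSecond(lockedMicros) {
  return Math.floor(1e6 / lockedMicros);
}

// Lock % over an interval: time the write lock was held / elapsed time.
function lockPercent(lockedMicrosTotal, elapsedMicros) {
  return 100 * lockedMicrosTotal / elapsedMicros;
}
```

Halving the time each write holds the lock roughly doubles that ceiling, which is why minimizing write time pays off.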
Instance tuning;
Read performance
●   Overall system performance is a function of write performance
●   Reads scale well as long as writes are tuned
●   Partition systems, split first. Group by common workloads.
●   Reads scale nicely, especially against slaves
     ○ Is inconsistency (stale reads) OK?
     ○ Know your workload!
●   Statements tuned
     ○ Using indexes
     ○ Covered indexes
     ○ Data locality
●   Sharding
     ○ See how I mentioned that last?
Contact
@kennygorman
@objectrocket
kgorman@objectrocket.com

https://www.objectrocket.com
https://github.com/kgorman/rocketstat
