Optimizing Slow
Queries with Indexes
  and Creativity

        Chris Winslett
     chris@mongohq.com
My Background

•    For the past year, I’ve looked at
    MongoDB logs at least once every day.


•    We routinely answer the question “how
    can I improve performance?”
Who’s this talk for?
•    New to MongoDB

•    Seeing some slow operations, and need
    help debugging

•    Running database operations on a sizeable
    deployment

•    I have a MongoDB deployment, and I’ve
    hit a performance wall
What should you learn?
Know where to look on a running MongoDB
 to uncover slowness, and discuss solutions.

        MongoDB has performance
              “patterns”.
How to think about improving performance.

                  And . . .
Schema Design


Design with the end in mind.
First, a Simple One
                          Alerted due to high CPU


query getmore command     res faults     locked db   ar|aw   netIn netOut   conn        time
 129       4       7    126m      2    my_db:0.0%    3|0      27k   445k     42    15:36:54
  64       4       3    126m      0    my_db:0.0%    5|0      12k   379k     42    15:36:55
  65       7       8    126m      0    my_db:0.1%    3|0      15k   230k     42    15:36:56
  65       3       3    126m      1    my_db:0.0%    3|0      13k   170k     42    15:36:57
  66       1       6    126m      1    my_db:0.0%    0|0      14k   262k     42    15:36:58
  32       8       5    126m      0    my_db:0.0%    5|0       5k   445k     42    15:36:59




                            a truncated mongostat
log

[conn73454] query my_db.my_collection query: { $query: {
publisher: "US Weekly" }, orderby: { publishedAt: -1 } }
ntoreturn:5 ntoskip:0 nscanned:33236 scanAndOrder:1
keyUpdates:0 numYields: 21 locks(micros) r:317266
nreturned:5 reslen:3127 178ms




                                                 Example 1
Solution
We are fixing this query
  { $query: { publisher: "US Weekly" }, orderby: { publishedAt: -1 } }



With this index
  db.my_collection.ensureIndex({"publisher": 1, "publishedAt": -1}, {background: true})



I would show you the logs, but now they are silent.



                                                                          Example 1
The Pattern

Inefficient read queries that rely on in-memory
table scans cause high CPU load.


Caused by not matching indexes to queries.
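
One way to spot this pattern without reading every log line by eye is to compare how many entries a query scanned with how many it returned. The sketch below assumes the 2.x-style log format shown in Example 1 (the nscanned, nreturned and scanAndOrder counters); the function name inspectSlowQuery and the ratio threshold of 10 are illustrative assumptions, not part of MongoDB.

```javascript
// Sketch: pull the counters out of a MongoDB 2.x-style slow-query log line.
// The counter names come from the log excerpt in Example 1; the function
// name and the default threshold are assumptions made for illustration.
function inspectSlowQuery(logLine, maxScanRatio = 10) {
  // Read an integer counter such as "nscanned:33236"; null when absent.
  const counter = (name) => {
    const match = logLine.match(new RegExp(name + ":\\s*(\\d+)"));
    return match ? Number(match[1]) : null;
  };

  const nscanned = counter("nscanned");
  const nreturned = counter("nreturned");
  const inMemorySort = counter("scanAndOrder") === 1;
  const millisMatch = logLine.match(/(\d+)ms\s*$/);

  // Entries scanned per document returned (guarding against nreturned = 0).
  const scanRatio =
    nscanned === null || nreturned === null
      ? null
      : nscanned / Math.max(nreturned, 1);

  return {
    nscanned,
    nreturned,
    inMemorySort,
    millis: millisMatch ? Number(millisMatch[1]) : null,
    scanRatio,
    // Flag lines that sort in memory or scan far more than they return.
    suspicious: inMemorySort || (scanRatio !== null && scanRatio > maxScanRatio),
  };
}

// The log line from Example 1, joined onto one line.
const example1 =
  '[conn73454] query my_db.my_collection query: { $query: { publisher: "US Weekly" }, ' +
  "orderby: { publishedAt: -1 } } ntoreturn:5 ntoskip:0 nscanned:33236 scanAndOrder:1 " +
  "keyUpdates:0 numYields: 21 locks(micros) r:317266 nreturned:5 reslen:3127 178ms";

console.log(inspectSlowQuery(example1));
```

Scanning 33236 entries to return 5 documents, with scanAndOrder set, is the signature of a query whose filter and sort do not match any index.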



                                      Example 1
Example 2

query delete     res faults    locked db idx miss %   qr|qw   ar|aw   netIn netOut   conn
  25      6    346m      0 my_db:188.1%          0     0|0     0|2     25k    45k    117
  24      6    346m      0 my_db:188.6%          0     0|0     0|1     27k    44k    117
  24      6    346m      0 my_db:184.3%          0     0|0     0|1     21k    36k    117
  24      6    346m      0 my_db:190.9%          0     0|0     0|1     20k    33k    117
  19      4    346m      0 my_db:191.5%          0     0|0     0|0     21k    41k    117




                           a truncated mongostat
tail

[conn72593] remove my_db.my_collection
query: { status: "some chunk of text" }
keyUpdates:0 numYields: 15 locks(micros)
w:213415 210ms




                                   Example 2
Solution
This is the slow query
  db.my_collection.remove({status: "some chunk of text"})


With this index
  db.my_collection.ensureIndex({status: 1})




                                               Example 2
The Pattern

  Inefficient write queries cause a high lock percentage.



Caused by losing track of your indexes / queries.
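
Keeping track of indexes and queries can be partly automated. The sketch below checks whether any index starts with the fields a query filters on. It assumes index key documents shaped like the `key` field returned by getIndexes(); the helper name findUsableIndex and the prefix-only rule are simplifying assumptions (the real query planner considers more than this).

```javascript
// Sketch: does any index start with the fields a query filters on?
// `indexes` mimics the `key` documents returned by getIndexes(); the helper
// name and the prefix-only rule are simplifying assumptions.
function findUsableIndex(indexes, queryFields) {
  if (queryFields.length === 0) {
    return null; // nothing to match against
  }
  const wanted = new Set(queryFields);
  const usable = indexes.find((key) => {
    // Compare the leading fields of the index with the queried fields.
    const prefix = Object.keys(key).slice(0, wanted.size);
    return prefix.length === wanted.size && prefix.every((f) => wanted.has(f));
  });
  return usable || null;
}

// Hypothetical index list before the fix in Example 2: no index on `status`.
const before = [{ _id: 1 }, { publisher: 1, publishedAt: -1 }];
// The same list after ensureIndex({status: 1}).
const after = before.concat([{ status: 1 }]);

console.log(findUsableIndex(before, ["status"]));
console.log(findUsableIndex(after, ["status"]));
```

Run against the queries an application actually issues, a check like this shows which writes will scan the whole collection while holding the lock.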




                                               Example 2
Example 3
                              Alerted on high CPU

insert  query update delete getmore command faults locked % idx miss % qr|qw ar|aw
    *0     *0     *0     *0       0     1|0   1422        0          0   0|0  50|0
    *0      6     *0     *0       0     6|0    575        0          0   0|0  51|0
    *0      3     *0     *0       0     1|0   1047        0          0   0|0  50|0
    *0      2     *0     *0       0     3|0   1660        0          0   0|0  50|0




                              a truncated mongostat
tail
[initandlisten] connection accepted from ....
[conn4229724] authenticate: { authenticate: ....
[initandlisten] connection accepted from ....
[conn4229725] authenticate: { authenticate: .....
[conn4229717] query ..... 102ms
[conn4229725] query ..... 140ms




                     amazingly quiet
                                                    Example 3
currentOp
> db.currentOp()
{
       "inprog" : [
             {
                    "opid" : 66178716,
                    "lockType" : "read",
                    "secs_running" : 760,
                    "op" : "query",
                    "ns" : "my_db.my_collection",
                    "query" : {
                           "keywords" : { "$in" : [ "keyword1", "keyword2" ] },
                           "tags" : { "$in" : [ "tags1", "tags2" ] }
                    },
                    "orderby" : {
                           "created_at" : -1
                    },
                    "numYields" : 21
             }
       ]
}
                                                    Example 3
Solution
Return Stability to Database

    > db.currentOp().inprog.filter(function(row) {
        return row.secs_running > 100 && row.op == "query"
      }).forEach(function(row) {
        db.killOp(row.opid)
      })

Disable query, and refactor schema.
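
Before killing anything on a production database, it can help to review the list of operations that would be killed. The sketch below separates the selection from the kill. It assumes a document shaped like the currentOp output in Example 3; the function name selectLongRunningQueries and the second sample operation are made up for illustration.

```javascript
// Sketch: choose which operations to kill from a currentOp()-shaped document.
// Field names (inprog, opid, op, ns, secs_running) follow the output shown in
// Example 3; the function name and default threshold are assumptions.
function selectLongRunningQueries(currentOp, minSeconds = 100) {
  return (currentOp.inprog || [])
    .filter((row) => row.op === "query" && row.secs_running > minSeconds)
    .map((row) => ({ opid: row.opid, ns: row.ns, secs_running: row.secs_running }));
}

// Stand-in for db.currentOp(); the second entry is invented for contrast.
const sample = {
  inprog: [
    { opid: 66178716, op: "query", ns: "my_db.my_collection", secs_running: 760 },
    { opid: 1, op: "query", ns: "my_db.my_collection", secs_running: 2 },
  ],
};

const victims = selectLongRunningQueries(sample);
console.log(victims);

// In the mongo shell, after reviewing the list:
// victims.forEach(function (row) { db.killOp(row.opid); });
```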


                                              Example 3
Refactoring




I have one word for you,
        “Schema”
Example 4

A map reduce has gradually run
     slower and slower.
Finding Offenders


Find the time of the slowest query of the day:
   grep -E '[0-9]{3,100}ms$' $MONGODB_LOG | awk '{print $NF}' | sort -n
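
The same ranking can be sketched in JavaScript when the whole log line is wanted rather than only the duration. This assumes each slow operation ends with its duration as "<n>ms", as in the excerpts in this talk; the function name slowestOperations and the 100 ms default (matching the three-digit minimum in the grep) are assumptions.

```javascript
// Sketch: rank log lines by the trailing "<n>ms" duration, slowest first.
// The function name and the 100 ms default are assumptions.
function slowestOperations(logText, minMillis = 100) {
  return logText
    .split("\n")
    .map((line) => {
      const match = line.match(/(\d+)ms\s*$/);
      return match ? { millis: Number(match[1]), line } : null;
    })
    .filter((entry) => entry !== null && entry.millis >= minMillis)
    .sort((a, b) => b.millis - a.millis);
}

// Lines abbreviated from the log excerpts in this talk.
const sampleLog = [
  "[conn73454] query my_db.my_collection ... 178ms",
  "[initandlisten] connection accepted from ....",
  "[conn72593] remove my_db.my_collection ... 210ms",
  "[conn4229717] query ..... 102ms",
].join("\n");

const ranked = slowestOperations(sampleLog);
console.log(ranked.map((entry) => entry.millis));
```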




                                                                    Example 4
Slowest Map Reduce
my_db.$cmd command: {
mapreduce: "my_collection",
map: function() {},
query: { $or: [
{ object.type: "this" },
{ object.type: "that" }
],
time: { $lt: new Date(1359025311290), $gt: new Date(1358420511290) },
object.ver: 1,
origin: "tnh"
},
out: "my_new_collection",
reduce: function(keys, vals) { ....}
} ntoreturn:1 keyUpdates:0 numYields: 32696 locks(micros)
W:143870 r:511858643 w:6279425 reslen:140 421185ms



                                                                        Example 4
Solution
Problem
   Query is slow because it has multiple multi-value operators: $or, $lt, and $gt

Solution
   Change schema to use an “hour_created” attribute:

   hour_created: “%Y-%m-%d %H”

   Create an index on “hour_created” followed by the “$or” values. Query
   using the new “hour_created”.
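
A minimal sketch of the bucketing idea, under stated assumptions: hours are computed in UTC, the helper names hourCreated and hourBuckets are invented here, and the final query document is only one possible reading of "query using the new hour_created" (it replaces the range and the $or with $in lists and omits the remaining filters).

```javascript
// Sketch: build "%Y-%m-%d %H" bucket strings (UTC) for the hour_created field.
// Function names and the choice of UTC are assumptions.
const HOUR_MS = 60 * 60 * 1000;

function hourCreated(date) {
  // toISOString() looks like "2013-01-17T11:01:51.290Z"; keep "2013-01-17 11".
  return date.toISOString().slice(0, 13).replace("T", " ");
}

function hourBuckets(from, to) {
  const buckets = [];
  // Start at the top of the hour containing `from`.
  const start = Math.floor(from.getTime() / HOUR_MS) * HOUR_MS;
  for (let t = start; t <= to.getTime(); t += HOUR_MS) {
    buckets.push(hourCreated(new Date(t)));
  }
  return buckets;
}

// The time range from the slow map reduce in Example 4.
const buckets = hourBuckets(new Date(1358420511290), new Date(1359025311290));

// One possible query document (an assumption, not the speaker's exact query).
const query = {
  hour_created: { $in: buckets },
  "object.type": { $in: ["this", "that"] },
};

console.log(buckets.length, buckets[0], buckets[buckets.length - 1]);
```

Because every document carries a single exact hour_created value, the range comparison becomes a set of equality matches that a compound index starting with hour_created can serve directly.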




                                                                         Example 4
Words of caution


 2 / 4 solutions were to add an index.


Adding new indexes as a solution scales poorly.
Sometimes . . .

It is best to do nothing, except add shards / add
                    hardware.

 Go back to the drawing board on the design.
Bad things happen to
      good databases?

•    ORMs

•    Manage your indexes and queries.

•    Constraints will set you free.
Road Map for
           Refactoring
•    Measure, measure, measure.

•    Find your slowest queries and determine
    if they can be indexed.

•    Rephrase the problem you are solving by
    asking “How do I want to query my data?”
Thank you!


•   Questions?

•   E-mail me: chris@mongohq.com
