Webinar: Index Tuning and Evaluation
2
Problem statement:
I inherited a MongoDB database server with
60 collections and 100 or so indexes.
The business users are complaining about
slow report completion times.
What can I do to improve performance?
3
Scope:
System tuning-
Memory
Process
Disk
Network
Application tuning-
Application architecture
Statement design
Data model design
Indexing, Query optimization
(Relational, so that we may compare/contrast)
4
What you will leave with in 60 minutes:
• Detail the command processing stages
• Apply the 5 rules of a rule-based query optimizer
• Apply 3 index negation guidelines
• Repair common query design problems-
  • Pseudo order by clause
  • OR topped queries
  • (AND topped queries, non-compound)
• Dump and analyze query plans
• Articulate and control the FJS and ESR query processing patterns
5
Queries 1 & 2:
SELECT * FROM phonebook
WHERE lastName LIKE '%son';
SELECT * FROM phonebook
WHERE firstName = 'Jennifer';
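Query 1 shows a non-initial substring. An index keeps its keys in sorted order, so a fixed prefix maps to one contiguous run of keys that binary search can find, while a leading wildcard such as '%son' can match keys anywhere in that order. A minimal Python sketch, using a sorted list as a stand-in for a B-tree index (a simplification for illustration; the function names are hypothetical):

```python
from bisect import bisect_left
from typing import List


def prefix_lookup(sorted_keys: List[str], prefix: str) -> List[str]:
    """Index-style search: binary search to the first key >= prefix,
    then read forward only while keys still start with the prefix."""
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order guarantees no later key can match
        matches.append(key)
    return matches


def suffix_lookup(sorted_keys: List[str], suffix: str) -> List[str]:
    """LIKE '%son' style search: sorted order gives no starting point,
    so every key has to be examined (a full scan)."""
    return [key for key in sorted_keys if key.endswith(suffix)]


# Hypothetical "index" on lastName.
last_names = sorted(["Anderson", "Jackson", "Jones", "Smith", "Johnson", "Jennings"])
print(prefix_lookup(last_names, "Jo"))   # ['Johnson', 'Jones']
print(suffix_lookup(last_names, "son"))  # ['Anderson', 'Jackson', 'Johnson']
```

The prefix search touches only the keys it returns plus one; the suffix search touches all six.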
6
Query 3:
CREATE TABLE t1 (col1, col_2, .. 80 more columns);
CREATE INDEX i1 ON t1 (col_gender);
SELECT * FROM t1
WHERE col_gender = 'F' AND col_age > 50;
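Query 3 turns on selectivity: the fraction of rows a predicate keeps. With an index on col_gender alone, roughly half the table matches 'F', so walking the index and then fetching each row can cost more than one sequential scan. A back-of-the-envelope Python sketch; the row counts and the 10% cut-off are illustrative assumptions, not a rule taken from any particular optimizer:

```python
def selectivity(matching_rows: int, total_rows: int) -> float:
    """Fraction of rows a predicate keeps (0.0 = none, 1.0 = all)."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return matching_rows / total_rows


def index_looks_useful(matching_rows: int, total_rows: int,
                       threshold: float = 0.10) -> bool:
    """Hypothetical rule of thumb: favor the index only when the
    predicate keeps a small fraction of the table."""
    return selectivity(matching_rows, total_rows) <= threshold


# Assumed numbers: 1,000,000 rows, about half with col_gender = 'F'.
print(selectivity(500_000, 1_000_000))         # 0.5
print(index_looks_useful(500_000, 1_000_000))  # False
print(index_looks_useful(2_000, 1_000_000))    # True
```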
7
Negation of an index
1. Non-initial substring
2. Non-anchored compound/composite key
3. (Poor) selectivity of a filter
Partial negation of an index-
CREATE INDEX i1 ON t1 (col1, col2);
--
SELECT * FROM t1
WHERE col1 > 100 AND col2 = 'x';
Exception to all above-
Covered query/key-only
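The compound-key guidelines above follow from how an index seek works: the index narrows the scan only through a leading run of equality columns, optionally ending in one range column. The Python below is a simplified model of that rule; the function name and the "eq"/"range" labels are hypothetical, and real optimizers (MongoDB included) may still compute bounds on, or skip through, later key columns:

```python
from typing import Dict, List, Tuple


def usable_prefix(index_fields: List[str],
                  predicates: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split a compound index into (seek_fields, filter_only_fields).

    predicates maps a field name to "eq" or "range" (hypothetical labels).
    Seek fields narrow the contiguous key range that is read; the seek stops
    at the first index field with no predicate, or right after a range.
    """
    seek: List[str] = []
    for field in index_fields:
        kind = predicates.get(field)
        if kind is None:
            break  # gap in the key: the index is not anchored past this point
        seek.append(field)
        if kind == "range":
            break  # later columns are no longer contiguous within the range
    filter_only = [f for f in index_fields if f in predicates and f not in seek]
    return seek, filter_only


# Partial negation: range on col1, equality on col2.
print(usable_prefix(["col1", "col2"], {"col1": "range", "col2": "eq"}))
# (['col1'], ['col2'])
# Non-anchored key: no predicate on the leading column.
print(usable_prefix(["col1", "col2"], {"col2": "eq"}))
# ([], ['col2'])
```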
8
(n) Stage database server back end
9
Query Optimizers
Rule-based or cost-based?
10
5 Rules to a rule-based optimizer
1. Outer table joins
2. (Non-outer) table joins
3. Filter criteria (predicates)
4. Table size
5. Table cardinality
SELECT *
FROM orderHeader oh, orderLineItems oi
WHERE
oh.orderNumber =
oi.orderNumber;
SELECT *
FROM persons p, OUTER automobiles a
WHERE
p.personId = a.personId;
11
Query 4: final/larger example using SQL
12
Example 4:
query
13
Query 4: First predicate
14
Collection access method: collection scan versus index scan
15
Query 4: Join and predicate
16
Query 4: Optimal plan
17
FJS versus ESR: MongoDB
SELECT *
FROM collection
WHERE
col1 = 'x' AND col2 > 'y'
ORDER BY col3;
Filter -> Join -> Sort (FJS)
Equality -> Sort -> Range (ESR)
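The ESR ordering can be expressed as a small helper that builds a MongoDB key pattern from a query's predicates. A minimal Python sketch; the helper name and argument shapes are assumptions for illustration, and ESR is a guideline rather than a guarantee that the optimizer will choose the resulting index:

```python
from typing import Dict, List, Tuple


def esr_index(equality: List[str],
              sort: List[Tuple[str, int]],
              range_: List[str]) -> Dict[str, int]:
    """Build a compound key pattern in Equality -> Sort -> Range order.

    equality: fields compared with equality (direction is irrelevant; 1 is used)
    sort:     (field, direction) pairs in the query's sort order
    range_:   fields with range predicates ($gt, $lt, ...)
    Dicts keep insertion order (Python 3.7+), so the order is the key order.
    """
    key: Dict[str, int] = {}
    for field in equality:
        key[field] = 1
    for field, direction in sort:
        key.setdefault(field, direction)  # a field already placed keeps its slot
    for field in range_:
        key.setdefault(field, 1)
    return key


# WHERE col1 = 'x' AND col2 > 'y' ORDER BY col3
print(esr_index(["col1"], [("col3", 1)], ["col2"]))
# {'col1': 1, 'col3': 1, 'col2': 1}
```

Applied to the zips query later in this deck (equality on state, sort on city, range on pop), the helper yields { "state" : 1, "city" : 1, "pop" : 1 }.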
18
Problem statement:
I inherited a MongoDB database server with
60 collections and 100 or so indexes.
The business users are complaining about
slow report completion times.
What can I do to improve performance?
19
Skills/tools we need:
• Which server
• Which logfile
• Server profiling level
• Which queries
• Cloud/Ops Manager!
• mtools
• Text processing
• Dump the query plan
• Analyze the query plan
• Which indexes get used
• Other
20
Sample data set: zips.json
> db.zips.findOne()
{
"_id" : ObjectId("570 .. 1c1f2"),
"city" : "ACMAR",
"zip" : "35004",
"loc" : {
"y" : 33.584132,
"x" : 86.51557
},
"pop" : 6055,
"state" : "AL"
}
> db.zips.count()
29470
> db.zips.find( { "state" : "WI", "pop" : { "$lt" : 50 } } ).sort( { "city" : 1 } )
21
Query 5: Dumping the query plan
db.zips.find( { "state" : "WI",
"pop" : { "$lt" : 50 } } ).sort(
{ "city" : 1 }
).explain("executionStats")
"winningPlan" : {
   "stage" : "SORT",
   "sortPattern" : { "city" : 1 },
   "inputStage" : {
      "stage" : "COLLSCAN",
      "filter" : {
         "$and" : [
            { "state" : { "$eq" : "WI" } },
            { "pop" : { "$lt" : 50 } }
         ]
      }
   }
},
"rejectedPlans" : [ ]
"executionStats" : {
"nReturned" : 4,
"executionTimeMillis" : 16,
"totalKeysExamined" : 0,
"totalDocsExamined" : 29470,
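The executionStats above are easiest to judge as ratios: how many index keys and documents the server examined for each document it returned. A small Python helper, assuming the stats have already been pulled out of explain("executionStats") into a dict (the helper name is hypothetical):

```python
from typing import Dict


def scan_ratios(execution_stats: Dict[str, int]) -> Dict[str, float]:
    """Keys and documents examined per document returned.

    Ratios near 1.0 mean the plan read little more than it returned;
    large ratios point at a collection scan or a poorly matched index.
    """
    returned = max(execution_stats["nReturned"], 1)  # avoid dividing by zero
    return {
        "keys_per_doc": execution_stats["totalKeysExamined"] / returned,
        "docs_per_doc": execution_stats["totalDocsExamined"] / returned,
    }


# Figures from the COLLSCAN plan above.
collscan = {"nReturned": 4, "totalKeysExamined": 0, "totalDocsExamined": 29470}
print(scan_ratios(collscan))  # {'keys_per_doc': 0.0, 'docs_per_doc': 7367.5}
```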
22
Query 5: get indexes
> db.zips.getIndexes()
[ {
"v" : 1, "key" : { "_id" : 1 },
"name" : "_id_",
"ns" : "test_db.zips"
} ]
db.zips.createIndex( { "state" : 1 , "pop" : 1 } )
23
Query 5: attempt 2
"winningPlan" : {
   "stage" : "SORT",
   "sortPattern" : { "city" : 1 },
   "inputStage" : {
      "stage" : "FETCH",
      "inputStage" : {
         "stage" : "IXSCAN",
         "keyPattern" : { "state" : 1, "pop" : 1 },
         "indexBounds" : {
            "state" : [ "[\"WI\", \"WI\"]" ],
            "pop" : [ "[-inf.0, 50.0)" ]
         }
      }
   }
},
"executionStats" : {
"nReturned" : 4,
"executionTimeMillis" : 1,
"totalKeysExamined" : 4,
"totalDocsExamined" : 4,
24
Query 5: attempt 3
db.zips.createIndex( { "state" : 1 , "city" : 1 , "pop" : 1 } )
"winningPlan" : {
   "stage" : "SORT",
   "sortPattern" : { "city" : 1 },
   "inputStage" : {
      "stage" : "FETCH",
      "inputStage" : {
         "stage" : "IXSCAN",
         "keyPattern" : { "state" : 1, "pop" : 1 }
      }
   }
},
"rejectedPlans" : [
   {
      "stage" : "FETCH",
      "inputStage" : {
         "stage" : "IXSCAN",
         "keyPattern" : { "state" : 1, "city" : 1, "pop" : 1 },
         "indexBounds" : {
            "state" : [ "[\"WI\", \"WI\"]" ],
            "city" : [ "[MinKey, MaxKey]" ],
            "pop" : [ "[-inf.0, 50.0)" ]
         }
      }
   }
]
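Reading nested plans like the ones above by eye gets tedious. A short Python sketch that flattens a winningPlan (assumed here to be already parsed into a dict, for example from a driver's explain output) into its stage names and flags the two stages this deck keeps fixing: COLLSCAN and a blocking SORT. The stage names come from the output shown; the helper names are hypothetical:

```python
from typing import Any, Dict, List


def plan_stages(plan: Dict[str, Any]) -> List[str]:
    """Flatten an explain() plan tree into its stage names, top-down.

    Follows "inputStage" (single child) and "inputStages" (e.g. an OR stage).
    """
    stages = [plan["stage"]]
    if "inputStage" in plan:
        stages += plan_stages(plan["inputStage"])
    for child in plan.get("inputStages", []):
        stages += plan_stages(child)
    return stages


def needs_attention(plan: Dict[str, Any]) -> List[str]:
    """Flag stages that commonly signal a missing or mismatched index."""
    stages = plan_stages(plan)
    warnings = []
    if "COLLSCAN" in stages:
        warnings.append("collection scan")
    if "SORT" in stages:
        warnings.append("in-memory sort")
    return warnings


# The winning plan from attempt 3 above, trimmed to the fields used here.
winning = {
    "stage": "SORT",
    "sortPattern": {"city": 1},
    "inputStage": {
        "stage": "FETCH",
        "inputStage": {"stage": "IXSCAN", "keyPattern": {"state": 1, "pop": 1}},
    },
}
print(plan_stages(winning))      # ['SORT', 'FETCH', 'IXSCAN']
print(needs_attention(winning))  # ['in-memory sort']
```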
25
Query 6: (pseudo order by clause)
db.zips.find( { "state" : "CO" }
).sort( { "pop" : 1 } )
SELECT * FROM t1
WHERE col1 = 'x'
ORDER BY col2;
SELECT * FROM t1
WHERE col1 = 'x'
ORDER BY col1, col2;
SELECT * FROM t1
WHERE col1 = 'x'
ORDER BY 'x', col2;
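The pseudo order by works because an equality predicate pins the leading column to one value, so ordering by (col1, col2) and ordering by col2 alone return rows in the same sequence, and an index on (col1, col2) already delivers that sequence with no separate sort step. A tiny Python check of that equivalence on made-up rows (the data is hypothetical):

```python
# Rows of a hypothetical table t1, stored in arbitrary order.
rows = [
    {"col1": "x", "col2": 3},
    {"col1": "y", "col2": 1},
    {"col1": "x", "col2": 1},
    {"col1": "x", "col2": 2},
]

# WHERE col1 = 'x'
matching = [r for r in rows if r["col1"] == "x"]

by_col2 = sorted(matching, key=lambda r: r["col2"])                      # ORDER BY col2
by_col1_col2 = sorted(matching, key=lambda r: (r["col1"], r["col2"]))  # ORDER BY col1, col2

# With col1 pinned to a single value, both sorts give the same order,
# which is also the key order of an index on (col1, col2).
print(by_col2 == by_col1_col2)  # True
```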
26
Query 6: query plan
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : { "state" : 1, "city" : 1, "pop" : 1 },
"indexBounds" : {
"state" : [ "[\"CO\", \"CO\"]" ],
"city" : [ "[MinKey, MaxKey]" ],
"pop" : [ "[MinKey, MaxKey]" ]
"executionStats" : {
"nReturned" : 416,
"executionTimeMillis" : 1,
"totalKeysExamined" : 416,
"totalDocsExamined" : 416,
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : { "state" : 1, "pop" : 1 },
"indexBounds" : {
"state" : [ "[\"CO\", \"CO\"]" ],
"pop" : [ "[MinKey, MaxKey]" ]
"rejectedPlans" : [
"stage" : "SORT",
"sortPattern" : { "pop" : 1 },
"inputStage" : {
27
Review the indexes we have so far
> db.zips.getIndexes()
_id
db.zips.createIndex( { "state" : 1 , "pop" : 1 } )
db.zips.createIndex( { "state" : 1 , "city" : 1 , "pop" : 1 } )
28
Query 7: OR topped query
db.zips.find( { "$or" : [ { "state" : "UT" }, { "pop" : 2 } ] } )
"winningPlan" : {
   "inputStage" : {
      "stage" : "COLLSCAN",
      "filter" : {
         "$or" : [
            { "pop" : { "$eq" : 2 } },
            { "state" : { "$eq" : "UT" } }
         ]
      }
   }
},
"rejectedPlans" : [ ],
"executionStats" : {
"nReturned" : 215,
"executionTimeMillis" : 22,
"totalKeysExamined" : 0,
"totalDocsExamined" : 29470,
SELECT * FROM t1
WHERE
order_date = TODAY
OR
ship_weight < 10;
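The reasoning behind Query 7 in sketch form: an OR is evaluated as a union of its branches, and that union can come from indexes only if every branch has an index it can seek on; one unindexed branch forces a full scan anyway. The Python below is a deliberately simplified model (it checks only each index's leading field and ignores cost), with hypothetical names:

```python
from typing import List


def or_access_path(branch_fields: List[str], indexed_leading_fields: List[str]) -> str:
    """Pick an access path for an OR-topped query.

    Each OR branch can be answered from an index only if some index leads
    with that branch's field. If any branch lacks one, every document must
    be checked anyway, so a single collection scan is the better plan.
    """
    if all(field in indexed_leading_fields for field in branch_fields):
        return "OR of IXSCANs"
    return "COLLSCAN"


# Indexes so far lead with: _id, state (state_pop), state (state_city_pop).
leading = ["_id", "state", "state"]
print(or_access_path(["state", "pop"], leading))            # COLLSCAN
print(or_access_path(["state", "pop"], leading + ["pop"]))  # OR of IXSCANs
```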
29
Query 7: solution
db.zips.createIndex( { "pop" : 1 } )
"winningPlan" : {
   "inputStage" : {
      "stage" : "FETCH",
      "inputStage" : {
         "stage" : "OR",
         "inputStages" : [
            {
               "stage" : "IXSCAN",
               "keyPattern" : { "state" : 1, "pop" : 1 },
               "indexBounds" : {
                  "state" : [ "[\"UT\", \"UT\"]" ],
                  "pop" : [ "[MinKey, MaxKey]" ]
               }
            },
            {
               "stage" : "IXSCAN",
               "keyPattern" : { "pop" : 1 },
               "indexBounds" : {
                  "pop" : [ "[2.0, 2.0]" ]
               }
            }
         ]
      }
   }
},
"rejectedPlans" : [ ],
"executionStats" : {
   "nReturned" : 215,
   "executionTimeMillis" : 2,
   "totalKeysExamined" : 215,
   "totalDocsExamined" : 215,
30
Topics not previously
covered
• How to tell which indexes are being used
• How to tell if an index is unique
• Smoke tests
• Covered queries
• MongoDB index types
• When do winning query plans get evicted
• Index intersection
• Building indexes (online/offline)
• Sharding and queries, query plans
• Capped collections, tailable cursors
• Optimizer hints
• Memory limits
• Query rewrite (aggregation pipeline optimization)
• Which server
• Which logfile
• (Server profiling level)
• Which queries
• mtools
• Text processing
31
Resources:
• The parent to this preso,
https://github.com/farrell0/MongoDB-Developers-Notebook
• An excellent query primer (110 pages)
http://www.redbooks.ibm.com/abstracts/sg247138.html?Open
(Chapters 10 and 11.)
• University.MongoDB.com
• https://www.mongodb.com/consulting-test#performance_evaluation
• zips.json
http://media.mongodb.org/zips.json
• Call Dave Lutz, at home, .. .. On Sunday (early)
(512)555/1212
32
Backup Slides
How to tell which indexes are being used
db.zips.aggregate( [ { "$indexStats" : {} } ] ).pretty()
{ "name" : "pop_1",
"key" : { "pop" : 1 },
"host" : "rhhost00.grid:27017",
"accesses" : {
"ops" : NumberLong(15),
"since" : ISODate("2016-04-19T07:13:44.546Z") } }
{ "name" : "state_1_city_1_pop_1",
"key" : { "state" : 1, "city" : 1, "pop" : 1 },
"host" : "rhhost00.grid:27017",
"accesses" : {
"ops" : NumberLong(0),
"since" : ISODate("2016-04-19T06:49:11.765Z") } }
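The $indexStats output above lends itself to a simple "unused index" report. A minimal Python sketch, assuming the stage's documents have been fetched into a list of dicts (the helper name is hypothetical). The counters reset when mongod restarts, so zero ops only means "unused since the since timestamp", not "never used":

```python
from typing import Any, Dict, List


def unused_indexes(index_stats: List[Dict[str, Any]]) -> List[str]:
    """Names of indexes with zero recorded accesses, excluding _id_.

    index_stats: documents returned by the $indexStats aggregation stage.
    int() also accepts driver 64-bit integer types for the ops counter.
    """
    return [
        s["name"]
        for s in index_stats
        if s["name"] != "_id_" and int(s["accesses"]["ops"]) == 0
    ]


# Trimmed copies of the two documents shown above.
stats = [
    {"name": "pop_1", "accesses": {"ops": 15}},
    {"name": "state_1_city_1_pop_1", "accesses": {"ops": 0}},
]
print(unused_indexes(stats))  # ['state_1_city_1_pop_1']
```

An index flagged this way is a candidate for review, not automatic removal: it may serve a monthly report or enforce uniqueness.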
How to tell if an index is unique
db.t1.createIndex( { "k1" : 1 },
{ "unique" : true })
{
"createdCollectionAutomatically" : true,
"numIndexesBefore" : 1,
"numIndexesAfter" : 2,
"ok" : 1
}
> db.t1.getIndexes()
[ { "v" : 1,
"key" : { "_id" : 1 },
"name" : "_id_",
"ns" : "test_db.t1" },
{ "v" : 1,
"unique" : true,
"key" : { "k1" : 1 },
"name" : "k1_1",
"ns" : "test_db.t1" }
]
Smoke tests
Every night, gather statistics on disk usage, and on the performance
of a set of queries that are strategic to the application.
For queries we wish to record-
• The number of documents returned
• The winning query plan
• Elapsed time, disk and memory consumed
• Other
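A nightly smoke test mostly needs to notice change. A minimal Python sketch, assuming each strategic query's explain("executionStats") output is available as a dict with the usual queryPlanner and executionStats sections; the record fields, the helper names, and the 2x slowdown threshold are illustrative assumptions, and disk and memory figures are left out:

```python
from typing import Any, Dict, List


def plan_stages(plan: Dict[str, Any]) -> List[str]:
    """Stage names of an explain() plan tree, top-down."""
    stages = [plan["stage"]]
    if "inputStage" in plan:
        stages += plan_stages(plan["inputStage"])
    for child in plan.get("inputStages", []):
        stages += plan_stages(child)
    return stages


def smoke_record(query_name: str, run_date: str,
                 explain_output: Dict[str, Any]) -> Dict[str, Any]:
    """Condense one explain("executionStats") result into a nightly record."""
    stats = explain_output["executionStats"]
    return {
        "query": query_name,
        "date": run_date,
        "nReturned": stats["nReturned"],
        "executionTimeMillis": stats["executionTimeMillis"],
        "winningPlan": plan_stages(explain_output["queryPlanner"]["winningPlan"]),
    }


def regressions(yesterday: Dict[str, Any], today: Dict[str, Any],
                slowdown: float = 2.0) -> List[str]:
    """Differences between two nightly records that deserve a human look."""
    notes = []
    if today["winningPlan"] != yesterday["winningPlan"]:
        notes.append("winning plan changed")
    if today["nReturned"] != yesterday["nReturned"]:
        notes.append("row count changed")
    if today["executionTimeMillis"] > slowdown * max(yesterday["executionTimeMillis"], 1):
        notes.append("slower than threshold")
    return notes


# Hypothetical runs of Query 5: an indexed plan one night, a COLLSCAN the next.
good = {"queryPlanner": {"winningPlan": {"stage": "SORT", "inputStage": {
            "stage": "FETCH", "inputStage": {"stage": "IXSCAN"}}}},
        "executionStats": {"nReturned": 4, "executionTimeMillis": 1}}
bad = {"queryPlanner": {"winningPlan": {"stage": "SORT", "inputStage": {"stage": "COLLSCAN"}}},
       "executionStats": {"nReturned": 4, "executionTimeMillis": 16}}

y = smoke_record("query5", "2016-04-18", good)
t = smoke_record("query5", "2016-04-19", bad)
print(regressions(y, t))  # ['winning plan changed', 'slower than threshold']
```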
Covered queries
> db.zips.find( { "pop" : { "$lt" : 200 } },
{ "_id" : 0, "pop" : 1 } ).sort(
{ "pop" : -1 } ).explain()
"winningPlan" : {
"stage" : "PROJECTION",
"transformBy" : {
"_id" : 0,
"pop" : 1 },
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : { "pop" : 1 },
"indexBounds" : {
"pop" : [ "(200.0, -inf.0]" ]
"rejectedPlans" : [ ]
}
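In the covered query above, the plan goes straight from IXSCAN to PROJECTION with no FETCH, because every returned field (pop, with _id excluded) lives in the index. A small Python check for that shape, assuming explain output parsed into a dict; in executionStats mode a covered query should also report totalDocsExamined of 0. The helper name is hypothetical, and newer servers may name the projection stage differently, which this check does not depend on:

```python
from typing import Any, Dict, Iterator


def _stages(plan: Dict[str, Any]) -> Iterator[str]:
    """Yield stage names of a plan tree, top-down."""
    yield plan["stage"]
    if "inputStage" in plan:
        yield from _stages(plan["inputStage"])
    for child in plan.get("inputStages", []):
        yield from _stages(child)


def is_covered(explain_output: Dict[str, Any]) -> bool:
    """True when the index alone answered the query: an IXSCAN is present,
    no FETCH stage, and (if reported) no documents were examined."""
    names = list(_stages(explain_output["queryPlanner"]["winningPlan"]))
    docs = explain_output.get("executionStats", {}).get("totalDocsExamined", 0)
    return "IXSCAN" in names and "FETCH" not in names and docs == 0


# The plan shown above, trimmed to the fields this check uses.
covered = {"queryPlanner": {"winningPlan": {
    "stage": "PROJECTION",
    "transformBy": {"_id": 0, "pop": 1},
    "inputStage": {"stage": "IXSCAN", "keyPattern": {"pop": 1}},
}}}
print(is_covered(covered))  # True
```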
When does the winning plan get evicted?
In short, the cached query plan is re-evaluated if:
• The collection receives 1000 or more writes
• An index is added or dropped
• A reindex operation is performed
• mongod is restarted
• You run a query with explain
Index intersection
db.zips.find( { "$or" : [ { "state" : "UT" }, { "pop" : 2 } ] } )
db.zips.find( { "city" : "EAST TROY", "zip" : "53120" } )
Building indexes
db.zips.createIndex( { "zip" : 1 }, { "background" : true } )
Capped collections
db.createCollection("my_collection",
{ capped : true, size : 5242880,
max : 5000 } )
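import time

from pymongo import CursorType, MongoClient

# Assumes a mongod on localhost:27017 and pymongo 3.x or later
# (Connection and find(tailable=True) were removed in pymongo 3).
client = MongoClient()
coll = client.my_db.my_collection

# Tailable cursors only work on capped collections.
cursor = coll.find(cursor_type=CursorType.TAILABLE)
while cursor.alive:
    try:
        doc = cursor.next()
        print(doc)
    except StopIteration:
        # No new documents yet: wait, then poll the same cursor again.
        time.sleep(1)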
Memory limits: 32 MB, 100 MB
"executionStages" : {
"stage" : "SORT",
"nReturned" : 1,
"executionTimeMillisEstimate" : 60,
…
"sortPattern" : { "City" : 1 },
"memUsage" : 120,
"memLimit" : 33554432,
db.zips.aggregate([
{ "$group" :
{ "_id" : "$state",
//
"totalPop" : { "$sum" : "$pop" },
"cityCount" : { "$sum" : 1 }
}
} ,
{ "$sort" : { "_id" : 1 } }
],
{ "allowDiskUse" : true }
)
42
mtools: mplotquery
43
But first:
Y/N I have more than 24 months
experience with SQL
Y/N I have more than 6 months
experience with MongoDB
Y/N I have dumped a MongoDB
explain plan, understood it,
made changes, and was happy
Y/N Puppies scare me
44
Two more examples: Queries 8 and 9
find() aggregate()
• optimizer hints
• $lookup
45
Query 8: automatic query rewrite
> db.zips.aggregate(
... [
... { "$sort" :
... { "state" : 1 }
... },
... {
... "$match" :
... { "state" : { "$gt" : "M" } }
... }
... ],
... { "explain" : true } )
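The rewrite is safe because filtering and sorting commute: dropping documents before a sort yields the same ordered output as dropping them after it, while giving the sort less to do and letting the $match use an index (the explain output for Query 8 shows the $gt bound on state). A toy check in Python over made-up documents; the helper names are hypothetical stand-ins for the two pipeline stages:

```python
# Hypothetical documents, loosely shaped like zips.json.
docs = [
    {"state": "WI", "pop": 10},
    {"state": "AL", "pop": 20},
    {"state": "NY", "pop": 30},
    {"state": "CO", "pop": 40},
    {"state": "UT", "pop": 50},
]


def match(ds):
    """Stand-in for { "$match" : { "state" : { "$gt" : "M" } } }."""
    return [d for d in ds if d["state"] > "M"]


def sort_by_state(ds):
    """Stand-in for { "$sort" : { "state" : 1 } }."""
    return sorted(ds, key=lambda d: d["state"])


as_written = match(sort_by_state(docs))  # $sort, then $match (the query as typed)
rewritten = sort_by_state(match(docs))   # $match, then $sort (the optimized order)
print(as_written == rewritten)           # True
print([d["state"] for d in rewritten])   # ['NY', 'UT', 'WI']
```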
46
Query 8: Explain plan
"stages" : [
   {
      "$cursor" : {
         "sort" : { "state" : 1 },
         "winningPlan" : {
            "stage" : "FETCH",
            "inputStage" : {
               "stage" : "IXSCAN",
               "indexName" : "state_1_city_1",
               "indexBounds" : {
                  "state" : [ "(\"M\", {})" ],
                  "city" : [ "[MinKey, MaxKey]" ]
               }
            }
         },
         "rejectedPlans" : [ ]
47
Query 9: Optimizer hints
> db.zips.find( { "city" : "EAST TROY" }).hint(
{ "zip" : 1} ).explain("executionStats")
"winningPlan" : {
"stage" : "FETCH",
"filter" : { "city" : { "$eq" : "EAST TROY" } },
"inputStage" : {
"stage" : "IXSCAN", "keyPattern" : { "zip" : 1 },
"indexBounds" : { "zip" : [ "[MinKey, MaxKey]" ] }
"rejectedPlans" : [ ]
"executionStats" : {
"nReturned" : 1,
"executionTimeMillis" : 28,
"totalKeysExamined" : 29470,
"totalDocsExamined" : 29470,
