Indexing with MongoDB
Aaron Staple
aaron@10gen.com

What are indexes?
  References to your documents, efficiently ordered by key
  Maintained in a tree structure, allowing fast lookup
  [Diagram: index trees labeled {x:1} and {y:1}; documents {x:0.5,y:0.5}, {x:2,y:0.5}, {x:5,y:2}, {x:-4,y:10}, {x:3,y:'f'}]

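As a rough illustration of "references ordered by key", the sketch below models an index on x as a sorted array of (key, position) pairs searched by binary search. It is plain JavaScript rather than mongo shell code, the names buildIndex and findOneByKey are invented for this example, and MongoDB itself keeps a tree rather than a flat array.

```javascript
// Toy model of an index on {x:1}: (key, position) pairs kept sorted by key.
// Assumption: numeric keys only; a real index also orders mixed types.
const docs = [
  { _id: 1, x: 0.5, y: 0.5 },
  { _id: 2, x: 2, y: 0.5 },
  { _id: 3, x: 5, y: 2 },
  { _id: 4, x: -4, y: 10 },
];

// Build the "index": one entry per document, sorted by the indexed field.
function buildIndex(collection, field) {
  return collection
    .map((doc, pos) => ({ key: doc[field], pos }))
    .sort((a, b) => a.key - b.key);
}

// Binary search over the sorted entries; counts how many entries were probed.
function findOneByKey(collection, index, key) {
  let lo = 0;
  let hi = index.length - 1;
  let steps = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    steps++;
    if (index[mid].key === key) {
      return { doc: collection[index[mid].pos], steps };
    }
    if (index[mid].key < key) lo = mid + 1;
    else hi = mid - 1;
  }
  return { doc: null, steps };
}

const xIndex = buildIndex(docs, "x");
const hit = findOneByKey(docs, xIndex, 5);
console.log(hit.doc._id, hit.steps); // 3 3
```

The point of the sketch is only that a lookup touches a handful of index entries instead of every document.
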
Fast document lookup
  db.c.findOne( {_id:2} ), using index {_id:1}
  db.c.find( {x:2} ), using index {x:1}
  db.c.find( {x:{$in:[2,3]}} ), using index {x:1}
  db.c.find( {'x.a':1} ), using index {'x.a':1}
    Matches {_id:1,x:{a:1}}
  db.c.find( {x:{a:1}} ), using index {x:1}
    Matches {_id:1,x:{a:1}}, but not {_id:2,x:{a:1,b:2}}
QUESTION: What about db.c.find( {$where:"this.x == this.y"} ), using index {x:1}?
  Indexes cannot be used for $where type queries, but if there are non-where elements in the query then indexes can be used for the non-where elements.

Fast document range scan
  db.c.find( {x:{$gt:2}} ), using index {x:1}
  db.c.find( {x:{$gt:2,$lt:5}} ), using index {x:1}
  db.c.find( {x:/^a/} ), using index {x:1}
QUESTION: What about db.c.find( {x:/a/} ), using index {x:1}?
  The letter 'a' can appear anywhere in a matching string, so lexicographic ordering on strings won't help. However, we can use the index to find the range of documents where x is a string (e.g. not a number) or x is the regular expression /a/.

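The difference between /^a/ and /a/ can be sketched with a sorted array of string keys standing in for index {x:1}. This is a plain JavaScript model with invented helper names (lowerBound, rangeScan, prefixScan, containsScan), not the server's actual regex handling: an anchored prefix turns into a key range, while an unanchored pattern has to test every string key.

```javascript
// Sorted string keys standing in for the entries of index {x:1}.
const keys = ["apple", "avocado", "banana", "cherry", "grape"].sort();

// Position of the first key >= target (binary search).
function lowerBound(sorted, target) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Half-open range scan: all keys k with from <= k < to.
function rangeScan(sorted, from, to) {
  return sorted.slice(lowerBound(sorted, from), lowerBound(sorted, to));
}

// /^a/ becomes the range ["a", "b"): bump the last character of the prefix.
function prefixScan(sorted, prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return rangeScan(sorted, prefix, upper);
}

// /a/ has no usable bounds: every key is examined.
function containsScan(sorted, needle) {
  return sorted.filter((k) => k.includes(needle));
}

console.log(prefixScan(keys, "a"));   // [ 'apple', 'avocado' ]
console.log(containsScan(keys, "a")); // [ 'apple', 'avocado', 'banana', 'grape' ]
```
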
Other operations
  db.c.count( {x:2} ) using index {x:1}
  db.c.distinct( {x:2} ) using index {x:1}
  db.c.update( {x:2}, {x:3} ) using index {x:1}
  db.c.remove( {x:2} ) using index {x:1}
QUESTION: What about db.c.update( {x:2}, {$inc:{x:3}} ), using index {x:1}?
  Older versions of MongoDB didn't support modifiers on indexed fields, but we now support this.

Fast document ordering
  db.c.find( {} ).sort( {x:1} ), using index {x:1}
  db.c.find( {} ).sort( {x:-1} ), using index {x:1}
  db.c.find( {x:{$gt:4}} ).sort( {x:-1} ), using index {x:1}
  db.c.find( {} ).sort( {'x.a':1} ), using index {'x.a':1}
QUESTION: What about db.c.find( {y:1} ).sort( {x:1} ), using index {x:1}?
  The index will be used to ensure ordering, provided there is no better index.

Missing fields
  db.c.find( {x:null} ), using index {x:1}
    Matches {_id:5}
  db.c.find( {x:{$exists:false}} ), using index {x:1}
    Matches {_id:5}, but not {_id:6,x:null}
QUESTION: What about db.c.find( {x:{$exists:true}} ), using index {x:1}?
  The index is not currently used, though we may use the index in a future version of MongoDB.

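The two predicates on this slide differ only in how they treat an explicit null. A minimal sketch in plain JavaScript, with invented predicate names and a third document added for contrast, assuming the usual MongoDB semantics that {x:null} matches both a missing x and an x stored as null:

```javascript
// {_id:5} has no x at all; {_id:6} stores x explicitly as null.
// {_id:7} is an extra document added here for contrast.
const docs = [{ _id: 5 }, { _id: 6, x: null }, { _id: 7, x: 1 }];

// Models find( {x:null} ): a missing field and an explicit null both match.
const matchesNull = (doc) => doc.x === null || doc.x === undefined;

// Models find( {x:{$exists:false}} ): only a truly absent field matches.
const fieldMissing = (doc) => !Object.prototype.hasOwnProperty.call(doc, "x");

const ids = (predicate) => docs.filter(predicate).map((doc) => doc._id);

console.log(ids(matchesNull));  // [ 5, 6 ]
console.log(ids(fieldMissing)); // [ 5 ]
```
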
Array matching
  All the following match {_id:6,x:[2,10]} and use index {x:1}
  db.c.find( {x:2} )
  db.c.find( {x:10} )
  db.c.find( {x:{$gt:5}} )
  db.c.find( {x:[2,10]} )
  db.c.find( {x:{$in:[2,5]}} )
QUESTION: What about db.c.find( {x:{$all:[2,10]}} )?
  The index will be used to look up all documents matching {x:2}.

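One way to picture why a single document matches all of these queries is an index that holds one entry per array element. The sketch below is a plain JavaScript model under that assumption; buildMultikeyIndex, idsWhere, and findAll are invented names, the two extra documents are added for contrast, and the whole-array match {x:[2,10]} is not modeled. The findAll helper follows the slide's answer for $all: use the index for the first value, then check the remaining values against each candidate document.

```javascript
const docs = [
  { _id: 6, x: [2, 10] },
  { _id: 7, x: 2 },
  { _id: 8, x: [2, 3] },
];

// One index entry per array element (or one entry for a scalar value).
function buildMultikeyIndex(collection, field) {
  const entries = [];
  for (const doc of collection) {
    const values = Array.isArray(doc[field]) ? doc[field] : [doc[field]];
    for (const key of values) entries.push({ key, id: doc._id });
  }
  return entries.sort((a, b) => a.key - b.key);
}

// Distinct document ids whose index entries satisfy the predicate.
function idsWhere(index, predicate) {
  return [...new Set(index.filter((e) => predicate(e.key)).map((e) => e.id))];
}

// $all: index lookup on the first value, then verify the rest per document.
function findAll(collection, index, field, wanted) {
  const candidates = new Set(idsWhere(index, (k) => k === wanted[0]));
  return collection
    .filter((doc) => candidates.has(doc._id))
    .filter((doc) => wanted.every((v) => [].concat(doc[field]).includes(v)))
    .map((doc) => doc._id);
}

const xIndex = buildMultikeyIndex(docs, "x");
console.log(idsWhere(xIndex, (k) => k === 10));  // [ 6 ]
console.log(idsWhere(xIndex, (k) => k > 5));     // [ 6 ]
console.log(findAll(docs, xIndex, "x", [2, 10])); // [ 6 ]
```
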
Compound Indexes
  db.c.find( {x:10,y:20} ), using index {x:1,y:1}
  db.c.find( {x:10,y:20} ), using index {x:1,y:-1}
  db.c.find( {x:{$in:[10,20]},y:20} ), using index {x:1,y:1}
  db.c.find().sort( {x:1,y:1} ), using index {x:1,y:1}
  db.c.find().sort( {x:-1,y:1} ), using index {x:1,y:-1}
  db.c.find( {x:10} ).sort( {y:1} ), using index {x:1,y:1}
QUESTION: What about db.c.find( {y:10} ).sort( {x:1} ), using index {x:1,y:1}?
  The index will be used to ensure ordering, provided no better index is available.

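The sort( {x:-1,y:1} ) case works because an index can be walked in either direction. The following plain JavaScript sketch (compareBy is an invented helper, and arrays stand in for the index) checks that reading {x:1,y:-1} order backwards gives exactly {x:-1,y:1} order, while neither direction gives {x:1,y:1} order.

```javascript
// Comparator for a compound key spec such as {x:1, y:-1}.
function compareBy(spec) {
  const fields = Object.entries(spec);
  return (a, b) => {
    for (const [field, direction] of fields) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0;
  };
}

const docs = [
  { x: 1, y: 1 },
  { x: 1, y: 2 },
  { x: 2, y: 1 },
  { x: 2, y: 2 },
];

// Entry order of an index {x:1, y:-1}, and the same index read backwards.
const forward = [...docs].sort(compareBy({ x: 1, y: -1 }));
const backward = [...forward].reverse();

// Orders requested by two different sorts.
const wantDescAsc = [...docs].sort(compareBy({ x: -1, y: 1 }));
const wantAscAsc = [...docs].sort(compareBy({ x: 1, y: 1 }));

const same = (a, b) => JSON.stringify(a) === JSON.stringify(b);
console.log(same(backward, wantDescAsc)); // true
console.log(same(forward, wantAscAsc) || same(backward, wantAscAsc)); // false
```
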
When indexes are less helpful
  db.c.find( {x:{$ne:1}} )
  db.c.find( {x:{$mod:[10,1]}} )
    Uses index {x:1} to scan numbers only
  db.c.find( {x:{$not:/a/}} )
  db.c.find( {x:{$gte:0,$lte:10},y:5} ) using index {x:1,y:1}
    Currently must scan all elements from {x:0,y:5} to {x:10,y:5}, but some improvements may be possible
  db.c.find( {$where:'this.x == 5'} )
QUESTION: What about db.c.find( {x:{$not:/^a/}} ), using index {x:1}?
  The index is not used currently, but will be used in MongoDB 1.6.

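The cost of a range on the leading field combined with an equality on the second field can be counted on a small made-up data set. This is a plain JavaScript sketch with hypothetical data (x from 0 to 10, y from 0 to 9), walking the entries of a simulated {x:1,y:1} index between the two bounds named on the slide:

```javascript
// Entries of a simulated index {x:1, y:1}, generated already in key order.
const entries = [];
for (let x = 0; x <= 10; x++) {
  for (let y = 0; y < 10; y++) entries.push({ x, y });
}

// Compound key comparison: x first, then y.
const compareKeys = (a, b) => a.x - b.x || a.y - b.y;

const low = { x: 0, y: 5 };
const high = { x: 10, y: 5 };

let scanned = 0;
let matched = 0;
for (const entry of entries) {
  if (compareKeys(entry, low) < 0) continue; // a real index would seek past these
  if (compareKeys(entry, high) > 0) break;   // past the upper bound, stop
  scanned++;
  if (entry.y === 5) matched++;
}

console.log(scanned, matched); // 101 11
```

In this toy case 101 entries are walked to return 11 matches, because every y value for the in-between x values falls inside the bounds.
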
Geospatial indexes
  db.c.find( {a:[50,50]} ) using index {a:'2d'}
  db.c.find( {a:{$near:[50,50]}} ) using index {a:'2d'}
    Results are sorted closest to farthest
  db.c.find( {a:{$within:{$box:[[40,40],[60,60]]}}} ) using index {a:'2d'}
  db.c.find( {a:{$within:{$center:[[50,50],10]}}} ) using index {a:'2d'}
  db.c.find( {a:{$near:[50,50]},b:2} ) using index {a:'2d',b:1}
QUESTION: Most queries can be performed with or without an index. Is this true of geospatial queries?
  No. A geospatial query requires an index.

Creating indexes
  {_id:1} index created automatically
    For non-capped collections
  db.c.ensureIndex( {x:1} )
    Can create an index at any time, even when you already have plenty of data in your collection
    Creating an index will block MongoDB unless you specify background index creation
  db.c.ensureIndex( {x:1}, {background:true} )
    Background index creation still impacts performance; run at non-peak times if you're concerned
QUESTION: Can an index be removed during background creation?
  Not at this time.

Unique key constraints
  db.c.ensureIndex( {x:1}, {unique:true} )
    Don't allow {_id:10,x:2} and {_id:11,x:2}
    Don't allow {_id:12} and {_id:13} (both match {x:null})
  What if duplicates exist before index is created?
    Normally index creation fails and the index is removed
    db.c.ensureIndex( {x:1}, {unique:true,dropDups:true} )
QUESTION: In dropDups mode, which duplicates will be removed?
  The first document according to the collection's "natural order" will be preserved.

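The rules on this slide can be imitated in a few lines of plain JavaScript. The function buildUniqueIndex below is a made-up model, not the server's code: a missing field is keyed as null, a duplicate key makes the build fail, and a dropDups flag keeps the first document in array order (standing in for natural order) and discards later duplicates.

```javascript
// Model of building a unique index on one field.
// Returns the documents that survive; throws on a duplicate unless dropDups is set.
function buildUniqueIndex(collection, field, { dropDups = false } = {}) {
  const seen = new Set();
  const kept = [];
  for (const doc of collection) {
    // A document without the field is indexed under the key null.
    const key = doc[field] === undefined ? null : doc[field];
    if (seen.has(key)) {
      if (!dropDups) {
        throw new Error("duplicate key: " + JSON.stringify(key));
      }
      continue; // dropDups: silently discard the later duplicate
    }
    seen.add(key);
    kept.push(doc);
  }
  return kept;
}

const docs = [{ _id: 10, x: 2 }, { _id: 11, x: 2 }, { _id: 12 }, { _id: 13 }];

let failed = false;
try {
  buildUniqueIndex(docs, "x");
} catch (err) {
  failed = true; // {_id:11} duplicates x:2
}

const survivors = buildUniqueIndex(docs, "x", { dropDups: true });
console.log(failed, survivors.map((doc) => doc._id)); // true [ 10, 12 ]
```
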
Cleaning up indexes
  db.system.indexes.find( {ns:'db.c'} )
  db.c.dropIndex( {x:1} )
  db.c.dropIndexes()
  db.c.reIndex()
    Rebuilds all indexes, removing index cruft that has built up over large numbers of updates and deletes. Index cruft will not exist in MongoDB 1.6, so this command will be deprecated.
QUESTION: Why would you want to drop an index?
  See next slide...

Limits and Tradeoffs
  Max 40 indexes per collection
  Logically equivalent indexes are not prevented (e.g. {x:1} and {x:-1})
  Indexes can improve speed of queries, but make inserts slower
  More specific indexes {a:1,b:1,c:1} can be more helpful than less specific indexes {a:1}, but sorting compound keys may not be as fast as sorting simple keys
QUESTION: Do indexes make updates slower? How about deletes?
  It depends: finding your document might be faster, but if any indexed fields are changed the indexes must be updated.

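The "it depends" answer can be made concrete with a simplified model: an update only has to maintain the indexes whose key fields actually change. The helper indexesToUpdate below is hypothetical and ignores other costs a real server may incur (for example, a document being relocated), so treat it as a sketch of the idea rather than a description of MongoDB internals.

```javascript
// Returns the index specs that contain at least one field whose value changed.
function indexesToUpdate(indexSpecs, oldDoc, newDoc) {
  return indexSpecs.filter((spec) =>
    Object.keys(spec).some((field) => oldDoc[field] !== newDoc[field])
  );
}

const indexSpecs = [{ x: 1 }, { y: 1 }, { x: 1, y: 1 }];
const before = { _id: 1, x: 1, y: 2, z: 3 };

// Changing y touches {y:1} and {x:1,y:1}, but not {x:1}.
const touchedByY = indexesToUpdate(indexSpecs, before, { ...before, y: 5 });

// Changing an unindexed field touches no index in this model.
const touchedByZ = indexesToUpdate(indexSpecs, before, { ...before, z: 9 });

console.log(touchedByY.length, touchedByZ.length); // 2 0
```
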
Query Optimizer
  In charge of picking which index to use for a query/count/update/delete/etc
  Implementation is part of the magic of mongo (you can read about it online; not covered today)
  Usually it does a good job, but if you know what you're doing you can override it
    db.c.find( {x:2,y:3} ).hint( {y:1} )
    Use index {y:1} and avoid trying out {x:1}
  As your data changes, different indexes may be chosen. Ordering requirements should be made explicit using sort().
QUESTION: How can you force a full collection scan instead of using indexes?
  db.c.find( {x:2,y:3} ).hint( {$natural:1} )

Mongod log output
  query test.c ntoreturn:1 reslen:69 nscanned:100000 { i: 99999.0 }  nreturned:1 157ms
  query test.$cmd ntoreturn:1 command: { count: "c", query: { type: 0.0, i: { $gt: 99000.0 } }, fields: {} } reslen:64 256ms
  query:{ query: {}, orderby: { i: 1.0 } } ... query test.c ntoreturn:0 exception  1378ms ... User Exception 10128:too much key data for sort() with no index.  add an index or specify a smaller limit
  query test.c ntoreturn:0 reslen:4783 nscanned:100501 { query: { type: 500.0 }, orderby: { i: 1.0 } }  nreturned:101 390ms
  You may occasionally see a slow operation caused by disk activity or mongo cleaning things up, so some slow-op messages are spurious
  Keep this in mind when an op that runs a massive number of times appears slow only very rarely

Profiling
  Record same info as with log messages, but in a database collection
  > db.system.profile.find()
  {"ts" : "Thu Jan 29 2009 15:19:32 GMT-0500 (EST)" , "info" : "query test.$cmd ntoreturn:1 reslen:66 nscanned:0  <br>query: { profile: 2 }  nreturned:1 bytes:50" , "millis" : 0}
  ...
  > db.system.profile.find( { info: /test.foo/ } )
  > db.system.profile.find( { millis : { $gt : 5 } } )
  > db.system.profile.find().sort({$natural:-1})
  Enable explicitly using levels (0:off, 1:slow ops (>100ms), 2:all ops)
  > db.setProfilingLevel(2);
  {"was" : 0 , "ok" : 1}
  > db.getProfilingLevel()
  2
  > db.setProfilingLevel( 1 , 10 ); // slow means > 10ms
  Profiling impacts performance, but not severely

Query explain
  > db.c.find( {x:1000,y:0} ).explain()
  {
    "cursor" : "BtreeCursor x_1",
    "indexBounds" : [ [ { "x" : 1000 }, { "x" : 1000 } ] ],
    "nscanned" : 10,
    "nscannedObjects" : 10,
    "n" : 10,
    "millis" : 0,
    "oldPlan" : {
      "cursor" : "BtreeCursor x_1",
      "indexBounds" : [ [ { "x" : 1000 }, { "x" : 1000 } ] ]
    },
    "allPlans" : [
      {
        "cursor" : "BtreeCursor x_1",
        "indexBounds" : [ [ { "x" : 1000 }, { "x" : 1000 } ] ]
      },
      {
        "cursor" : "BtreeCursor y_1",
        "indexBounds" : [ [ { "y" : 0 }, { "y" : 0 } ] ]
      },
      {
        "cursor" : "BasicCursor",
        "indexBounds" : [ ]
      }
    ]
  }

Example 1
  > for( i = 0; i < 100000; ++i ) { db.c.save( {i:i} ); }
  > db.c.findOne( {i:99999} )
  { "_id" : ObjectId("4bb962dddfdcf5761c1ec6a3"), "i" : 99999 }
  query test.c ntoreturn:1 reslen:69 nscanned:100000 { i: 99999.0 }  nreturned:1 157ms
  > db.c.find( {i:99999} ).limit(1).explain()
  {
    "cursor" : "BasicCursor",
    "indexBounds" : [ ],
    "nscanned" : 100000,
    "nscannedObjects" : 100000,
    "n" : 1,
    "millis" : 161,
    "allPlans" : [
      {
        "cursor" : "BasicCursor",
        "indexBounds" : [ ]
      }
    ]
  }
  > db.c.ensureIndex( {i:1} );

Example 2
  > for( i = 0; i < 100000; ++i ) { db.c.save( {type:i%2,i:i} ); }
  > db.c.count( {type:0,i:{$gt:99000}} )
  499
  query test.$cmd ntoreturn:1 command: { count: "c", query: { type: 0.0, i: { $gt: 99000.0 } }, fields: {} } reslen:64 256ms
  > db.c.find( {type:0,i:{$gt:99000}} ).limit(1).explain()
  {
    "cursor" : "BtreeCursor type_1",
    "indexBounds" : [ [ { "type" : 0 }, { "type" : 0 } ] ],
    "nscanned" : 49502,
    "nscannedObjects" : 49502,
    "n" : 1,
    "millis" : 349,
  ...
  > db.c.ensureIndex( {type:1,i:1} );

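The numbers in Example 2 can be reproduced with a small simulation. The sketch below is plain JavaScript, not mongo shell code, and the function names are invented; it assumes that entries with equal keys in the {type:1} index are visited in insertion order. It rebuilds the same 100000 documents and counts how many index entries each plan would walk: with {type:1} alone, every type-0 entry is visited until enough matches are found, while {type:1,i:1} can seek straight to the first entry past (0, 99000).

```javascript
// Same data as the example's setup loop.
const docs = [];
for (let i = 0; i < 100000; ++i) docs.push({ type: i % 2, i: i });

// Plan A, index {type:1}: walk the type=0 entries and test i on each document.
function scanWithTypeIndex(collection, limit) {
  let nscanned = 0;
  let n = 0;
  for (const doc of collection) {
    if (doc.type !== 0) continue; // outside the index bounds, never visited
    nscanned++;
    if (doc.i > 99000 && ++n === limit) break;
  }
  return { nscanned, n };
}

// Plan B, index {type:1,i:1}: seek to the first entry after (0, 99000),
// then walk forward while type is still 0.
function scanWithCompoundIndex(collection, limit) {
  const index = collection
    .map((doc) => [doc.type, doc.i])
    .sort((a, b) => a[0] - b[0] || a[1] - b[1]);
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    const [type, i] = index[mid];
    if (type === 0 && i <= 99000) lo = mid + 1;
    else hi = mid;
  }
  let nscanned = 0;
  let n = 0;
  for (let pos = lo; pos < index.length && index[pos][0] === 0; pos++) {
    nscanned++;
    if (++n === limit) break;
  }
  return { nscanned, n };
}

console.log(scanWithTypeIndex(docs, 1));            // { nscanned: 49502, n: 1 }
console.log(scanWithTypeIndex(docs, Infinity));     // { nscanned: 50000, n: 499 }
console.log(scanWithCompoundIndex(docs, Infinity)); // { nscanned: 499, n: 499 }
```

The first result matches the nscanned value of 49502 in the explain output, and the count of 499 matches the shell result, which suggests the model captures why the compound index helps here.
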
Example 3
  > for( i = 0; i < 1000000; ++i ) { db.c.save( {i:i} ); }
  > db.c.find().sort( {i:1} )
  error: {
    "$err" : "too much key data for sort() with no index.  add an index or specify a smaller limit"
  }
  > db.c.find().sort( {i:1} ).explain()
  JS Error: uncaught exception: error: {
    "$err" : "too much key data for sort() with no index.  add an index or specify a smaller limit"
  }
  > db.c.ensureIndex( {i:1} );

Example 4
  > for( i = 0; i < 1000000; ++i ) { db.c.save( {i:i,type:i%1000} ); }
  > db.c.find( {type:500} ).sort( {i:1} )
  { "_id" : ObjectId("4bba4904dfdcf5761c2f917e"), "i" : 500, "type" : 500 }
  { "_id" : ObjectId("4bba4904dfdcf5761c2f9566"), "i" : 1500, "type" : 500 }
  ...
  query test.c ntoreturn:0 reslen:4783 nscanned:100501 { query: { type: 500.0 }, orderby: { i: 1.0 } }  nreturned:101 390ms
  > db.c.find( {type:500} ).sort( {i:1} ).explain()
  {
    "cursor" : "BtreeCursor i_1",
    "indexBounds" : [ [ { "i" : { "$minElement" : 1 } }, { "i" : { "$maxElement" : 1 } } ] ],
    "nscanned" : 1000000,
    "nscannedObjects" : 1000000,
    "n" : 1000,
    "millis" : 5388,
  ...
  > db.c.ensureIndex( {type:1,i:1} );

Questions?
  Follow @mongodb
  Get involved: www.mongodb.org
  Upcoming events: www.mongodb.org/display/DOCS/Events
    MongoSF April 30
    SF office hours every Mon 4-6pm, Epicenter Cafe
  Commercial support: www.10gen.com
  jobs@10gen.com
