#MongoDBDays




Indexing and Query
Optimization
Chad Tindel
Senior Solution Architect, 10gen
Agenda
• What are indexes?
• Why do I need them?
• Working with indexes in MongoDB
• Optimize your queries
• Avoiding common mistakes
What are indexes?
What are indexes?
Imagine you're looking for a recipe in a cookbook
ordered by recipe name. Looking up a recipe by
name is quick and easy.
What are indexes?
• How would you find a recipe using chicken?
• How about a 250-350 calorie recipe using
 chicken?
[Image: cookbook]




Consult the index!
1   2   3    4    5   6   7




        Linked List
1    2    3     4    5     6   7




    Finding 7 in Linked List
4


    2                       6


1          3        5           7


        Finding 7 in Tree
Indexes in MongoDB are B-trees
Queries, inserts and deletes:
       O(log(n)) time
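The difference between walking the linked list and descending the tree can be sketched in plain JavaScript. This is only an illustration: a binary search over a sorted array stands in for the tree, and the helper names `linearSteps` and `treeSteps` are made up for this example. MongoDB's real B-tree implementation is more involved.

```javascript
// Illustrative sketch only: count comparisons for a scan vs. a sorted lookup.
// linearSteps and treeSteps are hypothetical helper names.

// Walk the values from the front, like following a linked list.
function linearSteps(values, target) {
  let steps = 0;
  for (const v of values) {
    steps += 1;
    if (v === target) break;
  }
  return steps;
}

// Halve the search range on each step, like descending a balanced tree.
function treeSteps(sortedValues, target) {
  let lo = 0;
  let hi = sortedValues.length - 1;
  let steps = 0;
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    steps += 1;
    if (sortedValues[mid] === target) break;
    if (sortedValues[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return steps;
}

const values = [1, 2, 3, 4, 5, 6, 7];
console.log(linearSteps(values, 7)); // 7 comparisons
console.log(treeSteps(values, 7));   // 3 comparisons: 4, then 6, then 7
```

With seven values the gap is small, but the scan grows linearly with the collection while the tree lookup grows with its logarithm.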
Indexes are the single
biggest tunable
performance factor in
MongoDB
Absent or suboptimal
indexes are the most
common avoidable
MongoDB performance
problem.
Why do I need indexes?
A brief story
Working with Indexes in
MongoDB
How do I create indexes?
// Create an index if one does not exist
db.recipes.createIndex({ main_ingredient: 1 })



// The client remembers the index and raises no errors
db.recipes.ensureIndex({ main_ingredient: 1 })




* 1 means ascending, -1 descending
What can be indexed?
// Multiple fields (compound key indexes)
db.recipes.ensureIndex({
   main_ingredient: 1,
   calories: -1
})

// Arrays of values (multikey indexes)
{
   name: 'Chicken Noodle Soup',
   ingredients : ['chicken', 'noodles']
}

db.recipes.ensureIndex({ ingredients: 1 })
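A multikey index holds one entry per array element, so a single recipe can appear under several keys. The sketch below imitates that in memory. It is not MongoDB's implementation; `buildMultikeyEntries` is a hypothetical helper and the second recipe is made-up sample data.

```javascript
// Illustrative sketch only, not MongoDB's implementation.
// buildMultikeyEntries is a hypothetical helper name.
function buildMultikeyEntries(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const value = doc[field];
    // An array value contributes one index entry per element.
    const keys = Array.isArray(value) ? value : [value];
    for (const key of keys) {
      entries.push({ key, id: doc._id });
    }
  }
  // Keep entries ordered by key, as an ascending index would.
  entries.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return entries;
}

const sampleRecipes = [
  { _id: 1, name: 'Chicken Noodle Soup', ingredients: ['chicken', 'noodles'] },
  { _id: 2, name: 'Chicken Salad', ingredients: ['chicken', 'lettuce'] }
];

const ingredientEntries = buildMultikeyEntries(sampleRecipes, 'ingredients');
console.log(ingredientEntries.map((e) => `${e.key} -> ${e.id}`));
```

Looking up 'chicken' then finds both recipes without touching documents that lack that ingredient.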
What can be indexed?
// Subdocuments
{
   name : 'Apple Pie',
   contributor: {
     name: 'Joe American',
     id: 'joea123'
   }
}

db.recipes.ensureIndex({ 'contributor.id': 1 })

db.recipes.ensureIndex({ 'contributor': 1 })
How do I manage indexes?
// List a collection's indexes
db.recipes.getIndexes()
db.recipes.getIndexKeys()


// Drop a specific index
db.recipes.dropIndex({ ingredients: 1 })


// Drop all indexes and recreate them
db.recipes.reIndex()


// Default (unique) index on _id
Background Index Builds
// Index creation is a blocking operation that can take a long time
// Background creation yields to other operations
db.recipes.ensureIndex(
    { ingredients: 1 },
    { background: true }
)
Options
• Uniqueness constraints (unique, dropDups)
• Sparse Indexes
• Geospatial (2d) Indexes
• TTL Collections (expireAfterSeconds)
Uniqueness Constraints
// Only one recipe can have a given value for name
db.recipes.ensureIndex( { name: 1 }, { unique: true } )


// Force index on collection with duplicate recipe names – drop the
// duplicates
db.recipes.ensureIndex(
    { name: 1 },
    { unique: true, dropDups: true }
)


* dropDups is probably never what you want
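Why dropDups is risky can be shown with a small in-memory imitation of the behaviour described in the speaker notes: the first document with a given value is kept and every later one is discarded. `dropDuplicates` is a hypothetical helper and the documents are made-up sample data.

```javascript
// Illustrative sketch only: mimics the described effect of dropDups in memory.
// dropDuplicates is a hypothetical helper, not a MongoDB API.
function dropDuplicates(docs, field) {
  const seen = new Set();
  const kept = [];
  const dropped = [];
  for (const doc of docs) {
    if (seen.has(doc[field])) {
      dropped.push(doc); // a later document with the same value is discarded
    } else {
      seen.add(doc[field]);
      kept.push(doc); // the first document with each value survives
    }
  }
  return { kept, dropped };
}

const duplicateRecipes = [
  { _id: 1, name: 'Apple Pie', calories: 400 },
  { _id: 2, name: 'Apple Pie', calories: 250 },
  { _id: 3, name: 'Chicken Noodle Soup', calories: 180 }
];

const dedupResult = dropDuplicates(duplicateRecipes, 'name');
console.log(dedupResult.kept.map((d) => d._id));    // [ 1, 3 ]
console.log(dedupResult.dropped.map((d) => d._id)); // [ 2 ]
```

The second 'Apple Pie' is a different recipe, yet it is lost simply because it shares a name with an earlier document.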
Sparse Indexes
// Only documents with field calories will be indexed
db.recipes.ensureIndex(
    { calories: -1 },
    { sparse: true }
)
// Allow multiple documents to not have calories field
db.recipes.ensureIndex(
    { name: 1 , calories: -1 },
    { unique: true, sparse: true }
)
* Without sparse, missing fields are stored as null(s) in the index
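The interaction between sparse and unique can be sketched with a toy single-field index. This is an assumption-laden illustration, not MongoDB internals: `buildIndexKeys` and `violatesUnique` are hypothetical helpers and the documents are made up.

```javascript
// Illustrative sketch only; both helpers are hypothetical.

// Collect the keys a single-field index would hold.
function buildIndexKeys(docs, field, sparse) {
  const keys = [];
  for (const doc of docs) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    if (!hasField && sparse) continue; // sparse: no entry for this document
    keys.push(hasField ? doc[field] : null); // non-sparse: missing becomes null
  }
  return keys;
}

// A unique index cannot hold the same key twice.
function violatesUnique(keys) {
  return new Set(keys).size !== keys.length;
}

const sparseSample = [
  { name: 'Apple Pie', calories: 400 },
  { name: 'Mystery Stew' },
  { name: 'Plain Toast' }
];

const denseKeys = buildIndexKeys(sparseSample, 'calories', false);
const sparseKeys = buildIndexKeys(sparseSample, 'calories', true);
console.log(denseKeys, violatesUnique(denseKeys));   // [ 400, null, null ] true
console.log(sparseKeys, violatesUnique(sparseKeys)); // [ 400 ] false
```

Two documents without calories collide on null in the non-sparse index, while the sparse index simply has no entry for them.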
Geospatial Indexes
// Add latitude, longitude coordinates
{
     name: '10gen Palo Alto',
     loc: [ 37.449157, -122.158574 ]
}
// Index the coordinates
db.locations.ensureIndex( { loc : '2d' } )


// Query for locations 'near' a particular coordinate
db.locations.find({
     loc: { $near: [ 37.4, -122.3 ] }
})
TTL Collections
// Documents must have a BSON UTC Date field
{ 'submitted_date' : ISODate('2012-10-12T05:24:07.211Z'), … }


// Documents are removed after 'expireAfterSeconds' seconds
db.recipes.ensureIndex(
    { submitted_date: 1 },
    { expireAfterSeconds: 3600 }
)
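The expiry rule can be sketched as a simple date comparison. This is a rough illustration under assumptions: `isExpired` is a hypothetical helper, and the real server removes documents from a background task rather than at the exact moment they expire.

```javascript
// Illustrative sketch only; isExpired is a hypothetical helper.
// A document is eligible for removal once its date field is at least
// expireAfterSeconds in the past. Non-date values never expire.
function isExpired(doc, field, expireAfterSeconds, now) {
  const value = doc[field];
  if (!(value instanceof Date)) return false;
  return now.getTime() - value.getTime() >= expireAfterSeconds * 1000;
}

const submitted = { submitted_date: new Date('2012-10-12T05:24:07.211Z') };

// About 36 minutes after submission: still within the 3600 second window.
console.log(isExpired(submitted, 'submitted_date', 3600, new Date('2012-10-12T06:00:00.000Z'))); // false

// About 66 minutes after submission: eligible for removal.
console.log(isExpired(submitted, 'submitted_date', 3600, new Date('2012-10-12T06:30:00.000Z'))); // true
```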
Limitations
• A collection cannot have more than 64 indexes.

• Index keys cannot be larger than 1024 bytes (1 KB).

• The name of an index, including the namespace, must be
  shorter than 128 characters.
• A query can only use one index*

• Indexes have storage requirements, and impact the
  performance of writes.
• In-memory sorts (no index) are limited to 32 MB of return data.

* With the exception of $or queries
Optimize Your Queries
Profiling Slow Ops
db.setProfilingLevel( n, slowms )   // slowms defaults to 100 (ms)


n=0 profiler off
n=1 record operations longer than slowms
n=2 record all operations


db.system.profile.find()




* The profile collection is a capped collection, and fixed in size
The Explain Plan (Pre Index)
db.recipes.find( { calories:
    { $lt : 40 } }
).explain( )
{
    "cursor" : "BasicCursor",
    "n" : 42,
    "nscannedObjects" : 12345,
    "nscanned" : 12345,
    ...
    "millis" : 356,
    ...
}
* explain() doesn't use cached plans; it re-evaluates the candidate plans and resets the cache
The Explain Plan (Post Index)
db.recipes.find( { calories:
    { $lt : 40 } }
).explain( )
{
    "cursor" : "BtreeCursor calories_-1",
    "n" : 42,
    "nscannedObjects" : 42,
    "nscanned" : 42,
    ...
    "millis" : 0,
    ...
}
* explain() doesn't use cached plans; it re-evaluates the candidate plans and resets the cache
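The speaker notes suggest that the ratio of n to nscanned should be as close to 1 as possible. A small helper makes the two explain outputs easy to compare. `scanRatio` is a hypothetical name, and returning 1 when nothing was scanned is a convention chosen for this sketch.

```javascript
// Illustrative sketch only; scanRatio is a hypothetical helper.
// Returns matched documents divided by items examined (1 is ideal).
function scanRatio(explainOutput) {
  const { n, nscanned } = explainOutput;
  if (nscanned === 0) return 1; // nothing examined, nothing wasted (assumed convention)
  return n / nscanned;
}

// The numbers from the two explain slides.
const preIndex = { cursor: 'BasicCursor', n: 42, nscannedObjects: 12345, nscanned: 12345, millis: 356 };
const postIndex = { cursor: 'BtreeCursor calories_-1', n: 42, nscannedObjects: 42, nscanned: 42, millis: 0 };

console.log(scanRatio(preIndex).toFixed(4)); // "0.0034"
console.log(scanRatio(postIndex));           // 1
```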
The Query Optimizer
• For each "type" of query, MongoDB
  periodically tries all useful indexes
• Aborts the rest as soon as one plan wins
• The winning plan is temporarily cached for
  each “type” of query
Manually Select Index to Use
// Tell the database what index to use
db.recipes.find({
  calories: { $lt: 1000 } }
).hint({ _id: 1 })


// Tell the database to NOT use an index
db.recipes.find(
  { calories: { $lt: 1000 } }
).hint({ $natural: 1 })
Use Indexes to Sort Query
Results
// Given the following index
db.collection.ensureIndex({ a:1, b:1 , c:1, d:1 })

// The following query and sort operations can use the index
db.collection.find( ).sort({ a:1 })
db.collection.find( ).sort({ a:1, b:1 })

db.collection.find({ a:4 }).sort({ a:1, b:1 })
db.collection.find({ b:5 }).sort({ a:1, b:1 })
Indexes that won’t work for
sorting query results
// Given the following index
db.collection.ensureIndex({ a:1, b:1, c:1, d:1 })


// These can not sort using the index
db.collection.find( ).sort({ b: 1 })
db.collection.find({ b: 5 }).sort({ b: 1 })
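The rule behind these two slides, as stated in the speaker notes, is that every index field before the first sorted field must be an equality match. The sketch below encodes a simplified version of that rule. `canSortWithIndex` is a hypothetical helper, and it assumes all directions are ascending and match the index.

```javascript
// Illustrative sketch only; canSortWithIndex is a hypothetical helper.
// Assumes every field is ascending in both the index and the sort.
function canSortWithIndex(indexFields, equalityFields, sortFields) {
  for (let skip = 0; skip + sortFields.length <= indexFields.length; skip += 1) {
    // Index fields skipped before the sort starts must all be equality matches.
    const skipped = indexFields.slice(0, skip);
    // The sort fields must then line up with the next index fields, in order.
    const candidate = indexFields.slice(skip, skip + sortFields.length);
    const skippedAreEqualities = skipped.every((f) => equalityFields.includes(f));
    const candidateMatchesSort = candidate.every((f, i) => f === sortFields[i]);
    if (skippedAreEqualities && candidateMatchesSort) return true;
  }
  return false;
}

const abcd = ['a', 'b', 'c', 'd'];
console.log(canSortWithIndex(abcd, [], ['a']));         // true
console.log(canSortWithIndex(abcd, ['b'], ['a', 'b'])); // true
console.log(canSortWithIndex(abcd, [], ['b']));         // false
console.log(canSortWithIndex(abcd, ['b'], ['b']));      // false
console.log(canSortWithIndex(abcd, ['a'], ['b']));      // true: a is an equality match
```

The last call shows how to rescue a sort on b: add an equality match on a so the index entries for that value of a are already ordered by b.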
Index Covered Queries
// MongoDB can return data from just the index
db.recipes.ensureIndex({ main_ingredient: 1, name: 1 })

// Return only the name field
db.recipes.find(
   { main_ingredient: 'chicken' },
   { _id: 0, name: 1 }
)

// indexOnly will be true in the explain plan
db.recipes.find(
    { main_ingredient: 'chicken' },
    { _id: 0, name: 1 }
).explain()
{
    "indexOnly": true,
}
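Whether a query can be covered comes down to a simple check: every field in the filter and every field returned must be in the index, which is why the example excludes _id. The sketch below is a simplified check, not MongoDB's planner; `isCovered` is a hypothetical helper and it ignores special cases such as array fields.

```javascript
// Illustrative sketch only; isCovered is a hypothetical helper.
function isCovered(indexFields, queryFields, projection) {
  const inIndex = (f) => indexFields.includes(f);
  // Fields explicitly included by the projection.
  const returned = Object.keys(projection).filter((f) => projection[f] === 1);
  // Without explicit inclusions the whole document is returned.
  if (returned.length === 0) return false;
  // _id comes back by default unless it is explicitly excluded.
  if (projection._id !== 0 && !returned.includes('_id')) returned.push('_id');
  return queryFields.every(inIndex) && returned.every(inIndex);
}

const recipeIndex = ['main_ingredient', 'name'];
console.log(isCovered(recipeIndex, ['main_ingredient'], { _id: 0, name: 1 })); // true
console.log(isCovered(recipeIndex, ['main_ingredient'], { name: 1 }));         // false: _id is not in the index
console.log(isCovered(recipeIndex, ['calories'], { _id: 0, name: 1 }));        // false: filter field not indexed
```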
Absent or suboptimal
indexes are the most
common avoidable
MongoDB performance
problem.
Avoiding Common
Mistakes
Trying to Use Multiple
Indexes
// MongoDB can only use one index for a query
db.collection.ensureIndex({ a: 1 })
db.collection.ensureIndex({ b: 1 })


// Only one of the above indexes is used
db.collection.find({ a: 3, b: 4 })
Compound Key Mistakes
// Compound key indexes are very effective
db.collection.ensureIndex({ a: 1, b: 1, c: 1 })


// But only if the query is a prefix of the index


// This query can't effectively use the index
db.collection.find({ c: 2 })


// …but this query can
db.collection.find({ a: 3, b: 5 })
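The prefix rule can be checked mechanically by counting how many leading index fields the query constrains. `usablePrefixLength` is a hypothetical helper written for this sketch; it only looks at field names and says nothing about how the server actually scores plans.

```javascript
// Illustrative sketch only; usablePrefixLength is a hypothetical helper.
// Counts the leading index fields that also appear in the query.
function usablePrefixLength(indexFields, queryFields) {
  let length = 0;
  for (const field of indexFields) {
    if (!queryFields.includes(field)) break; // the prefix ends at the first gap
    length += 1;
  }
  return length;
}

const abc = ['a', 'b', 'c'];
console.log(usablePrefixLength(abc, ['c']));      // 0: cannot use the index effectively
console.log(usablePrefixLength(abc, ['a', 'b'])); // 2: uses the a, b prefix
console.log(usablePrefixLength(abc, ['a', 'c'])); // 1: only a narrows the scan
```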
Low Selectivity Indexes
db.collection.distinct('status')
[ 'new', 'processed' ]


db.collection.ensureIndex({ status: 1 })


// Low selectivity indexes provide little benefit
db.collection.find({ status: 'new' })


// Better
db.collection.ensureIndex({ status: 1, created_at: -1 })
db.collection.find(
  { status: 'new' }
).sort({ created_at: -1 })
Regular Expressions
db.users.ensureIndex({ username: 1 })


// Left anchored regex queries can use the index
db.users.find({ username: /^joe smith/ })


// But not generic regexes
db.users.find({username: /smith/ })


// Or case insensitive queries
db.users.find({ username: /Joe/i })
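A left-anchored regex works with an index because a fixed prefix corresponds to a contiguous range of sorted keys. The sketch below shows that idea on a sorted array. `prefixRange` and `findByPrefix` are hypothetical helpers, the usernames are made up, and the prefix is assumed to be non-empty.

```javascript
// Illustrative sketch only; prefixRange and findByPrefix are hypothetical helpers.

// Turn a non-empty prefix into a half-open key range [lower, upper).
function prefixRange(prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { lower: prefix, upper };
}

// filter() keeps the sketch short; an index would seek straight to the
// lower bound and stop at the upper bound instead of checking every key.
function findByPrefix(sortedNames, prefix) {
  const { lower, upper } = prefixRange(prefix);
  return sortedNames.filter((name) => name >= lower && name < upper);
}

const usernames = ['adam jones', 'joe smith', 'joe smithson', 'john smith', 'mary smith'].sort();
console.log(prefixRange('joe smith'));             // { lower: 'joe smith', upper: 'joe smiti' }
console.log(findByPrefix(usernames, 'joe smith')); // [ 'joe smith', 'joe smithson' ]
```

An unanchored pattern such as /smith/ has no such range: matching names are scattered across the sorted keys, so every key has to be examined.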
Negation
// Indexes aren't helpful with negations
db.things.ensureIndex({ x: 1 })

// e.g. "not equal" queries
db.things.find({ x: { $ne: 3 } })

// …or "not in" queries
db.things.find({ x: { $nin: [2, 3, 4 ] } })

// …or the $not operator
db.people.find({ name: { $not: /^John Doe$/ } })
Choosing the right
indexes is one of the
most important things
you can do as a
MongoDB developer so
take the time to get your
indexes right!
#MongoDBDays




Thank you
Chad Tindel
Senior Solution Architect, 10gen


Editor's Notes

  • #4 When speaking: What are indexes and why do we need them? First part of this talk is conceptual. Second part is extremely detailed.
  • #10 Look at 7 documents
  • #11 Queries, inserts and deletes: O(log(n)) time
  • #12 MongoDB's indexes are B-Trees. Lookups (queries), inserts and deletes happen in O(log(n)) time. TODO: Add a page describing what a B-Tree is???
  • #13 So this is helpful, and can speed up queries by a tremendous amount
  • #14 So it’s imperative we understand them
  • #16 Tell a story about a customer problem caused by a missing index.
  • #18 Repeated calls to ensureIndex only result in one create message going to the server. The index is cached client side for some period of time (varies by driver).
  • #20 Indexes can be costly if you have too many, so...
  • #21 getIndexes returns an index document for each index in the collection. dropIndex requires the spec used to create the index initially. reIndex drops *all* indexes (including the _id index) and rebuilds them.
  • #22 Caveats: Still a resource-intensive operation. Index build is slower. The mongo shell session or app will block while the index builds. Indexes are still built in the foreground on secondaries. Kristine to provide replica set image.
  • #24 unique applies a uniqueness constraint, rejecting duplicate values. dropDups will force the server to create a unique index by only keeping the first document found in natural order with a value and dropping all other documents with that value. dropDups will likely result in data loss!!! TODO: Maybe add a red exclamation point for dropDups.
  • #25 MongoDB doesn't enforce a schema – documents are not required to have the same fields. Sparse indexes only contain entries for documents that have the indexed field. Without sparse, documents without field 'a' have a null entry in the index for that field. With sparse, a unique constraint can be applied to a field not shared by all documents. Otherwise multiple 'null' values violate the unique constraint. XXX: Is there a visual that makes sense here?
  • #26 '2d' index is a geohash on top of the b-tree. Allows you to search for documents 'near' a latitude/longitude position. Bounds queries are also possible using $within. TODO: Google maps image, or something similar. Kristine to provide.
  • #27 Index must be on a BSON date field. Documents are removed after expireAfterSeconds seconds. Reaper thread runs every 60 seconds. TODO: Hourglass image, or something similar. Kristine to provide.
  • #28 Indexes are a really powerful feature of MongoDB, however there are some limitations. Understanding these limitations is an important part of using MongoDB correctly. Queries can only use 1 index, with the exception of $or queries. If an index key exceeds 1K, the document is silently dropped from (not included in) the index.
  • #30 Changing slowms also affects what queries are logged to the mongodb log file.
  • #31 cursor – the type of cursor used. BasicCursor means no index was used. TODO: Use a real example here instead of made up numbers… n – the number of documents that match the query. nscannedObjects – the number of documents that had to be scanned. nscanned – the number of items (index entries or documents) examined. millis – how long the query took. Ratio of n to nscanned should be as close to 1 as possible.
  • #32 cursor – the type of cursor used. BasicCursor means no index was used. n – the number of documents that match the query. nscannedObjects – the number of documents that had to be scanned. nscanned – the number of items (index entries or documents) examined. millis – how long the query took. Ratio of n to nscanned should be as close to 1 as possible.
  • #33 Winning plan is reevaluated after 1000 write operations (insert, update, remove, etc.). TODO: Replace much of this with an animation? Kristine to provide.
  • #34 Tells MongoDB exactly what index to use.
  • #35 MongoDB sorts results based on the field order in the index. For queries that include a sort that uses a compound key index, ensure that all fields before the first sorted field are equality matches. TODO: Better explanation
  • #36 MongoDB sorts results based on the field order in the index. For queries that include a sort that uses a compound key index, ensure that all fields before the first sorted field are equality matches. TODO: Better explanation
  • #37 TODO: Cookbook image here? Rework to go along with the cookbook example?
  • #39 Tell a story about a customer problem caused by a suboptimal index. TODO: Change background color?
  • #42 Better to use a compound index on the low selectivity field and some other more selective field.