MongoDB® has great support for arrays and affords a lot of flexibility in your document mode. For example, you can embed arrays in documents and also embed documents in arrays, etc. However, working with arrays has a number of gotcha’s in MongoDB®. In this post, we’ll look at some of the issues MongoDB® has with indexes and arrays.
Multikey Indexes
In MongoDB®, you can index an array field to create an index entry for each element in the array. The resulting index is called a ‘multikey’ index. Multikey indexes can be created over scalar values or embedded documents. For more information on multikey indexes, refer to this documentation.
Multikey indexes, although useful, have several limitations:
- If you create a compound multikey index, then you have atmost one field that is an array.
- A compound index cannot be a shard key.
- A compound index cannot be a hashed index.
[clickToTweet tweet=”MongoDB Multikey Index Limitations – What to look out for in your queries” quote=”MongoDB Multikey Index Limitations – What to look out for in your queries”]
One of the most interesting aspects of multikey indexes is how index intersection bounds are calculated.
Index Intersect Bounds
Here is how the MongoDB documentation defines index intersect bounds:
“The bounds of an index scan define the portions of an index to search during a query. When multiple predicates over an index exist, MongoDB will attempt to combine the bounds for these predicates by either intersection or compounding in order to produce a scan with smaller bounds.”
Range Queries on Arrays
Lets get started with a simple example to see how MongoDB computes index bounds for queries on arrays. Assume we have the following three documents in a collection:
{x: 65} {x: 35} {x: [12,95]}
We issue the following query:
db.coll.find({x :{ $gt :22, $lt:55})
The query is simple enough. You would expect the answer to be {x:35} but the query returns:
{x:35} {x:[25,95]}
The reason comes from how MongoDB deals with arrays. The same element of the array does not need to match both conditions; as long as there’s one element matching each condition, its a match. So, in this case, the bounds are [22, Infinity] and [-Infinity, 55]. Since an ‘elemMatch’ operator is not used, MongoDB does not use the index intersection. MongoDB does not specify which of these ranges [22, Infinity] or [-Infinity, 55] will be used for the execution of the query.
If we want to use the index intersection, then we need to use the follow query:
db.coll.find(x :{ $elemMatch:{$gt :22,$lt:55}})
When you use this, MongoDB intersects the index bounds and uses [22, 55] as the bounds. As expected, this query does not return any results (elemMatch does not match non arrays). So, essentially, range queries on arrays are fairly useless without the $elemMatch operator.
Compound Multikey Indexes – Mixing Array and Non-Array Fields
Consider a collection with the following documents:
{item: 35, prices:[250,35]} ...... {item: 106, prices:[1500,65]}
We’ll add a compound index on this collection:
db.ensureIndex({item:1, prices:1});
Now lets run a simple query:
db. coll. find({item: {$gt:12, $lt:65}});
The query looks simple enough, as we’re using a non-array item with a fixed range. I expect the Index intersect bounds to be something like item:[[12,65] ] for the query, however, if you run an explain you will see this:
"indexBounds" : { "item" : [ [ -Infinity, 65 ] ], "prices" : [ [ { "$minElement" : 1 }, { "$maxElement" : 1 } ] ] },
The reason is that MongoDB detects that this is a multikey index, and does not process the index bounds intersection, regardless of the fact that your query is not using any array fields. The moral of the story is that when you mix array and non-array fields in an index, always keep an eye on your index intersection bounds. Odds are that it is not efficient.