MongoDB® supports regular expressions using the $regex operator. However these MongoDB regex queries have a downside, all but one type of regex makes poor use of indexes and results in performance problems. For a production server with large amounts of data, a bad regex query can bring your server to its knees.
MongoDB regex based queries are a fairly common query in most applications using MongoDB. This is similar to the ‘LIKE’ operation supported on most relational databases. The syntax of the command is as follows
{ $regex: /pattern/, $options: '<options>' } E.g. { name: { $regex: /^acme.*test/}}
For more detailed information on the regex operation and additional options refer to the MongoDB documentation
For the rest of this discussion, we will assume that the field you are matching against has an index. If you don’t index it will result in a collection scan and very poor performance. However, even if the field is indexed it can result in poor performance. The reason is that MongoDB can make good use of indexes only if your regular expression is a “prefix expression” – these are expressions starting with the “^” character.
E.g. { name: { $regex: /^acme/}}
This allows MongoDB to identify a range of the index entries that are relevant to this query and results in efficient queries. Any other query results in an index scan since MongoDB is not able to narrow the scan to a range of index entries. An index scan is particularly bad since all the indexes need to be paged into memory and this affects the working set of your server (In fact the index scan could lead to worse performance than a collection scan – it results in twice the number of page faults).
Let’s look at some examples and the resulting query plans. For our testing purposes I have setup a collection with 100k documents. Each document has a firstName field which is a 16 character string.
Example 1: { name: { $regex: /^acme/}}
Result : Efficient index usage
Query plan:
"executionStats" : { "executionSuccess" : true, "nReturned" : 0, "executionTimeMillis" : 0, "totalKeysExamined" : 1, "totalDocsExamined" : 0, }
Example 2: { name: { $regex: /^acme/i}}
Result : Inefficient index scan due to case insensitive requirement. So basically /i option negates the “prefix expression”
Query plan:
"executionStats" : { "executionSuccess" : true, "nReturned" : 0, "executionTimeMillis" : 137, "totalKeysExamined" : 100000, "totalDocsExamined" : 0 }
Example 3: { name: { $regex: /acme.*corp/}}
Result : Inefficient index scan
Query plan:
"executionSuccess" : true, "nReturned" : 0, "executionTimeMillis" : 167, "totalKeysExamined" : 100000, "totalDocsExamined" : 0
Example 4: { name: { $regex: /acme/}}
Result : Inefficient index scan
"executionStats" : { "executionSuccess" : true, "nReturned" : 0, "executionTimeMillis" : 130, "totalKeysExamined" : 100000, "totalDocsExamined" : 0, }