MongoDB® Regex, Index & Performance

2 min read
MongoDB® Regex, Index & Performance

SHARE THIS ARTICLE

MongoDB® supports regular expressions using the $regex operator. However these MongoDB regex queries have a downside,  all but one type of regex makes poor use of indexes and results in performance problems. For a production server with large amounts of data, a bad regex query can bring your server to its knees.

MongoDB regex based queries are a fairly common query in most applications using MongoDB. This is similar to the ‘LIKE’ operation supported on most relational databases. The syntax of the command is as follows

{ $regex: /pattern/, $options: '<options>' }
E.g. { name: { $regex: /^acme.*test/}}

For more detailed information on the regex operation and additional options refer to the MongoDB documentation

For the rest of this discussion, we will assume that the field you are matching against has an index. If you don’t index it will result in a collection scan and very poor performance. However, even if the field is indexed it can result in poor performance. The reason is that MongoDB can make good use of indexes only if your regular expression is a “prefix expression” – these are expressions starting with the “^” character.

E.g.  { name: { $regex: /^acme/}}

This allows MongoDB to identify a range of the index entries that are relevant to this query and results in efficient queries. Any other query results in an index scan since MongoDB is not able to narrow the scan to a range of index entries. An index scan is particularly bad since all the indexes need to be paged into memory and this affects the working set of your server (In fact the index scan could lead to worse performance than a collection scan – it results in twice the number of page faults).

Let’s look at some examples and the resulting query plans. For our testing purposes I have setup a collection with 100k documents. Each document has a firstName field which is a 16 character string.

Example 1: { name: { $regex: /^acme/}}
Result : Efficient index usage
Query plan:

"executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 0,
    "executionTimeMillis" : 0,
    "totalKeysExamined" : 1,
    "totalDocsExamined" : 0,
}

Example 2: { name: { $regex: /^acme/i}}
Result : Inefficient index scan due to case insensitive requirement. So basically /i option negates the “prefix expression”
Query plan:

"executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 0,
    "executionTimeMillis" : 137,
    "totalKeysExamined" : 100000,
    "totalDocsExamined" : 0
}

Example 3: { name: { $regex: /acme.*corp/}}
Result : Inefficient index scan
Query plan:

    "executionSuccess" : true,
    "nReturned" : 0,
    "executionTimeMillis" : 167,
    "totalKeysExamined" : 100000,
    "totalDocsExamined" : 0

Example 4: { name: { $regex: /acme/}}
Result : Inefficient index scan

"executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 0,
    "executionTimeMillis" : 130,
    "totalKeysExamined" : 100000,
    "totalDocsExamined" : 0,
}
For more information, please visit www.scalegrid.io. Connect with ScaleGrid on LinkedIn, X, Facebook, and YouTube.
Table of Contents

Stay Ahead with ScaleGrid Insights

Dive into the world of database management with our monthly newsletter. Get expert tips, in-depth articles, and the latest news, directly to your inbox.

Related Posts

Redis vs Memcached in 2024

Choosing between Redis and Memcached hinges on specific application requirements. In this comparison of Redis vs Memcached, we strip away...

multi cloud plan - scalegrid

Plan Your Multi Cloud Strategy

Thinking about going multi-cloud? A well-planned multi cloud strategy can seriously upgrade your business’s tech game, making you more agile....

hybrid cloud strategy - scalegrid

Mastering Hybrid Cloud Strategy

Mastering Hybrid Cloud Strategy Are you looking to leverage the best private and public cloud worlds to propel your business...

NEWS

Add Headline Here