MongoDB® Regex, Index & Performance

2 min read
MongoDB® Regex, Index & Performance

SHARE THIS ARTICLE

MongoDB® supports regular expressions using the $regex operator. However these MongoDB regex queries have a downside,  all but one type of regex makes poor use of indexes and results in performance problems. For a production server with large amounts of data, a bad regex query can bring your server to its knees.

MongoDB regex based queries are a fairly common query in most applications using MongoDB. This is similar to the ‘LIKE’ operation supported on most relational databases. The syntax of the command is as follows

{ $regex: /pattern/, $options: '<options>' }
E.g. { name: { $regex: /^acme.*test/}}

For more detailed information on the regex operation and additional options refer to the MongoDB documentation

For the rest of this discussion, we will assume that the field you are matching against has an index. If you don’t index it will result in a collection scan and very poor performance. However, even if the field is indexed it can result in poor performance. The reason is that MongoDB can make good use of indexes only if your regular expression is a “prefix expression” – these are expressions starting with the “^” character.

E.g.  { name: { $regex: /^acme/}}

This allows MongoDB to identify a range of the index entries that are relevant to this query and results in efficient queries. Any other query results in an index scan since MongoDB is not able to narrow the scan to a range of index entries. An index scan is particularly bad since all the indexes need to be paged into memory and this affects the working set of your server (In fact the index scan could lead to worse performance than a collection scan – it results in twice the number of page faults).

Let’s look at some examples and the resulting query plans. For our testing purposes I have setup a collection with 100k documents. Each document has a firstName field which is a 16 character string.

Example 1: { name: { $regex: /^acme/}}
Result : Efficient index usage
Query plan:

"executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 0,
    "executionTimeMillis" : 0,
    "totalKeysExamined" : 1,
    "totalDocsExamined" : 0,
}

Example 2: { name: { $regex: /^acme/i}}
Result : Inefficient index scan due to case insensitive requirement. So basically /i option negates the “prefix expression”
Query plan:

"executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 0,
    "executionTimeMillis" : 137,
    "totalKeysExamined" : 100000,
    "totalDocsExamined" : 0
}

Example 3: { name: { $regex: /acme.*corp/}}
Result : Inefficient index scan
Query plan:

    "executionSuccess" : true,
    "nReturned" : 0,
    "executionTimeMillis" : 167,
    "totalKeysExamined" : 100000,
    "totalDocsExamined" : 0

Example 4: { name: { $regex: /acme/}}
Result : Inefficient index scan

"executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 0,
    "executionTimeMillis" : 130,
    "totalKeysExamined" : 100000,
    "totalDocsExamined" : 0,
}
For more information, please visit www.scalegrid.io. Connect with ScaleGrid on LinkedIn, X, Facebook, and YouTube.
Table of Contents

Stay Ahead with ScaleGrid Insights

Dive into the world of database management with our monthly newsletter. Get expert tips, in-depth articles, and the latest news, directly to your inbox.

Related Posts

high available cluster

High Availability Clustering & Why You Need It

High availability clustering keeps your IT systems running without interruptions, even amid failures. This guide details high availability clustering, its...

blog-feature-img_whats-new-at-scalegrid

What’s New at ScaleGrid – July 2024

ScaleGrid is excited to announce our latest platform updates, showcasing our unwavering commitment to security, usability, and performance. Our recent...

database backend

What is RabbitMQ Used For

RabbitMQ is an open-source message broker facilitating the connection between different applications within a distributed setup. It is widely utilized...

NEWS

Add Headline Here