Mastering Disk Space Management with MongoDB® Storage Engines

MongoDB offers several storage engines that cater to various use cases. The default storage engine in earlier versions was MMAPv1, which used memory-mapped files and collection-level locking. It handled sequential and indexed access well, but random writes could cause performance issues.

The newer default storage engine, WiredTiger, addresses this with document-level concurrency control, compression (including prefix compression for indexes), and B-tree-based storage. WiredTiger excels with operational databases and transactional workloads thanks to its well-ordered data structures.

Choosing the appropriate storage engine can have a significant impact on application performance. WiredTiger is a good all-purpose engine while In-Memory is better for specific use cases such as real-time analytics.

MongoDB storage engines

MongoDB offers multiple storage engines: in-memory storage engine, MMAPv1, WiredTiger, and encrypted storage engine. Choosing the right storage engine is crucial for application performance as each engine has its own strengths and weaknesses.

In-memory Storage Engine

This storage engine works completely in RAM, providing the fastest possible access to data. However, it is limited by the available free memory amount, and all data is lost when the server stops.

Keeping data in memory gives faster performance and lower latencies, which makes this engine useful for applications that require quick access to single records. However, because the whole dataset has to fit in RAM, it is not suitable for larger datasets.

MMAPv1 Storage Engine

MMAPv1 was the default storage engine in earlier versions of MongoDB. It stores data on disk with collection-level locking, and uses the filesystem cache and a write-ahead log (the journal) for crash recovery. When it comes to reclaiming disk space on MMAPv1:

  • The compaction operation defragments data files and indexes. However, keep in mind that it does not release space to the operating system. The operation is still useful to defragment and create more contiguous space for reuse by MongoDB, but it’s of no use when the free disk space is very low.
  • Additional disk space of up to 2GB is required during the compaction operation.
  • A database level lock is held during the compaction operation.

WiredTiger Storage Engine

WiredTiger Storage Engine is a powerful storage engine for MongoDB that replaced the MMAPv1 storage engine as the default option for the database starting in version 3.2. B-tree based and known for its blazing fast performance, WiredTiger actively supports compression, encryption, and multiple concurrency controls to ensure streamlined and efficient operations for all users, handling transactional workloads with ease.

Benefits of WiredTiger Storage Engine

The WiredTiger storage engine can manage larger datasets and take advantage of multiple CPU cores, which makes it well suited to both read-intensive and write-heavy workloads.

It’s worth noting that WiredTiger is not the only storage engine available with MongoDB; it’s simply the new default. Nevertheless, its enterprise features like bloom filters and enhanced compression algorithms make it a popular choice for businesses that demand optimal performance from their operational databases.

When the storage engine detects a conflict between two operations, one of them incurs a write conflict, and MongoDB transparently retries that write operation. MongoDB makes use of both the filesystem cache and the WiredTiger internal cache. By default, the WiredTiger cache is sized to the larger of 50% of (RAM minus 1 GB) or 256 MB.
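The default cache sizing rule (the larger of 50% of (RAM minus 1 GB) and 256 MB) can be sketched as a small calculation. This is only an illustration: the function name and the inputs are hypothetical, and it assumes binary units (GiB and MiB).

```javascript
// Illustrative only: reproduces the default WiredTiger cache sizing rule.
// Assumption: "GB" and "MB" are treated as binary units (GiB, MiB).
const GIB = 1024 ** 3;
const MIB = 1024 ** 2;

// totalRamBytes is a hypothetical input: the RAM available to mongod.
function defaultWiredTigerCacheBytes(totalRamBytes) {
  const halfOfRamMinusOneGiB = 0.5 * (totalRamBytes - GIB);
  // The default is whichever value is larger.
  return Math.max(halfOfRamMinusOneGiB, 256 * MIB);
}

console.log(defaultWiredTigerCacheBytes(16 * GIB) / GIB); // 7.5 (GiB)
console.log(defaultWiredTigerCacheBytes(1 * GIB) / MIB);  // 256 (MiB)
```

On a small host the 256 MB floor applies, which is why the second call does not return zero.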

The WiredTiger storage engine provides compression by default which consumes less disk space than MMAPv1.

  • The compact process releases the free space to the operating system.
  • Minimal disk space is required to run the compact operation.
  • On WiredTiger, the compact operation also blocks all other operations on the database, as it needs a database-level lock.

If you’re running WiredTiger, we recommend you run the compact operation when storage usage has reached 80% of the disk size. You can do this by triggering the ‘Compact’ operation from our details page.
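The 80% guideline can be expressed as a simple check, for example in a monitoring script. This is a minimal sketch: the function name and its parameters are hypothetical, and where the usage numbers come from is left to your own tooling.

```javascript
// Illustrative only: decides whether disk usage has crossed the
// compaction threshold (80% by default, per the recommendation above).
function shouldCompact(usedBytes, diskBytes, threshold = 0.8) {
  if (diskBytes <= 0) {
    throw new RangeError("diskBytes must be positive");
  }
  return usedBytes / diskBytes >= threshold;
}

console.log(shouldCompact(85, 100)); // true
console.log(shouldCompact(50, 100)); // false
```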

WiredTiger is the recommended storage engine in recent releases of MongoDB due to its benefits in scalability and performance. It features document-level locking, prefix compression, and collection-level configuration options. It has better support for tiered storage and larger datasets with its use of a well-ordered data structure.

Encrypted Storage Engine

Lastly, the encrypted storage engine provides security for data at rest. It encrypts all data files and provides a layer of security for sensitive data. The choice of storage engine ultimately depends on the requirements of the application, such as read and write patterns, security needs, and scalability.

MongoDB Engines: MMAPV1 Vs WiredTiger

MongoDB offers different storage engines, two of which are MMAPv1 and WiredTiger. MMAPv1 was the default storage engine in earlier versions of MongoDB, while WiredTiger became the default engine starting in version 3.2, released in December 2015.

Feature Comparison: MMAPV1 vs WiredTiger

MMAPv1 provides collection-level locking, so writes to different collections can proceed independently, but only one write at a time can modify a given collection. It also lacks support for in-memory storage and has limited configuration options.

On the other hand, WiredTiger offers more advanced features such as document-level concurrency control, which allows for independent updates on different documents in a single collection.

It also includes prefix compression, which reduces storage costs. WiredTiger is the recommended engine for larger datasets and for transactional workloads that require high-performance storage.

Choosing the Right MongoDB Engine

When deciding which engine to choose, it is essential to consider specific requirements such as the size and complexity of the dataset, data modifications, desired performance, and storage costs.

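For operational databases, WiredTiger is the recommended choice due to its support for high-performance transactional workloads. For smaller datasets or testing environments on older releases, MMAPv1 may be suitable due to its simplicity and default configuration. Keep in mind that MMAPv1 was deprecated in MongoDB 4.0 and removed in 4.2, so this choice only exists on older deployments.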

In summary, MongoDB offers different storage engines, each with its own key features and benefits. When choosing the best engine, consider the size of the dataset, the workload, the desired performance, and storage costs.

Why WiredTiger Is the Default Storage Engine for MongoDB

WiredTiger became the default storage engine for MongoDB due to several advantages it holds over MMAPv1, the original storage engine. WiredTiger storage engine offers document-level concurrency control, whereas MMAPv1 has collection-level locking.

It also has a write-ahead log, making it more resistant to crashes. WiredTiger uses prefix compression, reducing storage costs for larger datasets. Overall, it performs better for random writes and operational databases with low latency and transactional workloads due to its B-tree structure and well-ordered data structure.

In addition, WiredTiger is a more general-purpose engine, and MongoDB’s pluggable storage engine architecture still lets users choose the engine that best suits their specific needs, such as tiered storage or in-memory storage. WiredTiger’s suitability as a general-purpose engine and its numerous advantages led to its selection as the default storage engine for the newer release series of MongoDB.

Managing Disk Storage in MongoDB: Importance of Storage Engines

Disk storage is a critical resource for any scalable database system. The performance of your disk-based databases will depend on how data is managed on the disk. Your MongoDB server supports various pluggable storage engines that handle storage management and initially store all documents sequentially.

As the database grows and multiple write operations run, this contiguous space gets fragmented into smaller blocks with chunks of free space in between. The typical solution is to increase the disk size. However, there are alternatives that can help you regain the free space without having to scale your disk. It is important to understand MongoDB storage statistics and how you can compact or repair the database to improve performance and handle fragmentation.

How Large is Your Database, Really?

You should always keep an eye on the amount of free disk space on your production server, and it is also prudent to know your database size when you’re paying for it on a cloud platform. MongoDB has a command called db.stats() that can provide insights into the storage statistics of a MongoDB instance. Executing that command looks something like this:

> db.stats()
{
    "db" : "test",
    "collections" : 5,
    "views" : 0,
    "objects" : 53829,
    "avgObjSize" : 43.555,
    "dataSize" : 2344556121,
    "storageSize" : 3124416336,
    "numExtents" : 0,
    "indexes" : 7,
    "indexSize" : 8096876,
    "ok" : 1
}

dataSize

The total size in bytes of the uncompressed data held in this database.

storageSize

The total amount of disk space allocated to all collections in the database.

The response of db.stats() depends on the storage engine in use. You can find a version-specific description of the above metrics in the MongoDB documentation.

Why the big difference between storageSize and dataSize? This is due to fragmentation of data files explained earlier. MongoDB tries to reuse free space in the data directory between fragmented data whenever possible and does not release it to the operating system. However, in WiredTiger, storageSize may be smaller than dataSize if data compression is enabled.

In the event a large chunk of data is deleted from a collection and the collection never uses the deleted space for new documents, this space needs to be returned to the operating system so that it can be used by your other databases or collections stored there. You’ll need to run a compact or repair operation in order to defragment the disk space and regain the usable free space stored there.
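To make the storageSize versus dataSize comparison concrete, here is a minimal sketch that works on an object shaped like the db.stats() output above. The function names are hypothetical, and the result is only a rough indicator: with WiredTiger compression storageSize can be smaller than dataSize, so the difference is clamped at zero.

```javascript
// Illustrative only: rough estimate of allocated-but-unused space
// from an object shaped like the db.stats() output.
function estimateUnusedBytes(stats) {
  // With compression, storageSize may be below dataSize; clamp at 0.
  return Math.max(stats.storageSize - stats.dataSize, 0);
}

function unusedFraction(stats) {
  if (stats.storageSize <= 0) return 0;
  return estimateUnusedBytes(stats) / stats.storageSize;
}

// Values copied from the sample db.stats() output in this article.
const sample = { dataSize: 2344556121, storageSize: 3124416336 };

console.log(estimateUnusedBytes(sample));       // 779860215
console.log(unusedFraction(sample).toFixed(2)); // "0.25"
```

In the mongo shell you could pass the result of db.stats() directly instead of the sample object.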

Compacting MongoDB

MongoDB compact operation rewrites all documents and indexes in a collection to contiguous blocks of disk space. However, this operation blocks all other operations on the database to which the collection belongs.

So, for a standalone server, it’s recommended to run it during a maintenance window, and for replica sets, you should run it in a rolling fashion for each shard. This means compacting all secondaries first, and then finally the primary so your database availability is not affected. The syntax of the command is:

db.runCommand({ compact: "<collection-name>" })
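The rolling order described above (secondaries first, primary last) can be sketched as a small helper. It assumes an array shaped like the members field returned by rs.status(), where each member has a name and a stateStr. The function names and host names are hypothetical.

```javascript
// Illustrative only: orders replica set members for a rolling compact,
// secondaries first and the primary last.
function rollingCompactOrder(members) {
  const secondaries = members.filter((m) => m.stateStr === "SECONDARY");
  const primaries = members.filter((m) => m.stateStr === "PRIMARY");
  return [...secondaries, ...primaries].map((m) => m.name);
}

// Builds the command document to pass to db.runCommand() on each member.
function compactCommand(collectionName) {
  return { compact: collectionName };
}

// Hypothetical replica set; in the shell this would be rs.status().members.
const members = [
  { name: "host-a:27017", stateStr: "PRIMARY" },
  { name: "host-b:27017", stateStr: "SECONDARY" },
  { name: "host-c:27017", stateStr: "SECONDARY" },
];

console.log(rollingCompactOrder(members));
console.log(compactCommand("orders"));
```

Members in other states (for example arbiters) are skipped in this sketch, since they hold no data to compact.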

Repair MongoDB

The MongoDB repair operation repairs errors and inconsistencies in data storage, similar to the fsck command for a file system. This command ensures data integrity after an unexpected shutdown or crash.

However, if journaling is enabled on the server, a repair is not required, as the server uses the journal to return to a clean state automatically after restart. If your database has been corrupted, the repair operation will not save the corrupt data, so it’s not recommended to use this operation for data recovery when you have other options.

For MMAPv1, repairing the database is the only way to reclaim disk space. Use it only if you believe your database has not been corrupted and the server has the free space required by the repair operation. The syntax of the command is:

db.runCommand({ repairDatabase: 1 })

  • This command compacts all collections in the database and recreates all indexes.
  • The job requires free disk space equal to the size of your current data set plus 2 gigabytes.
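The free-space requirement in the list above (current data set size plus 2 gigabytes) can be checked before starting a repair. This is a minimal sketch with hypothetical function names, and it assumes the 2 gigabytes are binary (2 × 1024³ bytes).

```javascript
// Illustrative only: checks the free-space requirement for repairDatabase.
// Assumption: "2 gigabytes" is interpreted as 2 GiB.
const TWO_GB = 2 * 1024 ** 3;

function repairSpaceNeeded(dataSetBytes) {
  return dataSetBytes + TWO_GB;
}

function canRunRepair(dataSetBytes, freeDiskBytes) {
  return freeDiskBytes >= repairSpaceNeeded(dataSetBytes);
}

const tenGiB = 10 * 1024 ** 3;
console.log(canRunRepair(tenGiB, 12 * 1024 ** 3)); // true
console.log(canRunRepair(tenGiB, 11 * 1024 ** 3)); // false
```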

At ScaleGrid, we use the repairDatabase operation to reclaim free space for MMAPv1 engine clusters.
