Understanding and Managing Disk Space on your MongoDB Server

Disk storage is a critical resource for any scalable database system, and the performance of a disk-based database depends on how its data is laid out on disk. MongoDB supports pluggable storage engines that handle storage management, and they initially store documents sequentially. As the database grows and many write operations run, this contiguous space fragments into smaller blocks with chunks of free space in between. The typical solution is to increase the disk size, but there are alternatives that let you regain the free space without scaling your disk. To use them, you need to understand MongoDB storage statistics and how to compact or repair a database to handle fragmentation.

How Large is Your Database, Really?

You should always keep an eye on the amount of free disk space on your production server, and it is also prudent to know your database size when you’re paying for it on a cloud platform. MongoDB provides the db.stats() command, which reports storage statistics for the current database.

>db.stats()
{
	"db" : "test",
	"collections" : 5,
	"views" : 0,
	"objects" : 53829,
	"avgObjSize" : 43.555,
	"dataSize" : 2344556121,
	"storageSize" :3124416336,
	"numExtents" : 0,
	"indexes" : 7,
	"indexSize" : 8096876,
	"ok" : 1
}

dataSize

The total size in bytes of the uncompressed data held in this database.

storageSize

The total amount of disk space allocated to all collections in the database.

The response of db.stats() depends on the storage engine in use. You can find version-specific descriptions of the above metrics in the MongoDB documentation.

Why the big difference between storageSize and dataSize? This is due to fragmentation of data files explained earlier. MongoDB tries to reuse free space in between fragmented data whenever possible and does not release it to the operating system. However, in WiredTiger, storageSize may be smaller than dataSize if compression is enabled.
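To make the gap concrete, the sketch below computes it from a db.stats() result. The helper name storageOverhead is hypothetical and not part of MongoDB; it only reads the dataSize and storageSize fields described above, and it should run as-is in Node.js or mongosh.

```javascript
// Hypothetical helper (not a MongoDB API): summarizes the gap between the
// space allocated on disk and the uncompressed size of the data.
function storageOverhead(stats) {
  const overheadBytes = stats.storageSize - stats.dataSize;
  return {
    overheadBytes: overheadBytes,
    // A ratio above 1 means more space is allocated than the data needs.
    // A ratio below 1 usually points to compression (for example WiredTiger).
    ratio: stats.dataSize > 0 ? stats.storageSize / stats.dataSize : null,
  };
}

// Values copied from the sample db.stats() output in this article.
const sampleStats = { dataSize: 2344556121, storageSize: 3124416336 };
const overhead = storageOverhead(sampleStats);
console.log(overhead.overheadBytes); // 779860215
```

In the shell you could pass a live result instead, for example storageOverhead(db.stats()), assuming the size fields come back as plain numbers.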

In the event a large chunk of data is deleted from a collection and the collection never uses the deleted space for new documents, this space needs to be returned to the operating system so that it can be used by your other databases or collections. You’ll need to run a compact or repair operation in order to defragment the disk space and regain the usable free space.

Compacting MongoDB

The MongoDB compact operation rewrites all documents and indexes in a collection into contiguous blocks of disk space. However, it blocks all other operations on the database to which the collection belongs. For a standalone server, run it during a maintenance window. For replica sets, run it in a rolling fashion on each member, and repeat this for every shard in a sharded cluster. This means compacting all secondaries first and the primary last, so your database availability is not affected. The syntax of the command is:

db.runCommand({ compact: "<collection-name>" })
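The rolling order described above can be sketched as a small planning function. This is only an illustration: compactionOrder is a hypothetical helper, and the member list is made-up sample data shaped like the name and stateStr fields that rs.status() reports for each replica set member.

```javascript
// Hypothetical helper: returns member names in the order to compact them,
// secondaries first and the primary last. Members in any other state
// (for example arbiters, which hold no data) are skipped.
function compactionOrder(members) {
  const secondaries = members.filter((m) => m.stateStr === "SECONDARY");
  const primaries = members.filter((m) => m.stateStr === "PRIMARY");
  return secondaries.concat(primaries).map((m) => m.name);
}

// Made-up sample data; in the shell this could come from rs.status().members.
const sampleMembers = [
  { name: "db1.example.net:27017", stateStr: "PRIMARY" },
  { name: "db2.example.net:27017", stateStr: "SECONDARY" },
  { name: "db3.example.net:27017", stateStr: "ARBITER" },
  { name: "db4.example.net:27017", stateStr: "SECONDARY" },
];
const plannedOrder = compactionOrder(sampleMembers);
console.log(plannedOrder.join(" -> "));
```

You would then connect to each host in that order and run the compact command shown above against the collection.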

1. MMAPv1

  • The compaction operation defragments data files and indexes. However, keep in mind that it does not release space to the operating system. The operation is still useful to defragment and create more contiguous space for reuse by MongoDB, but it’s of no use when free disk space is very low.
  • Up to 2 GB of additional disk space is required during the compaction operation.
  • A database level lock is held during the compaction operation.

2. WiredTiger

The WiredTiger engine compresses data by default, so it consumes less disk space than MMAPv1.

  • The compact process releases the free space to the operating system.
  • Minimal disk space is required to run the compact operation.
  • WiredTiger also blocks all operations on the database, because it needs a database-level lock.

If you’re running WiredTiger, we recommend running the compact operation when storage has reached 80% of the disk size. You can do this by triggering the ‘Compact’ operation from our details page.
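As a rough illustration of that 80% guideline, the check below compares used space against disk size. The function name shouldCompact and the byte figures are assumptions made for this example; how you obtain the used and total bytes depends on your own monitoring.

```javascript
// Hypothetical helper: true once used space reaches the given fraction of
// the disk. The default of 0.8 mirrors the 80% guideline above.
function shouldCompact(usedBytes, diskBytes, threshold = 0.8) {
  if (diskBytes <= 0) {
    throw new RangeError("diskBytes must be positive");
  }
  return usedBytes / diskBytes >= threshold;
}

// Made-up figures for a 100 GB disk.
const GB = 1024 * 1024 * 1024;
console.log(shouldCompact(85 * GB, 100 * GB)); // true
console.log(shouldCompact(50 * GB, 100 * GB)); // false
```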

Repairing MongoDB

The MongoDB repair operation fixes errors and inconsistencies in data storage, similar to the fsck command for a file system. It restores data integrity after an unexpected shutdown or crash. However, if journaling is enabled on the server, no repair is required, because the server uses the journal to return to a clean state automatically after restart. If your database has been corrupted, a repair will not save the corrupt data, so it’s not recommended to use this operation for data recovery when you have other options.

For MMAPv1, repairing the database is the only way to reclaim disk space, provided that the database has not been corrupted and the server has the free space the repair operation requires. The syntax of the command is:

db.runCommand({repairDatabase: 1})

  • This command compacts all collections in the database and recreates all indexes.
  • The job requires free disk space equal to the size of your current data set plus 2 gigabytes.
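The free-space requirement above is easy to check before starting a repair. The sketch below is an assumption-laden illustration: hasSpaceForRepair is a hypothetical helper, the sizes are made up, and it treats 2 gigabytes as 2 × 1024³ bytes.

```javascript
// Assumption: "2 gigabytes" is interpreted as 2 * 1024^3 bytes.
const TWO_GB = 2 * 1024 * 1024 * 1024;

// Hypothetical helper: applies the rule above (current data set size plus
// 2 gigabytes of free disk space) before a repair is attempted.
function hasSpaceForRepair(dataSetBytes, freeBytes) {
  return freeBytes >= dataSetBytes + TWO_GB;
}

// Made-up figures: a 10 GB data set needs at least 12 GB free.
const ONE_GB = 1024 * 1024 * 1024;
console.log(hasSpaceForRepair(10 * ONE_GB, 11 * ONE_GB)); // false
console.log(hasSpaceForRepair(10 * ONE_GB, 12 * ONE_GB)); // true
```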

At ScaleGrid, we use the repairDatabase operation to reclaim free space for MMAPv1 engine clusters.