MongoDB®, Tech Tips & Insights

Enabling Data Compression in MongoDB 3.0

May 26, 2015

MongoDB 3.0 with the WiredTiger storage engine lets you transparently compress the data stored in your database. This is a useful feature for reducing the disk space used by fast-growing data. By default, WiredTiger uses the ‘Snappy’ block compression library for all collections. You can turn compression off by default with the following options in the MongoDB server config file:

storage:
  engine: wiredTiger
  wiredTiger:
    collectionConfig:
      blockCompressor: none

The compression algorithm can also be specified per collection, at the time the collection is created. Here is an example of creating a collection with ‘zlib’ compression:

db.createCollection("test", {storageEngine: {wiredTiger: {configString: 'block_compressor=zlib'}}});
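If you create many collections, it can help to build the options document in one place. The following is a minimal sketch, not part of MongoDB itself: `buildCompressionOptions` and `SUPPORTED_COMPRESSORS` are hypothetical names, and the list of accepted values is limited to the compressors mentioned in this article.

```javascript
// Hypothetical helper: builds the options document passed to db.createCollection().
// The accepted values are the ones named in this article; 'none' disables compression.
var SUPPORTED_COMPRESSORS = ['snappy', 'zlib', 'none'];

function buildCompressionOptions(compressor) {
  if (SUPPORTED_COMPRESSORS.indexOf(compressor) === -1) {
    throw new Error('Unsupported block compressor: ' + compressor);
  }
  return {
    storageEngine: {
      wiredTiger: { configString: 'block_compressor=' + compressor }
    }
  };
}

// Usage in the mongo shell (needs a running server, so it is left commented out):
// db.createCollection('test', buildCompressionOptions('zlib'));
```

After creating the collection, the `wiredTiger.creationString` field in the output of `db.test.stats()` should include `block_compressor=zlib`, which is a quick way to confirm the setting took effect.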

The MongoDB WiredTiger storage engine provides two options for compression: snappy and zlib. There is essentially a tradeoff between the compression ratio and the CPU cost of compressing and decompressing. ‘Zlib’ achieves considerably more compression and is correspondingly slower. ‘Snappy’, in its own words, ‘aims for very high speeds and reasonable compression’.
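To get a rough feel for the size-versus-CPU tradeoff outside the database, you can compress a sample payload at zlib's fastest and strongest settings and compare the results. This sketch uses Node.js's built-in `zlib` module as a stand-in. It is not WiredTiger's block compressor, it does not cover Snappy (Node has no built-in binding for it), and the sample payload and the `measure` helper are made up for illustration.

```javascript
var zlib = require('zlib');

// Hypothetical sample payload: repetitive text, which tends to compress well.
var sample = Buffer.from('name=abcde;value=lorem ipsum dolor sit amet;'.repeat(20000));

// Compress the sample at the given zlib level and record size and elapsed time.
function measure(level) {
  var start = process.hrtime.bigint();
  var compressed = zlib.deflateSync(sample, { level: level });
  var elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { level: level, bytes: compressed.length, ms: elapsedMs, compressed: compressed };
}

var fast = measure(zlib.constants.Z_BEST_SPEED);         // favors speed
var small = measure(zlib.constants.Z_BEST_COMPRESSION);  // favors size

[fast, small].forEach(function (r) {
  console.log('level ' + r.level + ': ' + r.bytes + ' bytes from ' +
    sample.length + ', ' + r.ms.toFixed(2) + ' ms');
});
```

Timings on a small in-memory buffer are noisy, so treat the numbers as a directional hint only and repeat the run with a payload that resembles your real documents.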

We ran some simple, unscientific tests to measure compression performance. We used one of our data sets that stores strings, which we felt would compress well. Here is the basic structure of each document:

{
'_id': <ObjectID>,
'name': <Five character string>,
'value': <Random 1MB string>
}

We inserted about 5,000 of these documents (about 5 GB of data), and the results were fairly impressive. Zlib achieves a considerable amount of compression. Snappy also achieves a fair amount of compression with little or no load on the system:

Metric	Zlib	Snappy	Uncompressed
Data size (MB)	5000.5	5000.5	5000.5
Storage size (MB)	19.62	254.37	5019
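The table is easier to compare as ratios. The sketch below works them out from the numbers above: roughly 255x for zlib and about 20x for Snappy, while the uncompressed collection takes slightly more space on disk than the raw data. The helper names are hypothetical, and the script is written for Node.js (in the mongo shell, swap `console.log` for `print`).

```javascript
// Sizes in MB, copied from the results table above.
var results = [
  { name: 'Zlib', dataSizeMB: 5000.5, storageSizeMB: 19.62 },
  { name: 'Snappy', dataSizeMB: 5000.5, storageSizeMB: 254.37 },
  { name: 'Uncompressed', dataSizeMB: 5000.5, storageSizeMB: 5019 }
];

// How many times smaller the on-disk size is than the logical data size.
function compressionRatio(r) {
  return r.dataSizeMB / r.storageSizeMB;
}

// Percentage of disk space saved; negative means storage overhead.
function spaceSavedPercent(r) {
  return (1 - r.storageSizeMB / r.dataSizeMB) * 100;
}

results.forEach(function (r) {
  console.log(r.name + ': ratio ' + compressionRatio(r).toFixed(1) +
    'x, space saved ' + spaceSavedPercent(r).toFixed(1) + '%');
});
```

For your own collections, the equivalent inputs are the `size` and `storageSize` fields reported by `db.collection.stats()`.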

As always, you need to run tests on your own data set to understand the gains you can expect. Here are some more detailed benchmark studies on compression performance and tradeoffs:

http://www.mongodb.com/blog/post/new-compression-options-mongodb-30
http://www.acmebenchmarking.com/2015/02/mongodb-v30-compression-benchmarks.html
