Index Prefix Compression in MongoDB 3.0 WiredTiger

Image courtesy: Timothymorgan

MongoDB 3.0 with WiredTiger introduces a new feature called ‘Index Prefix Compression’, which greatly reduces the memory consumed by indexes. Less memory used by indexes leaves more memory for document storage or other indexes, which in turn means better performance.


For best performance in MongoDB, it helps to keep your indexes in memory. A page miss on an index is a double whammy: one page fault to bring the index page into memory and another page fault later to bring the data page into memory.

Technology

Index prefix compression does not use block compression (such as zlib or snappy); it is a different technique for storing indexes in memory. It reduces memory usage by storing identical key prefixes only once. “Key prefix compression” is a domain-specific way of compressing data and refers to the key storage format in WiredTiger. For more details, you can refer to the WiredTiger documentation on file formats.
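
To make the idea of storing identical prefixes only once concrete, here is a minimal sketch in JavaScript. It is an illustration only and does not reproduce WiredTiger's actual key storage format; the function names and the sample keys are hypothetical. Each key after the first is stored as the number of leading characters it shares with the previous key, plus the remaining suffix.

```javascript
// Illustrative sketch only: NOT WiredTiger's real key format.
// Each entry stores how many leading characters it shares with the
// previous key, plus the suffix that differs.

function sharedPrefixLength(a, b) {
  var limit = Math.min(a.length, b.length);
  var i = 0;
  while (i < limit && a[i] === b[i]) {
    i++;
  }
  return i;
}

// Keys must already be in sorted (index) order so neighbours share prefixes.
function prefixCompress(sortedKeys) {
  var entries = [];
  var previous = "";
  sortedKeys.forEach(function (key) {
    var shared = sharedPrefixLength(previous, key);
    entries.push({ shared: shared, suffix: key.slice(shared) });
    previous = key;
  });
  return entries;
}

// Rebuild the full keys by reusing the prefix of the previous key.
function prefixDecompress(entries) {
  var keys = [];
  var previous = "";
  entries.forEach(function (entry) {
    var key = previous.slice(0, entry.shared) + entry.suffix;
    keys.push(key);
    previous = key;
  });
  return keys;
}

// Hypothetical (lastName, firstName) keys, joined with "|" for illustration.
var keys = ["Smith|Aaron", "Smith|Abigail", "Smith|Adam", "Smithson|Zoe"];
var compressed = prefixCompress(keys);
console.log(JSON.stringify(compressed));
console.log(JSON.stringify(prefixDecompress(compressed)));
```

The more neighbouring keys have in common, the shorter the stored suffixes become, which is why indexes on repetitive string fields such as names stand to benefit most.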

Performance Tests

For our performance tests, we use a document structure as detailed below:

{
   employeeID: <long>,
   firstName: <string>,
   lastName: <string>,
   income: <long>,
   supervisor: {ID: <long>, firstName: <string>, lastName: <string>}
}

We added the following indexes on this setup:

Index 1: db.<collection>.ensureIndex({'employeeID':1});
Index 2: db.<collection>.ensureIndex({'lastName':1, 'firstName':1});
Index 3: db.<collection>.ensureIndex({'income':1});
Index 4: db.<collection>.ensureIndex({'supervisor.lastName':1, 'supervisor.firstName':1});
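
Once the indexes are built, one way to read their sizes is through the collection statistics. The helper below is a small sketch, not necessarily how the numbers in this post were gathered; the helper name `indexSizesMB` and the collection name `employees` are hypothetical. It relies on `stats()` accepting a scale factor and returning an `indexSizes` field.

```javascript
// Return per-index sizes for a collection, scaled to megabytes.
// `coll` is a mongo shell collection object such as db.employees
// (the collection name "employees" is hypothetical).
function indexSizesMB(coll) {
  var stats = coll.stats(1024 * 1024); // scale factor: bytes -> MB
  return stats.indexSizes;             // one entry per index, keyed by index name
}
```

In the mongo shell this could be called as `indexSizesMB(db.employees)` on each cluster to compare the same index across storage engines.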

Results

In our test run, we inserted identical data (about 10 million records) into two clusters: one a MongoDB 2.6.x replica set and the other MongoDB 3.0 with WiredTiger. We then added the above indexes on both clusters. The results are quite staggering: in some cases there is an order of magnitude difference in the index size!

Index name                                          | MMAP index size (MB) | WT index size (MB) | % reduction in size
{employeeID:1}                                      | 230.7                | 94                 | 59%
{lastName:1, firstName:1}                           | 1530                 | 36                 | 97%
{income:1}                                          | 230                  | 94                 | 59%
{'supervisor.lastName':1, 'supervisor.firstName':1} | 1530                 | 35                 | 97%
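
The reduction column can be recomputed from the two size columns. The sketch below copies the sizes from the table; the helper name `percentReduction` is hypothetical, and the formula (before minus after, divided by before) is the usual definition rather than something stated in the original test notes.

```javascript
// Index sizes in MB, copied from the results table above.
var results = [
  { index: "{employeeID:1}", mmapMB: 230.7, wtMB: 94 },
  { index: "{lastName:1, firstName:1}", mmapMB: 1530, wtMB: 36 },
  { index: "{income:1}", mmapMB: 230, wtMB: 94 },
  { index: "{'supervisor.lastName':1, 'supervisor.firstName':1}", mmapMB: 1530, wtMB: 35 }
];

// Percentage by which the index shrank going from MMAP to WiredTiger.
function percentReduction(beforeMB, afterMB) {
  return (beforeMB - afterMB) / beforeMB * 100;
}

results.forEach(function (r) {
  var pct = percentReduction(r.mmapMB, r.wtMB);
  var ratio = r.mmapMB / r.wtMB; // how many times smaller the WT index is
  console.log(r.index + ": " + pct.toFixed(1) + "% smaller (" + ratio.toFixed(1) + "x)");
});
```

The ratio makes the gap between the numeric and the name indexes easier to see than the percentages alone.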

 

[Figure: MongoDB index sizes on 2.6.x]

[Figure: MongoDB index sizes with WiredTiger]

All the memory saved on indexes is memory that can be used for caching data, other indexes and so on. Your mileage may vary, so be sure to test your particular index structure. The reduction in index size is a much-undersold improvement in MongoDB 3.0 and can make a tremendous difference to your performance!

 


Dharshan is the founder of ScaleGrid.io (formerly MongoDirector.com). He is an experienced MongoDB developer and administrator. He can be reached for further comment at @dharshanrg


  • Speed Router

    Dharshan, is WiredTiger compression document-level or collection-level?

    We have a scenario where JSON comes in from several systems, but we don’t want to change the field names, and some of those field names are quite long. Is there a compression technique you would suggest? Ideally, it would identify redundant field names across the JSON documents and compress them as needed. I have also heard about index prefix compression, but I don’t know whether it helps here. Any help is greatly appreciated. Thanks in advance.
