MongoDB Shards and Unbalanced Aggregation Loads

The aggregation framework is a vital cog in the MongoDB infrastructure. It helps you analyze, summarize and aggregate the data stored in MongoDB. Refer to this blog post for more details about the aggregation framework in MongoDB 2.6.

In the 2.6 release, MongoDB made a subtle but significant change in the way aggregation pipelines execute in a sharded environment. When working with sharded collections, MongoDB splits the pipeline into two stages. The first stage, the “$match” phase, runs on each shard and selects the relevant documents. If the query planner determines from the shard key that a shard is not relevant, this phase is not executed on that shard.
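To make the split concrete, here is a minimal Python sketch of the behavior described in this post. It is a simplified model, not MongoDB's actual planner: it assumes that only the leading “$match” stages are sent to the shards and that everything after them runs on a single merging shard. The function name `split_pipeline` and the sample collection fields are hypothetical.

```python
from typing import Any

Stage = dict[str, Any]


def split_pipeline(pipeline: list[Stage]) -> tuple[list[Stage], list[Stage]]:
    """Split a pipeline into a per-shard part and a merge part.

    Simplified model (an assumption for illustration): only the leading
    $match stages run on every relevant shard; all remaining stages run
    on one merging shard.
    """
    split_at = 0
    for stage in pipeline:
        if "$match" not in stage:
            break
        split_at += 1
    return pipeline[:split_at], pipeline[split_at:]


# Hypothetical pipeline over an orders collection.
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$customer_id", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
]

shard_part, merge_part = split_pipeline(pipeline)
print("runs on each shard:", shard_part)
print("runs on the merging shard:", merge_part)
```

In this model, the “$group” and “$sort” work for every aggregation lands on one shard, no matter how many shards contributed documents.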

The subsequent stages run only on the “primary” shard for the collection. This shard merges the data from the other shards and runs the rest of the pipeline. This results in considerably more load on the primary shard of the collection being aggregated. Here is an example from one of our customers running three shards and using primarily aggregation queries:

[Figure: MongoDB 2.6 unbalanced shards using the aggregation framework]

As you can see, the load on the first shard is consistently 3-4 times that of the other shards. This is an extreme example, since in this case the second and third shards were added later, so the primary shard for all the collections is the first shard. Essentially, the subsequent stages of all the aggregation jobs run only on Shard1. If you examine the logs on the primary shard, you’ll see a number of “merge” commands retrieving data from the other shards.

Prior to 2.6, the subsequent stages of the aggregation pipeline ran on your mongos servers and not on the primary shard.

So how do you handle this uneven load distribution? You have a couple of options:

  1. If you’re running aggregations on multiple collections, ensure the “primary shards” of the collections are evenly spread across your shards.
  2. If you have a high aggregation load on just one collection, you might need to use slightly larger machines for your primary shard.
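As a starting point for the first option, the sketch below counts how many primaries each shard holds. It assumes input documents shaped like the entries in `config.databases`, where MongoDB records a `primary` shard for each database. The helper name `primaries_per_shard` and the sample database and shard names are hypothetical.

```python
from collections import Counter


def primaries_per_shard(databases, shards):
    """Count how many databases have each shard as their primary.

    `databases` is assumed to be a list of documents with a "primary"
    field, as in config.databases. Shards with no primaries are
    reported with a count of 0.
    """
    counts = Counter({shard: 0 for shard in shards})
    for db in databases:
        counts[db["primary"]] += 1
    return dict(counts)


# Hypothetical cluster where two shards were added after the data was loaded.
shards = ["shard0000", "shard0001", "shard0002"]
databases = [
    {"_id": "orders", "primary": "shard0000"},
    {"_id": "events", "primary": "shard0000"},
    {"_id": "reports", "primary": "shard0000"},
]

distribution = primaries_per_shard(databases, shards)
print(distribution)
```

If one shard holds most of the primaries, as in this sample, the `movePrimary` admin command is one way to move a database's primary to another shard. Check the documentation for your MongoDB version before running it on a live cluster.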

As always, if you have any questions or comments, please email us at [email protected].