Geographically Distributed MongoDB Replica Sets for 100% Uptime

Database availability is one of the most important aspects of application architecture. Data center operators work hard to prevent downtime, but it eventually happens to everybody: even the best-run data centers go down completely every now and then, as the Amazon AWS outages of 8/26/13 and 9/13/13 showed. The important question is whether this is acceptable for your application. Most applications can tolerate some downtime every now and then. However, certain applications require close to 100% uptime, and the database architecture of these applications requires a more deliberate design approach. Latencies between data centers tend to be fairly large, so careful thought has to go into the design of your MongoDB hosting deployment.

MongoDB Uptime Goals

  1. Your database should be up and writable, even if a complete datacenter goes down.
  2. Your database failover should be automatic in case of server/datacenter failure.
  3. A single server failure should not cause the primary to switch to a different datacenter.

Data Center Design

In order to satisfy our goals, we came up with a three-data-center design using a 4+1 replica set:

  1. Datacenter 1: Primary (Priority 10), Secondary 0 (Priority 9)
  2. Datacenter 2: Secondary 1 (Priority 8), Secondary 2 (Priority 7)
  3. Datacenter 3: Arbiter

We place two full members in each of the first two data centers and an arbiter, which votes in elections but holds no data, in the third data center. We also configure the priority of each server so that we can control which member becomes primary when a server fails.

Figure: Geographically distributed MongoDB (100% uptime architecture)
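
As a rough illustration of the layout above, the following Python sketch builds a replica set configuration document with the priorities from the list. The hostnames and the replica set name `rs0` are hypothetical placeholders, not values from our deployment, and the arbiter's priority of 0 is an assumption (arbiters never become primary).

```python
# Hypothetical replica set name and hostnames; substitute your own.
REPLICA_SET_NAME = "rs0"


def member(member_id, host, priority, arbiter=False):
    """Build one entry of the configuration's `members` array."""
    doc = {"_id": member_id, "host": host, "priority": priority}
    if arbiter:
        # Arbiters vote in elections but hold no data.
        doc["arbiterOnly"] = True
    return doc


def build_config():
    """Return a configuration document for the 4+1 layout."""
    return {
        "_id": REPLICA_SET_NAME,
        "members": [
            member(0, "dc1-a.example.com:27017", 10),  # Datacenter 1: Primary
            member(1, "dc1-b.example.com:27017", 9),   # Datacenter 1: Secondary 0
            member(2, "dc2-a.example.com:27017", 8),   # Datacenter 2: Secondary 1
            member(3, "dc2-b.example.com:27017", 7),   # Datacenter 2: Secondary 2
            member(4, "dc3-arb.example.com:27017", 0, arbiter=True),  # Datacenter 3
        ],
    }


if __name__ == "__main__":
    import json
    print(json.dumps(build_config(), indent=2))
```

A document of this shape is what `rs.initiate()` in the mongo shell (or the `replSetInitiate` admin command from a driver) expects; the sketch only builds the document and does not connect to any server.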

There are a couple of downsides to this geo-distributed architecture:

  • If you have a write-heavy application, the secondaries in the other data center will tend to lag behind because of the higher latency. If some data is crucial, you might want to use a MongoDB write concern of “majority” so that a write is acknowledged only after a majority of the replica set members have committed it.
  • The MongoDB community builds do not have SSL enabled. You might want to make a build with SSL enabled or use the MongoDB DBaaS at ScaleGrid so that data flowing across regions is encrypted.
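
Both points can be expressed on the client side through connection string options. The sketch below assembles a MongoDB connection URI that requests the “majority” write concern with a timeout and asks for an encrypted connection. The hostnames, the replica set name, and the 5000 ms timeout are assumptions chosen for illustration, and `tls=true` only works against servers built and configured with TLS/SSL support.

```python
from urllib.parse import urlencode


def build_uri(hosts, replica_set, timeout_ms=5000):
    """Assemble a connection URI with majority writes and TLS requested."""
    options = {
        "replicaSet": replica_set,
        "w": "majority",           # wait for a majority of members to acknowledge
        "wtimeoutMS": timeout_ms,  # stop waiting instead of blocking indefinitely
        "tls": "true",             # encrypt traffic; requires a TLS-enabled server
    }
    return "mongodb://" + ",".join(hosts) + "/?" + urlencode(options)


if __name__ == "__main__":
    # Hypothetical hosts, one per data-bearing member.
    hosts = [
        "dc1-a.example.com:27017",
        "dc1-b.example.com:27017",
        "dc2-a.example.com:27017",
        "dc2-b.example.com:27017",
    ]
    print(build_uri(hosts, "rs0"))
```

Setting a write timeout is a deliberate choice here: with members spread across data centers, a majority acknowledgment may have to cross the slower inter-data-center link, so it is worth bounding how long the application waits.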

Amazon AWS / EC2 Availability

If you’re deploying MongoDB on AWS, each data center in this picture corresponds to an Amazon region, not to an availability zone. Amazon does not provide availability guarantees for a single availability zone; SLAs are for the entire region. If you deploy across availability zones, your SLA is 99.95%, which is still a great SLA. However, if an entire region goes down, your database will go down. Also, certain AWS regions have only two availability zones, so special attention has to be given to placing the third node in a different region so that a single region outage does not bring the entire database down.
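
To put the 99.95% figure in perspective, this small sketch converts an availability percentage into the downtime it still permits. The function name and the 30-day and 365-day periods are illustrative assumptions; check the SLA itself for how the provider actually measures the period.

```python
def allowed_downtime_minutes(sla_percent, period_days):
    """Minutes of downtime permitted by an availability SLA over a period."""
    total_minutes = period_days * 24 * 60
    return (100.0 - sla_percent) / 100.0 * total_minutes


if __name__ == "__main__":
    for days in (30, 365):
        minutes = allowed_downtime_minutes(99.95, days)
        print(f"{days} days: {minutes:.1f} minutes of permitted downtime")
```

At 99.95%, that works out to roughly 21.6 minutes over 30 days, or about 4.4 hours over a year, which is why applications that need close to 100% uptime look beyond a single region.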

Lower Cost Availability Across Geographies

A simpler version of the same architecture uses only three servers, with one replica set member in each data center. The downside of this approach is that a single server failure will cause the primary to move to a different data center. However, this architecture costs less than the first one, so depending on your scenario, it might work for you.

Figure: 100% uptime MongoDB with multiple data centers
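
The trade-off between the two layouts can be checked with a deliberately simplified failover model. The sketch assumes that every member has one vote, that a primary can be elected only while a strict majority of members is reachable, and that the highest-priority surviving data-bearing member wins. Real MongoDB elections also take replication state into account, and the priorities for the three-server layout are assumptions, since only the 4+1 priorities are given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    datacenter: str
    priority: int
    arbiter: bool = False


FOUR_PLUS_ONE = [
    Member("primary", "dc1", 10),
    Member("secondary0", "dc1", 9),
    Member("secondary1", "dc2", 8),
    Member("secondary2", "dc2", 7),
    Member("arbiter", "dc3", 0, arbiter=True),
]

# Assumed priorities for the lower-cost layout.
THREE_MEMBER = [
    Member("primary", "dc1", 10),
    Member("secondary1", "dc2", 9),
    Member("secondary2", "dc3", 8),
]


def datacenter_failure(members, datacenter):
    """Names of all members lost when a whole data center goes down."""
    return {m.name for m in members if m.datacenter == datacenter}


def elect_primary(members, failed):
    """Return the expected primary after `failed` members are lost, or None."""
    alive = [m for m in members if m.name not in failed]
    if 2 * len(alive) <= len(members):
        return None  # no strict majority of voters, so no primary
    candidates = [m for m in alive if not m.arbiter]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m.priority)


if __name__ == "__main__":
    for label, layout in (("4+1", FOUR_PLUS_ONE), ("3-member", THREE_MEMBER)):
        after_server = elect_primary(layout, {"primary"})
        after_dc = elect_primary(layout, datacenter_failure(layout, "dc1"))
        print(label, "after primary server failure:", after_server.datacenter)
        print(label, "after dc1 outage:", after_dc.name)
```

Under these assumptions, the 4+1 layout keeps the primary in Datacenter 1 after a single server failure, while the three-server layout moves it to another data center; both layouts can still elect a primary when one whole data center is lost.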

There are many ways of achieving high uptime with MongoDB, and this is just the way that works for our needs. If you have other interesting architectures, please email us at [email protected]. We would love to hear your thoughts!