While talking about system performance characteristics, most DBaaS providers limit themselves to providing information about the hardware that their systems are provisioned on. It is indeed hard to talk accurately about the actual throughput/latency characteristics of a cloud-based deployment given the number of variables in such a system. Virtualized environments, unpredictable workloads, network latencies, different geographies are only some of the considerations.
However, it’s a good idea to have a fair understanding of the actual performance of your MongoDB deployment, both so that you can provision accurately based on your application’s needs and so that you can compare DBaaS providers to ensure that you are getting the most “bang for the buck”.
This blog post is a primer on running some basic performance benchmarks on your MongoDB cluster. It goes into the details of how to configure and run YCSB benchmark tests and interpret the results. The inspiration for it came from the recent MongoDB blog about performance improvements in MongoDB 3.0.
YCSB is a popular Java open-source specification and program suite developed at Yahoo! to compare the relative performance of various NoSQL databases. Its workloads are used in various comparative studies of NoSQL databases.
Setting up YCSB
This and later sections will guide you through a step-by-step process to set up, configure and run YCSB tests on your favorite DBaaS provider system.
In order to run workload tests, you will need a client machine, preferably in the same geographic location as your MongoDB cluster, to avoid over-the-Internet latencies. Select a configuration that has a decent amount of juice to run multiple threads and load your MongoDB cluster appropriately. The machine needs to have recent versions of Java, Maven and git installed.
- If Java, Maven or git is not already installed on your system, install them. Refer to the documentation available for your specific OS. Ensure that you install a Maven version compatible with your Java version. Test that all dependencies are working correctly. For example:
$ javac -version
javac 1.8.0_25
$ mvn -version
Apache Maven 3.3.1 (cab6659f9874fa96462afef40fcf6bc033d58c1c; 2015-03-14T01:40:27+05:30)
Maven home: /usr/local/Cellar/maven/3.3.1/libexec
Java version: 1.8.0_25, vendor: Oracle Corporation
Java home: /Library/Java/JavaVirtualMachines/jdk1.8.0_25.jdk/Contents/Home/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "mac os x", version: "10.10.2", arch: "x86_64", family: "mac"
$ git --version
git version 1.9.5 (Apple Git-50.3)
- As suggested on the GitHub page of YCSB, you could download the tar archive of YCSB. However, we recommend building it from source; the steps are documented in the MongoDB README of YCSB. Building from source will let us enable MongoDB authentication for your cloud provider later.
git clone git://github.com/brianfrankcooper/YCSB.git
cd YCSB
mvn clean package
- Note: If your `mvn clean package` or `mvn clean install` command fails due to errors in locating the “mapkeeper” package, delete or comment out the 2 instances of the “mapkeeper” entries in the `pom.xml` at the root level. Look at this GitHub issue for more information.
- Once the build is successful, you are now ready to run YCSB tests!
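The dependency check from the first step can also be scripted, which is handy when you prepare several client machines. The following is a minimal sketch, not part of YCSB: `check_tools` is a hypothetical helper name, and it only verifies that each command is on the PATH, not that the versions are compatible.

```shell
# Report which of the given command-line tools are on the PATH.
# Exit status is 0 when every tool is found, non-zero otherwise.
# check_tools is a hypothetical helper name, not something YCSB provides.
check_tools() (
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
)

# Example: check_tools java mvn git
```

You would still run `javac -version`, `mvn -version` and `git --version` by hand to confirm that the versions suit each other.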
Most MongoDB providers enable MongoDB authentication by default, and there is no way to disable it. Unfortunately, YCSB currently doesn’t support MongoDB authentication. The client implementation itself mostly uses API calls that are now deprecated. To meet our needs, we added a new MongoDB-specific YCSB property, 'mongodb.auth', along with a few lines of code to support it. The changes are very simple and a diff can be found here. Default MongoDB-specific YCSB properties are listed here.
Build the package again using mvn once the changes are complete. Refer to the section above on how to build YCSB using Maven.
Running the Tests
This section of the YCSB wiki lists the next and subsequent steps in detail. We will describe them here briefly, along with other pointers.
- The next step is to choose the kind of workload you want to run. Take time to read and understand the Core Workloads section of the YCSB wiki. They are summarized here:
- Workload A: Update heavy workload: 50/50% Mix of Reads/Writes
- Workload B: Read mostly workload: 95/5% Mix of Reads/Writes
- Workload C: Read-only: 100% reads
- Workload D: Read the latest workload: More traffic on recent inserts
- Workload E: Short ranges: Short range based queries
- Workload F: Read-modify-write: Read, modify and update existing records
- The individual workloads can be tweaked using Core Properties. You may want to choose a workload and tweak its properties to match the characteristics of your application. (This comparative study chose a bunch of interesting “tweaked” workloads.) Also, refer to the MongoDB blog we mentioned in the first section. (Our test will use Workload A with the default read/update ratios.)
- Choose the number of operations (property ‘operationcount’) so that the test itself will run for an appropriate amount of time. Tests that finish within 30 minutes are unlikely to be good indicators of the general performance of the system.
- Choose the appropriate number of threads that YCSB should run. This really depends on how powerful your client machine is, how much load your MongoDB cluster can take and how representative the load is of your actual application. We will run our benchmark tests against a range of thread counts.
- Run the load phase. Choose a record count (property ‘recordcount’) to insert into the database that is close to the number of operations you intend to run on it. Choose an appropriate number of threads so that the insertion doesn’t take too long. For example:
./bin/ycsb load mongodb -s -P workloads/workloada -p recordcount=10000000 -threads 16 -p mongodb.url="mongodb://user:password@server1.example.com:9999,server2.example.com:9999/dbname" -p mongodb.auth="true"
- The ‘load’ command indicates that this is a load run.
- The ‘-s’ flag prints status at 10-second intervals.
- ‘recordcount’ is set to 10 million.
- ‘-threads’ sets the number of client threads to 16.
- ‘mongodb.auth’ is the property that we wrote to enable MongoDB authentication.
- Remember to
- Redirect the stdout to a file.
- Use ‘screen’ or an equivalent method so that your session is not lost while running these operations.
- Once the data load phase is complete, you are ready to run your workloads. For example:
./bin/ycsb run mongodb -s -P workloads/workloada -p mongodb.url="mongodb://user:password@server1.example.com:9999,server2.example.com:9999/dbname" -p mongodb.auth="true" -p operationcount=10000000 -threads 2
- Repeat the runs with various numbers of threads. Remember to redirect the results so that you can compare them later. For example, we repeated our tests for 2, 4, 8, 16 and 32 threads.
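The repeated runs described above can be driven by a small loop. This is a sketch under stated assumptions: `run_sweep`, the `YCSB_BIN` and `MONGO_URL` variables, the `threads-N.txt` file naming and the connection string (including the `password` and `server1` parts) are all hypothetical placeholders, and `mongodb.auth` only exists if you applied the patch described earlier.

```shell
# Run Workload A once per thread count, keeping each run's output in its own file.
# YCSB_BIN, MONGO_URL and the output file names are assumptions for illustration.
YCSB_BIN="${YCSB_BIN:-./bin/ycsb}"
MONGO_URL="${MONGO_URL:-mongodb://user:password@server1.example.com:9999,server2.example.com:9999/dbname}"

run_sweep() (
  outdir="$1"; shift
  mkdir -p "$outdir" || exit 1
  for threads in "$@"; do
    # stdout holds the result summary; stderr holds the -s status lines.
    "$YCSB_BIN" run mongodb -s -P workloads/workloada \
      -p mongodb.url="$MONGO_URL" \
      -p mongodb.auth="true" \
      -p operationcount=10000000 \
      -threads "$threads" \
      > "$outdir/threads-$threads.txt" 2> "$outdir/threads-$threads.log" || exit 1
  done
)

# Example: run_sweep results 2 4 8 16 32
```

The loop stops at the first failing run so that a broken connection string does not silently produce a set of empty result files.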
The final section of this YCSB wiki page talks about analyzing results. The most interesting bits of information are the overall throughput and the 95th/99th percentile latencies. Usually, increasing the number of threads increases the throughput until the gains flatten out and the latencies become unacceptable. For example, here’s a plot of throughput and latency versus number of threads for a test system we were trying to benchmark. The workload selected was Workload A, with around 3 million operations.
It can be concluded from the graph that 16 threads is probably the “sweet spot” from a load standpoint for this MongoDB server: beyond it, the throughput line stays flat even as the number of threads grows exponentially, while latencies grow to become unacceptably large.
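To compare runs, it helps to pull just the throughput and percentile latency lines out of each saved result file. The sketch below assumes the result lines look like `[OVERALL], Throughput(ops/sec), 98.9` and `[UPDATE], 95thPercentileLatency(ms), 3`, which is the shape shown on the YCSB wiki; the unit suffix may differ between YCSB versions, and `summarize_run` is a hypothetical helper name.

```shell
# Print the overall throughput and the 95th/99th percentile latency lines
# from one YCSB result file.
# Assumes lines of the form "[SECTION], Metric(unit), value".
summarize_run() {
  awk -F', *' '
    $1 == "[OVERALL]" && $2 ~ /^Throughput/ { print $2 " = " $3 }
    $2 ~ /^9[59]thPercentileLatency/        { print $1 " " $2 " = " $3 }
  ' "$1"
}

# Example: summarize_run results/threads-16.txt
```

Running it over the result file for each thread count gives the numbers needed to plot throughput and latency against the number of threads.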
A few pointers:
- For a better picture of system performance over the cloud, automate these tests and then repeat them at various points of the day. We have noticed that performance characteristics can vary significantly through the day.
- When comparing two potential DBaaS providers, ensure that you select your client machines and the DBaaS cluster in the same geography. The clusters should be of similar configuration. Also, remember to run the tests at various times of the day.
Here are a few things that we intend to investigate as we do more work in this area:
- Running workloads from multiple machines in parallel: When attempting to load a high capacity MongoDB cluster, a single client machine will not suffice. YCSB currently provides no easy way of running workloads from multiple machines in parallel. However, it can be done manually. This will also be useful when attempting to load data into a large cluster.
- Size of the Dataset: The size of the database relative to the memory of the MongoDB systems will change the absolute throughput/latency characteristics, given that for larger data sets MongoDB will have to hit the disk.
- Size of individual records: It will be interesting to see the performance characteristics when record sizes are large, especially when they are close to the maximum supported record size. This might be crucial to applications that mostly do read-modify-write operations (like Workload F).
- Alternate MongoDB drivers: Since we were mainly interested in comparing two different DBaaS providers, we didn’t attempt to use more efficient database drivers. Much better absolute numbers can likely be achieved with the latest and more efficient drivers. This will be interesting for applications trying to extract the last ounce of juice out of their system. This blog talks about performance improvement measurements through YCSB by using an async MongoDB driver.
- Alternate Benchmarking Tools: Sysbench for MongoDB is one that we find interesting. We are looking at others.
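As a starting point for the manual multi-machine approach mentioned above, the load phase can be split so that each client inserts its own slice of the key space. The sketch below relies on the `insertstart` and `insertcount` core workload properties as we understand them from the YCSB wiki; verify them against your YCSB version. `client_range` is a hypothetical helper name.

```shell
# Print the insertstart/insertcount properties for one client when RECORDS
# records are split across CLIENTS machines. INDEX is zero-based.
# The last client absorbs the remainder when the split is uneven.
# client_range is a hypothetical helper name for illustration.
client_range() (
  records="$1"; clients="$2"; index="$3"
  per_client=$((records / clients))
  start=$((index * per_client))
  count="$per_client"
  if [ "$index" -eq $((clients - 1)) ]; then
    count=$((records - start))
  fi
  echo "-p insertstart=$start -p insertcount=$count"
)

# Example: client_range 10000000 3 2
```

Each client would append its pair of properties to the load command while keeping `recordcount` at the full total, so that the subsequent run phase on every machine addresses the whole data set.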