Best Practice for Creating Indexes on your MySQL™ Tables – Rolling Index Builds

2 min read
Best Practice for Creating Indexes on your MySQL™ Tables – Rolling Index Builds

SHARE THIS ARTICLE

By having appropriate indexes on your MySQL tables, you can greatly enhance the performance of SELECT queries. But, did you know that adding indexes to your tables in itself is an expensive operation, and may take a long time to complete depending on the size of your tables? During this time, you are also likely to experience a degraded performance of queries as your system resources are busy in index-creation work as well. In this blog post, we discuss an approach to optimize the MySQL index creation process in such a way that your regular workload is not impacted.

MySQL Rolling Index Creation

We call this approach a ‘Rolling Index Creation’ – if you have a MySQL master-slave replica set, you can create the index one node at a time in a rolling fashion. You should create the index only on the slave nodes so the master’s performance is not impacted. When index creation is completed on the slaves, we demote the current master and promote one of the slaves that is up-to-date as the new master. At this time, the index building continues on the original master node (which is a slave now). There will be a short duration (tens of seconds) during which you will lose connectivity to your database due to the failover, but this can be overcome by having application-level retries.

Performance Benefits of Rolling Index Creation

We did a small experiment to understand the performance benefits of Rolling Index Creation.

The test utilized a MySQL dataset created using Sysbench which had 3 tables with 50 million rows each. We generated load on the MySQL master with 30 clients running a balanced workload (50% reads and 50% writes) for 10 minutes, and at the same time, built a simple secondary index on one of the tables in two scenarios:

  1. Creating the index directly on the master
  2. Creating the index on the slave

MySQL Test Bed Configuration

MySQL Instance Type EC2 instance m4.large with 8GB RAM
Deployment Type 2 Node Master-Slave Set with Semisynchronous Replication
MySQL Version 5.7.25

Performance Results

Scenario Workload Throughput (Queries Per Second) 95th Percentile Latency
Index Creation on Master 453.63 670 ms
Rolling Index Creation 790.03 390 ms

Takeaway

By running index creation directly on the MySQL master, we could experience only 60% of the throughput that was achieved by running index creation on the MySQL slave through a rolling operation. The 95th percentile latency of queries was also 1.8 times higher when the index creation happened on the master server.

Automating the Rolling Index Creation

ScaleGrid automates the Rolling Index Creation for your MySQL deployment with a simple user interface to initiate it.

alterblog3

In the UI above, you can select your Database and Table name, and ‘Add Index’ as the Alter Table Operation. Then, specify a Column Name and Index Name, and an Alter Table Command will be generated and displayed for you. Once you click to Create, the index creation will happen one node at a time in a rolling fashion.

Additionally, ScaleGrid also supports other simple Alter Table operations like adding a new column to your table in a rolling fashion. Stay tuned for my follow-on blog post with more details!

For more information, please visit www.scalegrid.io. Connect with ScaleGrid on LinkedIn, X, Facebook, and YouTube.
Table of Contents

Stay Ahead with ScaleGrid Insights

Dive into the world of database management with our monthly newsletter. Get expert tips, in-depth articles, and the latest news, directly to your inbox.

Related Posts

pitr mysql

Master MySQL Point in Time Recovery

Data loss or corruption can be daunting. With MySQL point-in-time recovery, you can restore your database to the moment before...

Setting Up MongoDB SSL Encryption

In a world where data security is essential, enabling MongoDB SSL is critical in fortifying your database. This guide walks...

distributed storage system

What is a Distributed Storage System

A distributed storage system is foundational in today’s data-driven landscape, ensuring data spread over multiple servers is reliable, accessible, and...

NEWS

Add Headline Here