Data Visualization Using Apache Zeppelin – Tutorial

In today’s world, data is being generated at an exponential rate, so much so that analysts predict global data creation will increase 10x by 2025. Businesses now collect data from every internal system and external source that affects them, and with it comes an ever-growing need to analyze that data for insight into how it can improve business decisions. Apache Zeppelin, an open source data analytics and visualization platform, can take us a long way toward meeting that goal.

In this article, you’ll learn how to add custom interpreters for MongoDB and MySQL and how to use them to query and visualize your data. First, let’s start with an overview of Apache Zeppelin and its feature set.

What is Apache Zeppelin?


Apache Zeppelin is an open-source, web-based “notebook” that enables interactive data analytics and collaborative documents. The notebook is integrated with distributed, general-purpose data processing systems such as Apache Spark (large-scale data processing), Apache Flink (stream processing framework), and many others. Apache Zeppelin allows you to make beautiful, data-driven, interactive documents with SQL, Scala, R, or Python right in your browser.

Apache Zeppelin Features

Interactive Interface

Apache Zeppelin has an interactive interface that allows you to instantly see the results of your analytics and have an immediate connection with your creation.

Browser Notebooks

Create notebooks that run in your browser (both on your machine and remotely) and experiment with different types of charts to explore your data sets.

Integrations

Integrate with many open source big data tools, such as the Apache projects Spark, Flink, Hive, Ignite, Lens, and Tajo.

Dynamic Forms

Dynamically create input forms right in your notebook.
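
As a rough illustration, Zeppelin's text-input form is written as a `${name=default}` template inside the paragraph text, and the value typed into the form replaces the template before the paragraph runs. The Python sketch below is a simplified stand-in for that substitution, not Zeppelin's own implementation; the `fill_form` helper and the `orders` query are hypothetical.

```python
import re

# Matches text-input templates of the form ${name=default}.
FORM_PATTERN = re.compile(r"\$\{(\w+)=([^}]*)\}")


def fill_form(paragraph_text, values=None):
    """Replace each ${name=default} with the submitted value, or the default."""
    values = values or {}

    def substitute(match):
        name, default = match.group(1), match.group(2)
        return str(values.get(name, default))

    return FORM_PATTERN.sub(substitute, paragraph_text)


# Hypothetical paragraph text: the form field "status" defaults to "shipped".
paragraph = "SELECT * FROM orders WHERE status = '${status=shipped}'"

print(fill_form(paragraph))                         # uses the default value
print(fill_form(paragraph, {"status": "pending"}))  # uses the form input
```

In a real note, you would type only the paragraph text; Zeppelin renders the input box and performs the substitution itself.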


Collaboration & Sharing

A diverse and vibrant developer community is constantly adding new data sources, which are distributed under the open source Apache 2.0 license.

Interpreter

Apache Zeppelin’s interpreter concept allows any language or data-processing backend to be plugged into Zeppelin. Currently, Apache Zeppelin supports many interpreters, such as Apache Spark, Python, JDBC, Markdown, and Shell.

Now, let’s get started creating custom interpreters for MongoDB and MySQL.

Add a MySQL Interpreter

In the Apache Zeppelin platform, open the drop-down menu in the top-right corner and click Interpreter.

Here you’ll find a list of all interpreters. We need to create a new one for MySQL, so click the “Create” button in the upper right-hand corner.

Enter a recognizable name for the interpreter (e.g. mysql), and choose JDBC as the interpreter group.

Keep all the default options, but enter the required connection details and make sure that a connection to your MySQL server can be established.
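
As a minimal sketch of what the required details usually look like, the Python snippet below assembles the properties a JDBC interpreter needs for MySQL. The property names (`default.driver`, `default.url`, `default.user`, `default.password`) follow the Zeppelin JDBC interpreter documentation; the host, port, database, and credentials are placeholders, and the helper function is hypothetical.

```python
def mysql_interpreter_properties(host, port, database, user, password):
    """Return the property name/value pairs to enter on the Interpreter page."""
    return {
        # Driver class for Connector/J 5.x; Connector/J 8.x uses
        # com.mysql.cj.jdbc.Driver instead.
        "default.driver": "com.mysql.jdbc.Driver",
        "default.url": f"jdbc:mysql://{host}:{port}/{database}",
        "default.user": user,
        "default.password": password,
    }


# Placeholder connection details: replace them with your own server's values.
props = mysql_interpreter_properties(
    "localhost", 3306, "wordpress", "zeppelin_user", "change-me"
)
for name, value in props.items():
    print(f"{name} = {value}")
```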

We also need to add the MySQL connector JAR as a custom artifact so Zeppelin knows where to load the driver from. Download the connector, place it in the interpreter/jdbc folder, and then provide the exact path to the artifact in the Dependencies section.
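
If you are unsure of the exact path to enter, a small script can list the connector JARs in the interpreter/jdbc folder. This is only a sketch: it assumes the folder sits directly under your Zeppelin installation directory, and the JAR file name in the demo is an example, not necessarily the version you downloaded.

```python
import tempfile
from pathlib import Path


def find_connector_jars(zeppelin_home):
    """Return the MySQL connector JARs found in <zeppelin_home>/interpreter/jdbc."""
    jdbc_dir = Path(zeppelin_home) / "interpreter" / "jdbc"
    return sorted(jdbc_dir.glob("mysql-connector-*.jar"))


# Demo against a temporary directory that mimics the Zeppelin folder layout.
with tempfile.TemporaryDirectory() as home:
    jdbc_dir = Path(home) / "interpreter" / "jdbc"
    jdbc_dir.mkdir(parents=True)
    (jdbc_dir / "mysql-connector-java-5.1.49.jar").touch()  # example file name

    for jar in find_connector_jars(home):
        print(jar)  # this full path is what goes into the artifact field
```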

And that’s it! To test our interpreter, we need to create a new note. But first, let’s set up our MongoDB interpreter as well.

Add a MongoDB Interpreter

Go back to your Interpreter page and click the “Create” button. We’re going to use this open source MongoDB interpreter, so you’ll next need to download the .zip file and rename it to .jar.

After that, go to the interpreter/ folder, create a mongodb/ folder, and paste the .jar into it.
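
The rename-and-copy step can also be scripted. The sketch below assumes the interpreter folder sits directly under your Zeppelin installation directory; the `install_interpreter_jar` helper and the download file name are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def install_interpreter_jar(zip_path, zeppelin_home, group="mongodb"):
    """Copy the downloaded .zip into interpreter/<group>/ with a .jar extension."""
    zip_path = Path(zip_path)
    target_dir = Path(zeppelin_home) / "interpreter" / group
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / zip_path.with_suffix(".jar").name
    shutil.copyfile(zip_path, target)
    return target


# Demo with a temporary directory and an empty stand-in for the download.
with tempfile.TemporaryDirectory() as tmp:
    download = Path(tmp) / "mongodb-interpreter.zip"  # hypothetical file name
    download.write_bytes(b"")
    jar = install_interpreter_jar(download, Path(tmp) / "zeppelin")
    print(jar.name)         # mongodb-interpreter.jar
    print(jar.parent.name)  # mongodb
```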

You now have a new Interpreter group called mongodb. Go to your Interpreter page, enter a friendly name like mongodb, then choose mongodb under the Interpreter group dropdown.

Now, under “Properties”, enter the details of our newly created ScaleGrid MongoDB cluster. You can find them on the Cluster Details page under the Overview / Machines section.

And we’re done! Now it is time to test out our newly created interpreters.

Create a Zeppelin Note

To run queries that will help visualize our data, we need to create notes. From the Zeppelin header pane, click “Notebook”, and then “Create a new note”.

Make sure the notebook header shows a connected status, denoted by a green dot in the top-right corner.

When creating a note, you’ll be presented with a dialog asking for more information. Choose our newly created mysql as the default interpreter and click “Create Note”.

Run Queries on the Note

Before we can run any queries, we also need to specify the type of interpreter we’ll be using. We can do that by starting the paragraph in our note with “%mysql”. This tells Zeppelin to expect MySQL queries in that paragraph.

And now, we’re ready to query our database. For the purpose of this example, I’ll use my WordPress installation that contains a typical wp_options table to query and visualize its data.
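
As a sketch of what such a paragraph could contain, the snippet below holds the paragraph text as a string and tries the SQL part against an in-memory SQLite table with the same columns as wp_options. The query, the sample rows, and the `split_paragraph` helper are illustrative assumptions; in Zeppelin you would paste only the paragraph text and run it against your real MySQL database.

```python
import sqlite3

# Text for a Zeppelin paragraph: the first line selects the interpreter.
PARAGRAPH = """%mysql
SELECT autoload, COUNT(*) AS option_count
FROM wp_options
GROUP BY autoload
ORDER BY autoload"""


def split_paragraph(text):
    """Split paragraph text into (interpreter name, query body)."""
    first_line, _, body = text.partition("\n")
    return first_line.lstrip("%").strip(), body


interpreter, query = split_paragraph(PARAGRAPH)

# In-memory stand-in for the WordPress table, with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wp_options ("
    "option_id INTEGER PRIMARY KEY, option_name TEXT, "
    "option_value TEXT, autoload TEXT)"
)
conn.executemany(
    "INSERT INTO wp_options (option_name, option_value, autoload) VALUES (?, ?, ?)",
    [
        ("siteurl", "http://example.com", "yes"),
        ("blogname", "Demo", "yes"),
        ("_transient_demo", "1", "no"),
    ],
)

rows = conn.execute(query).fetchall()
print(interpreter)  # mysql
print(rows)         # [('no', 1), ('yes', 2)]
```

A grouped result like this, with one label column and one numeric column, is a shape that maps naturally onto a bar or pie chart.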


It works! You can now click on the various charts to visualize the data in different graph formats.


Similarly, for MongoDB, make sure you have data in the MongoDB cluster. You can add some by going to the Admin tab and running Mongo queries. You can then query that data from a note that uses the mongodb interpreter, just as with MySQL.

Share Links to your Notes

Now that your data is ready for visualization and querying, you may want to show it off to your team. You can do this very easily by creating a shareable link to the note.

This shareable link will be available for anyone to view, and you can also choose to share a link to a specific graph only.
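
For reference, note and paragraph links typically follow a predictable pattern, which the sketch below reconstructs. The URL shapes are those shown in the Zeppelin documentation for note URLs and the “Link this paragraph” option; the server address and both IDs are placeholders, and the helper functions are hypothetical.

```python
def note_link(base_url, note_id):
    """Link to a whole note."""
    return f"{base_url.rstrip('/')}/#/notebook/{note_id}"


def paragraph_link(base_url, note_id, paragraph_id, as_iframe=True):
    """Link to a single paragraph, such as one graph, optionally for embedding."""
    link = f"{note_link(base_url, note_id)}/paragraph/{paragraph_id}"
    return f"{link}?asIframe" if as_iframe else link


# Placeholder values: copy the real IDs from your browser's address bar.
base = "http://localhost:8080"
print(note_link(base, "2A94M5J1Z"))
print(paragraph_link(base, "2A94M5J1Z", "20150210-015259_1403135953"))
```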

Apache Zeppelin Conclusion

Apache Zeppelin is an immensely helpful tool that allows teams to manage and analyze data with many different visualization options, tables, and shareable links for collaboration. Here are some helpful links to get you started:

Download Apache Zeppelin

MongoDB Interpreter

MySQL Connector

You can also explore other ways to visualize your data through MongoDB GUIs, including the top four: MongoDB Compass, Robomongo, Studio 3T, and MongoBooster.

As always, if you build something awesome, do tweet us about it @scalegridio
