Plotting Data in Python: matplotlib vs plotly

Data visualization provides a powerful tool to explore, understand, and communicate the valuable insights and relationships that may be hidden within data. Whether it’s an initial exploratory analysis or a presentation to non-technical colleagues, proper visualization lies at the heart of data science. When it comes down to choosing how to visualize one’s data, the best tool for the job depends on the type of data, the purpose of the visualization, and the aesthetics which you hope to achieve. In this article, I will compare and demonstrate two common visualization tools used in Python: matplotlib and plotly.

Installing Python

If you want to follow along with this tutorial, you’ll need to have Python installed with the required packages. If you don’t have a recent version of Python, I recommend doing one of the following:

Download and install the pre-built “Data Plotting” runtime environment for Windows 10 or CentOS 7, or
If you’re on a different OS, automatically build your own custom Python runtime with just the packages you’ll need for this project by creating a free ActiveState Platform account.

To get started:

Create a free ActiveState Platform account
Download:
1. ActivePython, which is a pre-built version of Python containing hundreds of packages that can help you solve your common tasks
2. The “Data Plotting” runtime, which contains a version of Python and most of the tools listed in this post so you can test them out for yourself.

NOTE: the simplest way to install the Data Plotting environment is to first install the ActiveState Platform’s command line interface (CLI), the State Tool.

If you’re on Windows, you can use PowerShell to install the State Tool:

IEX(New-Object Net.WebClient).downloadString('https://platform.www.activestate.com/dl/cli/install.ps1')

If you’re on Linux / Mac, you can use curl to install the State Tool:
```
sh <(curl -q https://platform.www.activestate.com/dl/cli/install.sh)
```

Once the State Tool is installed, just run the following command to download the build and automatically install it into a virtual environment.

state activate Pizza-Team/Data-Plotting

Matplotlib vs Plotly: Plotting Data with Matplotlib

Matplotlib is quite possibly the simplest way to plot data in Python. It is similar to plotting in MATLAB, allowing users full control over fonts, line styles, colors, and axes properties. This allows for complete customization and fine control over the aesthetics of each plot, albeit with a lot of additional lines of code. There are many third-party packages that extend the functionality of matplotlib, such as Basemap and Cartopy, which are ideal for plotting geospatial and map-like data. Seaborn and Holoviews provide higher-level interfaces, which result in a more intuitive experience. Matplotlib is also integrated into the pandas package, which provides a quick and efficient tool for exploratory analysis.

I’ll be using pandas in addition to Basemap, which doesn’t come with the standard installation of matplotlib. You can install Basemap by following the instructions here.

To demonstrate the versatility of matplotlib, let’s import a few different datasets:

import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

# Column names for the wine dataset (the raw file has no header row)
wine_names = ['Class', 'Alcohol', 'MalicAcid', 'Ash', 'Alc.Ash', 'Magnesium', 'TotalPhenols',
              'Flavanoids', 'Nonflav.Phenols', 'Proanthocyanins', 'ColorIntensity', 'Hue', 'OD280/OD315',
              'Proline']
wine_df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data', names=wine_names)
# Shift the class labels from 1-3 to 0-2 and store them as objects so they can be treated as discrete labels
wine_df['Class'] = (wine_df['Class'] - 1).astype('object')

# Column names for the El Nino dataset ('bouy' keeps its spelling because later code refers to it)
nino_names = ['bouy', 'day', 'latitude', 'longitude', 'zon.winds', 'mer.winds', 'humidity', 'air.temp', 's.s.temp']
# Missing values are marked with '.', and fields are separated by whitespace
nino_df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/el_nino-mld/elnino.gz',
                      header=None, na_values='.', sep=r'\s+', names=nino_names)
# Keep only the readings taken on day 3 and drop rows with missing values
nino_df = nino_df.loc[nino_df['day'] == 3, ['bouy', 'latitude', 'longitude', 's.s.temp']].dropna()

I pulled two different datasets from the UCI Machine Learning Repository. The first is the wine dataset, which provides 178 clean observations of wines grown in the same region in Italy. Each observation consists of 13 features that are the result of a chemical analysis. The second is the El Nino dataset, which contains spatiotemporal data from a series of buoys in the Pacific Ocean taken during the El Nino cycle of 1982-1983.

When dealing with data for the first time, an exploratory analysis is typically the first step, and plotting is an extremely useful way to gain an initial understanding of the data. In this case, we can plot the wines by their alcohol content (on the x axis) and their degree of dilution (the OD280/OD315 value, on the y axis), and color each point by its Class (0, 1, or 2) to see how well these two features separate the classes.

Using pandas, different types of plots can be generated in a single line of code:

ax = wine_df.plot(kind = 'scatter', x = 'Alcohol', y = 'OD280/OD315', c= 'Class', figsize=(12,8), colormap='jet')
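The kind argument is what switches between plot types. As a rough sketch, here is the same one-line pattern applied to a histogram and a box plot. The small demo_df frame holds made-up values standing in for wine_df so the snippet runs without a download, and the subplot layout is just my choice for showing both plots side by side.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up values standing in for two wine_df columns (illustration only)
demo_df = pd.DataFrame({
    'Alcohol': [14.2, 13.2, 13.1, 12.4, 12.3, 12.9, 13.7, 12.8],
    'Hue': [1.04, 1.05, 1.03, 1.05, 1.25, 0.57, 0.61, 0.70],
})

# Two side-by-side axes so the plots do not draw on top of each other
fig, (hist_ax, box_ax) = plt.subplots(1, 2, figsize=(12, 5))

# Each plot is still a single line; only the kind argument changes
demo_df['Alcohol'].plot(kind='hist', bins=4, ax=hist_ax, title='Alcohol histogram')
demo_df.plot(kind='box', ax=box_ax, title='Spread of each feature')
```

Calling plt.show() afterwards displays the figure.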

For further customization, a similar plot can be made using just matplotlib:

fig, ax = plt.subplots(figsize=(12,8))
ax.scatter(x = wine_df['Alcohol'], y = wine_df['OD280/OD315'], c = wine_df['Class'])
ax.set_xlabel('Alcohol', fontsize=15)
ax.set_ylabel('OD280/OD315', fontsize=15)
ax.set_title('Wine Dataset')
ax.grid(True)
fig.tight_layout()
plt.show()

Notice that each additional plot feature typically requires an additional line of code. This does not necessarily add complexity, as each line can be understood by Python novices since the language is simple and straightforward. Additional plotting features, such as circles, lines, text, or even indicator arrows can be added using matplotlib with little difficulty. Examples demonstrating this can be found here.
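To give a feel for how these extra elements are added, here is a minimal sketch that draws a reference line, a circle, a text label, and an arrow on a scatter plot. The points and label strings are made up for illustration and are not taken from the wine dataset.

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Circle

fig, ax = plt.subplots(figsize=(6, 4))
# Made-up points (illustration only)
ax.scatter([1, 2, 3, 4], [2.0, 3.5, 1.5, 3.0])

# A dashed horizontal reference line
ax.axhline(y=2.5, color='gray', linestyle='--')
# A circle patch drawn around one of the points
ax.add_patch(Circle((2, 3.5), radius=0.3, fill=False, color='red'))
# Free-standing text placed in data coordinates
ax.text(1.1, 1.2, 'a text note')
# A labelled arrow pointing at the lowest point
ax.annotate('lowest value', xy=(3, 1.5), xytext=(3.3, 2.2),
            arrowprops=dict(arrowstyle='->'))

ax.set_xlim(0, 5)
ax.set_ylim(1, 4)
```

As before, each element costs one more line, and plt.show() displays the result.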

The previous plots convey meaningful information in that they tell us two features alone can separate the observations into three different clusters. However, the visualizations are relatively simple. To illustrate a slightly more complex example, we can use Basemap to plot temperature data from each buoy in the El Nino dataset.

plt.figure(figsize=(14, 14))
# Orthographic (globe) projection centered on the equatorial Pacific
m = Basemap(projection='ortho', resolution=None, lat_0=0, lon_0=-150)
m.bluemarble(scale=0.5)
# Plot each buoy, colored by its sea surface temperature
m.scatter(nino_df['longitude'].values, nino_df['latitude'].values, latlon=True,
          c=nino_df['s.s.temp'].values, s=100,
          cmap='coolwarm', alpha=0.5, marker="o")
plt.colorbar(label='Temperature (C)')
plt.clim(25, 30)
plt.show()

Clearly, matplotlib and the extensive third-party packages built upon it provide powerful tools for plotting data of various types. Despite being syntactically tedious, it is an excellent way to produce quality static visualizations worthy of publications or professional presentations.
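For publication or presentation use, the figure usually has to be written out as an image. The sketch below shows one way to do that with savefig, assuming a 300 dpi PNG is what you want. The data is made up, and the snippet writes into an in-memory buffer only so that it leaves no files behind.

```python
import io
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])  # made-up data
ax.set_xlabel('x')
ax.set_ylabel('x squared')

# Write a 300 dpi PNG into an in-memory buffer; pass a filename such as
# 'figure.png' (a hypothetical name) to write to disk instead
buffer = io.BytesIO()
fig.savefig(buffer, format='png', dpi=300, bbox_inches='tight')
png_bytes = buffer.getvalue()
```

Vector formats such as PDF or SVG can be requested the same way by changing the format argument.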

Matplotlib vs Plotly: Plotting Data with Plotly

Plotly is another great Python visualization tool that’s capable of handling geographical, scientific, statistical, and financial data. The company behind Plotly, also known as Plotly, makes an entire suite of visualization tools for multiple programming languages, all of which create interactive web-based visualizations and even web applications. Plotly has several advantages over matplotlib. One of the main advantages is that only a few lines of code are necessary to create aesthetically pleasing, interactive plots. The interactivity also offers a number of advantages over static matplotlib plots:

Saves time when initially exploring your dataset
Makes it easy to modify and export your plot
Offers a more ornate visualization, which is well-suited for conveying the important insights hidden within your dataset.

Just like matplotlib, Plotly also has a few tools that allow it to easily integrate with pandas to make plotting even more efficient.

Previous versions of plotly offered an offline mode and an online mode. By using plotly online, your data visualizations are automatically uploaded so that you can access them through the online interface, regardless of how you create them. This feature is still available through Chart Studio, but I will be using the offline version.

Plotly Express is a great option for exploring pandas dataframes. It is a high-level wrapper included in the most recent version of plotly. To create a scatter plot similar to the one we created with matplotlib, run:

import plotly.express as px
fig = px.scatter(wine_df, x="Alcohol", y='OD280/OD315', color="Class", marginal_y="box",
marginal_x="box")
fig.show()

Feel free to play around with the interactive features of the plot. It is already clear that plotly can create visualizations superior to matplotlib’s with very little effort.

It can also tackle our temperature data from the El Nino dataset without breaking a sweat. Try this:

# Arguments that were only being set to their default values have been dropped
fig = px.scatter_geo(nino_df, lat='latitude', lon='longitude',
                     color='s.s.temp', hover_name='bouy',
                     color_continuous_scale='bluered', projection='orthographic')
fig.show()

Notice that when you move your cursor over each buoy, the buoy number, temperature value, and location information are displayed. This is all done by default with plotly, and this is barely scratching the surface of what it can do.

Matplotlib vs Plotly: Conclusions

To summarize, matplotlib is a quick and straightforward tool for creating visualizations within Python. The verbosity required to make anything more than a basic plot makes it more suitable for an initial exploratory analysis or a minimalist design. Matplotlib is also a great place for new Python users to start their data visualization education, because each plot element is declared explicitly in a logical manner.

Plotly, on the other hand, is a more sophisticated data visualization tool that is better suited for creating elaborate plots more efficiently. The interactivity and elegant aesthetics that come with plotly are benefits that cannot be ignored.

Frequently Asked Questions

Does Plotly use Matplotlib?

No, Plotly does not use Matplotlib. In other words, Matplotlib is not a dependency for plotly. While both packages allow you to visualize data, they do it differently.

To get both Matplotlib and Plotly, download the “Data Plotting” Python runtime.

Are Plotly and Matplotlib the only two data visualization libraries in Python?

No. Python has many data visualization libraries, including:

Matplotlib – Python’s oldest data visualization library
Seaborn – based on Matplotlib, but produces nicer charts
Bokeh – creates interactive, web-ready plots
Plotly – creates interactive plots accessible from Jupyter notebooks
Dash – creates interactive dashboards
ggplot – layers components to create a plot
Pygal – can output charts as SVGs
Geoplotlib – for creating maps and plotting geographical data
Gleam – turns analyses into interactive web apps
Missingno – visually shows missing data

For more information on Dash and Bokeh, read Creating Python Dashboards: Dash Vs Bokeh.

What is plotting in Python?

Plotting in Python is simply the process of taking data and plotting it on a graph or chart in order to visualize it. For example, with Matplotlib you can create a line plot using the plt.plot() function, and then use the command plt.show() to display it.
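A minimal sketch of that line plot, using made-up values:

```python
import matplotlib.pyplot as plt

# Made-up data (illustration only)
x_values = [0, 1, 2, 3, 4]
y_values = [0, 1, 4, 9, 16]

# plt.plot() draws the line and returns a list of the line objects it created
lines = plt.plot(x_values, y_values, marker='o')
plt.xlabel('x')
plt.ylabel('x squared')
# Call plt.show() here to open the plot window
```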

Learn more about exploratory data analysis using Python.

What is the difference between Plotly and Matplotlib?

When it comes to Plotly vs Matplotlib, there are two major differences:

Matplotlib creates simple plots by declaring each plot element explicitly in a logical manner, making it easier for beginners but requiring more code.
Plotly creates more sophisticated plots with much less code, but may not be as suitable for beginners.

Download the pre-built Python Data Science runtime for Windows, Mac or Linux to get started quickly with the top data visualization and other data science packages.
