Installing Python
If you want to follow along with this tutorial, you’ll need to have Python installed with the required packages. If you don’t have a recent version of Python, I recommend doing one of the following:
- Download and install the pre-built “Data Plotting” runtime environment for Windows 10 or CentOS 7, or
- If you’re on a different OS, automatically build your own custom Python runtime with just the packages you’ll need for this project by creating a free ActiveState Platform account.
To get started:
- Create a free ActiveState Platform account
- Download:
  - ActivePython, which is a pre-built version of Python containing hundreds of packages that can help you solve your common tasks
  - The “Data Plotting” runtime, which contains a version of Python and most of the tools listed in this post so you can test them out for yourself.
NOTE: the simplest way to install the Data Plotting environment is to first install the ActiveState Platform’s command line interface (CLI), the State Tool.
- If you’re on Windows, you can use PowerShell to install the State Tool:
IEX(New-Object Net.WebClient).downloadString('https://platform.activestate.com/dl/cli/install.ps1')
- If you’re on Linux / Mac, you can use curl to install the State Tool:
sh <(curl -q https://platform.activestate.com/dl/cli/install.sh)
Once the State Tool is installed, just run the following command to download the build and automatically install it into a virtual environment.
state activate Pizza-Team/Data-Plotting
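Once the environment is active, a quick way to confirm that the packages are present is to ask Python for their installed versions. This is a minimal sketch using the standard library’s importlib.metadata (Python 3.8+); the package list is an assumption based on the imports used later in this post, and "basemap" is assumed to be the distribution name Basemap is installed under.

```python
# Minimal sketch: report which of the tutorial's packages are installed.
# REQUIRED is an assumed list based on the imports used in this post.
from importlib import metadata

REQUIRED = ["pandas", "matplotlib", "plotly", "basemap"]


def check_packages(names):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in check_packages(REQUIRED).items():
        print(f"{name:12} {version or 'NOT INSTALLED'}")
```

Anything reported as NOT INSTALLED can be added to the runtime before continuing.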
Matplotlib vs Plotly: Plotting Data with Matplotlib
Matplotlib is quite possibly the simplest way to plot data in Python. It is similar to plotting in MATLAB, allowing users full control over fonts, line styles, colors, and axes properties. This allows for complete customization and fine control over the aesthetics of each plot, albeit at the cost of many additional lines of code. Many third-party packages extend the functionality of matplotlib: Basemap and Cartopy are ideal for plotting geospatial and map-like data, while Seaborn and Holoviews provide higher-level interfaces that offer a more intuitive experience. Matplotlib is also the default plotting backend in pandas, which makes it a quick and efficient tool for exploratory analysis.
I’ll be using pandas in addition to Basemap, which doesn’t come with the standard installation of matplotlib. You can install Basemap by following the instructions in its documentation.
To demonstrate the versatility of matplotlib, let’s import a few different datasets:
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

# Wine dataset: a class label followed by 13 chemical measurements per sample
wine_names = ['Class', 'Alcohol', 'MalicAcid', 'Ash', 'Alc.Ash', 'Magnesium', 'TotalPhenols',
              'Flavanoids', 'Nonflav.Phenols', 'Proanthocyanins', 'ColorIntensity', 'Hue',
              'OD280/OD315', 'Proline']
wine_df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data',
                      names=wine_names)
# Shift the class labels from 1-3 to 0-2 and store them as objects,
# so plotly treats them as discrete categories rather than numbers
wine_df.Class = wine_df.Class - 1
wine_df.Class = wine_df.Class.astype('object')

# El Nino dataset: whitespace-separated buoy readings, with '.' marking missing values
nino_names = ['bouy', 'day', 'latitude', 'longitude', 'zon.winds', 'mer.winds', 'humidity',
              'air.temp', 's.s.temp']
nino_df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/el_nino-mld/elnino.gz',
                      header=None, na_values='.', sep=r'\s+', names=nino_names)
# Keep only day-3 readings and the columns needed for the map, dropping incomplete rows
nino_df = nino_df.loc[nino_df['day'] == 3, ['bouy', 'latitude', 'longitude', 's.s.temp']].dropna()
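To see what the El Nino parsing and filtering steps do without downloading anything, here is a minimal sketch that runs the same read_csv options and .loc filter on a few hypothetical lines laid out like elnino.gz. The values are made up for illustration and are not real buoy readings.

```python
# Minimal sketch: the same cleaning steps as above, applied to hypothetical
# sample lines (illustrative values, not real El Nino data).
import io

import pandas as pd

nino_names = ['bouy', 'day', 'latitude', 'longitude', 'zon.winds',
              'mer.winds', 'humidity', 'air.temp', 's.s.temp']

sample = io.StringIO(
    "1 3 8.96 -140.32 -6.3 -6.4 83.5 27.32 27.57\n"
    "1 4 8.95 -140.32 -5.7 -3.6 86.4 26.70 27.62\n"
    "2 3 0.02 -110.00 -4.1 1.2 . 24.50 .\n"      # '.' marks missing values
    "3 3 2.01 -155.03 -5.0 0.3 80.2 27.90 28.40\n"
)

# Same options as the real download: no header row, whitespace separator, '.' -> NaN
raw = pd.read_csv(sample, header=None, na_values='.', sep=r'\s+', names=nino_names)

# Boolean mask selects day-3 rows; the column list keeps only what the map needs;
# dropna() then removes buoy 2, whose sea surface temperature is missing.
day3 = raw.loc[raw['day'] == 3, ['bouy', 'latitude', 'longitude', 's.s.temp']].dropna()
print(day3)
```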
I pulled two different datasets from the UCI Machine Learning Repository. The first is the wine dataset, which provides 178 clean observations of wine grown in the same region in Italy. Each observation consists of 13 features that are the result of a chemical analysis. The second is the El Nino dataset, which contains spatiotemporal data from a series of buoys in the Pacific Ocean taken during the El Nino cycle of 1982-1983.
When dealing with data for the first time, an exploratory analysis is typically the first step, and plotting is an extremely useful tool for gaining an initial understanding of the data. In this case, we can plot each wine by its alcohol content (on the x axis) and the OD280/OD315 ratio of the diluted wine (on the y axis), and color each point by its Class (0 to 2) to see how well these two features separate the classes.
Using pandas, different types of plots can be generated in a single line of code:
ax = wine_df.plot(kind='scatter', x='Alcohol', y='OD280/OD315', c='Class', figsize=(12, 8), colormap='jet')
plt.show()  # needed to display the figure when running as a script
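Because pandas plots through matplotlib, the one-liner returns an ordinary matplotlib Axes that you can keep refining with matplotlib calls. This minimal sketch uses a small hypothetical DataFrame as a stand-in for wine_df so it runs offline, and selects the non-interactive Agg backend so it also works without a display.

```python
# Minimal sketch: DataFrame.plot returns a matplotlib Axes.
# The toy DataFrame is a hypothetical stand-in for wine_df.
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import pandas as pd

toy = pd.DataFrame({
    "Alcohol": [13.2, 12.4, 13.8, 12.1, 14.1, 12.9],
    "OD280/OD315": [3.1, 2.8, 1.6, 3.0, 1.7, 2.4],
    "Class": [0, 1, 2, 1, 0, 2],
})

# The pandas one-liner...
ax = toy.plot(kind="scatter", x="Alcohol", y="OD280/OD315", c="Class",
              colormap="viridis", figsize=(8, 5))

# ...followed by plain matplotlib calls on the Axes it returned
ax.set_title("Toy wine data")
ax.grid(True)
```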
For further customization, a similar plot can be made using just matplotlib:
fig, ax = plt.subplots(figsize=(12, 8))
ax.scatter(x=wine_df['Alcohol'], y=wine_df['OD280/OD315'], c=wine_df['Class'])
ax.set_xlabel('Alcohol', fontsize=15)
ax.set_ylabel('OD280/OD315', fontsize=15)
ax.set_title('Wine Dataset')
ax.grid(True)
fig.tight_layout()
plt.show()
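One example of that extra control: a single color-mapped scatter call gives matplotlib nothing to put in a legend, but drawing one scatter per class does. This is a minimal sketch of that alternative, again on a hypothetical toy DataFrame standing in for wine_df.

```python
# Minimal sketch: one scatter call per class so matplotlib can build a legend.
# The toy DataFrame is a hypothetical stand-in for wine_df.
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

toy = pd.DataFrame({
    "Alcohol": [13.2, 12.4, 13.8, 12.1, 14.1, 12.9],
    "OD280/OD315": [3.1, 2.8, 1.6, 3.0, 1.7, 2.4],
    "Class": [0, 1, 2, 1, 0, 2],
})

fig, ax = plt.subplots(figsize=(8, 5))
# groupby yields (class label, rows for that class) in sorted label order;
# each call gets its own default color and a legend label.
for cls, group in toy.groupby("Class"):
    ax.scatter(group["Alcohol"], group["OD280/OD315"], label=f"Class {cls}")

ax.set_xlabel("Alcohol", fontsize=15)
ax.set_ylabel("OD280/OD315", fontsize=15)
ax.legend(title="Wine class")
fig.tight_layout()
```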
Notice that each additional plot feature typically requires an additional line of code. This does not necessarily add complexity: each line performs one explicit step, so even Python novices can follow it. Additional plot elements, such as circles, lines, text, or even indicator arrows, can be added with little difficulty, and the matplotlib documentation contains many examples.
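Here is a minimal sketch of those extra elements (text, a reference line, a circle, and an arrow), each added with one call on the Axes. The points and labels are hypothetical.

```python
# Minimal sketch: text, a reference line, a circle and an arrow on one plot.
# The plotted points and labels are hypothetical.
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib.patches import Circle

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter([12.5, 13.0, 13.8], [3.0, 2.5, 1.7])

# Text placed at a position given in data coordinates
ax.text(12.4, 1.6, "low OD280/OD315")
# Horizontal dashed reference line spanning the full width of the axes
ax.axhline(2.5, color="grey", linestyle="--", linewidth=1)
# Circle patch in data coordinates (looks elliptical unless the aspect is equal)
ax.add_patch(Circle((13.8, 1.7), radius=0.15, fill=False, edgecolor="red"))
# Label with an arrow pointing from the text position (xytext) to the point (xy)
ax.annotate("outlier?", xy=(13.8, 1.7), xytext=(13.2, 2.2),
            arrowprops=dict(arrowstyle="->"))

ax.set_xlim(12, 14.2)
ax.set_ylim(1.2, 3.4)
fig.tight_layout()
```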
The previous plots convey meaningful information: they show that two features alone can largely separate the observations into three clusters. However, the visualizations themselves are relatively simple. To illustrate a slightly more complex example, we can use Basemap to plot sea surface temperature data from each buoy in the El Nino dataset.
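plt.figure(figsize=(14, 14))
# Orthographic (globe) projection centred on the equatorial Pacific
m = Basemap(projection='ortho', resolution=None, lat_0=0, lon_0=-150)
m.bluemarble(scale=0.5)  # NASA Blue Marble background image
# Colour each buoy by its sea surface temperature
m.scatter(nino_df['longitude'].values, nino_df['latitude'].values, latlon=True,
          c=nino_df['s.s.temp'].values, s=100, cmap='coolwarm', alpha=0.5, marker='o')
plt.colorbar(label='Temperature (C)')
plt.clim(25, 30)  # fix the colour range to 25-30 C
plt.show()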
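If Basemap is not available in your environment, the same colour mapping can be checked on plain matplotlib axes, without the globe background. This minimal sketch passes vmin and vmax to scatter, which sets the same 25-30 C colour limits as plt.clim; the three buoys are hypothetical stand-ins for rows of nino_df.

```python
# Minimal sketch: buoy temperatures on plain longitude/latitude axes.
# No map background; the three buoys are hypothetical, not rows from nino_df.
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

buoys = pd.DataFrame({
    "longitude": [-140.3, -110.0, -155.0],
    "latitude": [8.96, 0.02, 2.01],
    "s.s.temp": [27.6, 24.5, 28.4],
})

fig, ax = plt.subplots(figsize=(8, 4))
# vmin/vmax fix the colour limits at creation time, like plt.clim(25, 30)
sc = ax.scatter(buoys["longitude"], buoys["latitude"], c=buoys["s.s.temp"],
                s=100, cmap="coolwarm", alpha=0.5, vmin=25, vmax=30)
fig.colorbar(sc, ax=ax, label="Temperature (C)")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
```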
Clearly, matplotlib and the extensive third-party packages built on it provide powerful tools for plotting many types of data. Despite its verbose syntax, it is an excellent way to produce high-quality static visualizations worthy of publications or professional presentations.
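For publications and presentations, the usual last step is exporting the figure with savefig. This minimal sketch writes a high-resolution PNG and a vector PDF to a temporary directory; the plotted data is hypothetical, and 300 dpi is simply a commonly requested print resolution, not a requirement.

```python
# Minimal sketch: export one figure as a raster PNG and a vector PDF.
# The plotted data is hypothetical; files go to a temporary directory.
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([1, 2, 3], [2, 4, 3], marker="o")
ax.set_title("Static export example")

out_dir = Path(tempfile.mkdtemp())
# Raster output: dpi sets the pixel density; bbox_inches="tight" trims extra whitespace
fig.savefig(out_dir / "figure.png", dpi=300, bbox_inches="tight")
# Vector output: scales without pixelation, convenient for papers and slides
fig.savefig(out_dir / "figure.pdf", bbox_inches="tight")
```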
Matplotlib vs Plotly: Plotting Data with Plotly
Plotly is another great Python visualization tool that’s capable of handling geographical, scientific, statistical, and financial data. The company behind Plotly, also known as Plotly, makes an entire suite of visualization tools for multiple programming languages, all of which create interactive web-based visualizations and even web applications. Plotly has several advantages over matplotlib. One of the main advantages is that only a few lines of code are necessary to create aesthetically pleasing, interactive plots. This interactivity also offers a number of advantages over static matplotlib plots:
- Saves time when initially exploring your dataset
- Makes it easy to modify and export your plot
- Offers a more ornate visualization, which is well-suited for conveying the important insights hidden within your dataset.
Like matplotlib, Plotly has tools that integrate it with pandas to make plotting even more efficient.
Earlier versions of plotly offered both an offline mode and an online mode. In online mode, your data visualizations are automatically uploaded so that you can access them through the online interface, regardless of how you create them. This feature is still available through Chart Studio, but I will be using offline mode, which has been the default since plotly 4.
Plotly Express is a great option for exploring pandas DataFrames. It is a high-level wrapper that has been included in plotly since version 4.0. To create a scatter plot similar to the one we created with matplotlib, run:
import plotly.express as px

# Scatter plot colored by class, with box plots of each variable in the margins
fig = px.scatter(wine_df, x='Alcohol', y='OD280/OD315', color='Class',
                 marginal_y='box', marginal_x='box')
fig.show()
Feel free to play around with the interactive features of the plot. Even this short example shows how easily plotly produces polished, interactive visualizations compared with matplotlib.
It can also tackle our temperature data from the El Nino dataset without breaking a sweat. Try this:
fig = px.scatter_geo(nino_df, lat='latitude', lon='longitude', color='s.s.temp',
                     hover_name='bouy', color_continuous_scale='bluered',
                     projection='orthographic')
fig.show()
Notice that when you move your cursor over each buoy, the buoy number, temperature value, and location information are displayed. This is all done by default with plotly, and this is barely scratching the surface of what it can do.
Matplotlib vs Plotly: Conclusions
To summarize, matplotlib is a quick and straightforward tool for creating visualizations within Python. The verbosity required to make anything more than a basic plot makes it more suitable for an initial exploratory analysis or a minimalist design. Matplotlib is also a great place for new Python users to start their data visualization education, because each plot element is declared explicitly in a logical manner.
Plotly, on the other hand, is a more sophisticated data visualization tool that is better suited for creating elaborate plots more efficiently. The interactivity and elegant aesthetics that come with plotly are benefits that cannot be ignored.
Frequently Asked Questions
Does Plotly use Matplotlib?
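No. Plotly is an independent library: its figures are rendered by the plotly.js JavaScript library, not by matplotlib. To get both Matplotlib and Plotly, download the “Data Plotting” Python runtime.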
Are Plotly and Matplotlib the only two data visualization libraries in Python?
No. Popular Python data visualization libraries include:
- Matplotlib – Python’s oldest data visualization library
- Seaborn – based on Matplotlib, but produces nicer charts
- Bokeh – creates interactive, web-ready plots
- Plotly – creates interactive plots accessible from Jupyter notebooks
- Dash – creates interactive dashboards
- ggplot – layers components to create a plot
- Pygal – can output charts as SVGs
- Geoplotlib – for creating maps and plotting geographical data
- Gleam – turns analyses into interactive web apps
- Missingno – visually shows missing data
For more information on Dash and Bokeh, read Creating Python Dashboards: Dash Vs Bokeh.
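What is plotting in Python?
Plotting in Python means using a library such as matplotlib or plotly to turn data into charts, such as scatter plots, line charts, or maps, so you can explore the data and communicate what it shows.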
What is the difference between Plotly and Matplotlib?
- Matplotlib creates simple plots by declaring each plot element explicitly in a logical manner, making it easier for beginners but requiring more code.
- Plotly creates more sophisticated plots with much less code, but may not be as suitable for beginners.
Download the pre-built Python Data Science runtime for Windows, Mac or Linux to get started quickly with the top data visualization and other data science packages.