Isaac Smith - https://unsplash.com/photos/AT77Q0Njnt0
Python is continuing its path as the fastest growing and most used programming language for data science, and the number of available libraries for data visualization is also rising. It's extremely important to know all the data visualization libraries out there - including their strengths and weaknesses - before choosing one to create data science project graphs.
Whether you are doing data analysis or prototyping machine learning models, knowing how to visualize the dataset is a very useful skill to have. In this post I will introduce the most utilized Python visualization libraries, alongside practical examples.
As you might know, the Python visualization landscape is complex and it can be challenging to find the right tool for the job. At PyCon 2017, Jake VanderPlas gave a talk describing the whole Python visualization landscape, giving us a quick idea of how the different visualization libraries work and how they are connected.
I’ll go a step further and introduce you to my own adaptation, showing all the Python visualization libraries you should know in 2020.
In the image above, you can see dotted blue lines delimiting two large groups, and different colors showing different visualization types we’ll cover later on. Also, you can see my personal recommendations marked with a green tick symbol.
🤔 Which library to use at any given time?
As there are too many use cases and applications that may need different tools and libraries, that’s probably not the first question you should ask. It’s much more important to understand how you can make an effective visualization no matter the libraries you use.
“Visualization gives you answers to questions you didn’t know you had.”
– Ben Shneiderman
Also, selecting the most powerful tool available isn’t always the best idea: powerful tools can be hard to use and have steep learning curves, while a simpler tool might create exactly what’s needed easily and efficiently.
In the image above, you can see different libraries that cover practically every data visualization you may need - from Folium for creating dynamic maps to missingno for visualizing missing data. Some of these libraries can be used no matter the field of application, yet many of them are intensely focused on accomplishing specific tasks.
🖼 Static vs Dynamic visualizations
Depending on where the visualization will be shown and who your audience is, you will have to decide whether to use static or dynamic visualizations.
Static visualizations are commonly seen as infographics posted on the web, and are generally placed in reports or printed as handouts. Users can’t go beyond what they see or explore the visualization in detail.
On the other hand, dynamic or interactive visualizations are commonly seen on the web - most of the time as dashboards or applications. In this type of visualization, users can play around with the data and change what they see on screen, or get extra information by hovering over data points, selecting options or just clicking and dragging the mouse on a map.
📊 SciVis vs InfoVis vs GeoVis
James A. Bednar published an article where he came up with the ‘SciVis vs InfoVis’ categorization. SciVis, or scientific visualization, is the graphical representation of data as a means of gaining understanding and insight into that data. This type of chart allows insight into the system being studied in ways that were previously impossible. In scientific visualization, we seek to understand the data.
Some SciVis Python libraries are: VisPy, Glumpy, Pygal, YT, PyQtGraph and Mayavi. Some of them use the OpenGL graphics standard under the hood, delivering graphics-intensive visualizations of three- or four-dimensional “physically situated data.”
The other group mentioned by Bednar is InfoVis, or information visualization, which focuses on two- or three-dimensional static or interactive visual representations of numerical and non-numerical abstract data in order to amplify human cognition.
These types of visualizations are the most used and commonly seen, and cover Python libraries such as matplotlib, seaborn, Plotnine, Bokeh, Altair and Plotly.
It’s important to mention a third group: the GeoVis or Geovisualization group, which refers to a set of tools and techniques supporting the analysis of geospatial data through the use of interactive visualization. This field is growing quickly due to the large number of satellite images available and the possibility of processing them on our own computers.
🙌 Exploring Python libraries
“A picture is worth a thousand data points”, so get to work! Let’s see Python visualization libraries in action.
- The source code of all the examples shown below is available at the following GitHub repository.
Released in 2003, matplotlib is a robust plotting library which gives you low-level control over every component of your graph. With matplotlib you’ll be able to create simple yet powerful visualizations. It’s one of the oldest and by far the most popular InfoVis libraries in Python, with a lot of different plot types.
It has a pyplot module which is a collection of functions that make matplotlib work like MATLAB, and there are various plots which can be created using it. Some examples are bar graphs, histograms, line plots, scatter plots, area plots and pie plots, and we can see some of them below:
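As a minimal sketch of the pyplot interface, here is a bar graph and a line plot drawn side by side. The data and the output file name are made up for illustration; they are not taken from the post's repository.

```python
import matplotlib.pyplot as plt

# Made-up sample data, purely for illustration
categories = ["A", "B", "C", "D"]
counts = [5, 3, 8, 6]
x = [0, 1, 2, 3, 4]
y = [value ** 2 for value in x]

# One figure with two side-by-side axes
fig, (ax_bar, ax_line) = plt.subplots(1, 2, figsize=(8, 3))

ax_bar.bar(categories, counts, color="steelblue")
ax_bar.set_title("Bar graph")

ax_line.plot(x, y, marker="o")
ax_line.set_title("Line plot")
ax_line.set_xlabel("x")
ax_line.set_ylabel("x squared")

fig.tight_layout()
fig.savefig("matplotlib_example.png")  # hypothetical output file name
```

Every element here (axes, titles, markers, colors) is an object you can tweak individually, which is what "low-level control" means in practice.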
Many other libraries are built on top of matplotlib’s core, providing domain-specific APIs. Examples of that are seaborn, pandas and ggpy.
Seaborn is a graphic library which provides a high-level interface built on top of matplotlib. It makes it easier to generate certain kinds of plots, including heat maps, time series, and violin plots.
Output graphs are prettier and more sophisticated than what we get while using raw matplotlib or pandas. Let’s see some examples:
If you come from the R language, plotnine will be your best choice, as it’s an implementation of a grammar of graphics in Python, based on R’s popular plotting library ggplot2.
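Here is a small sketch of two of the plot types mentioned above, a violin plot and a heat map. The data is randomly generated so the example runs offline; the column names and file name are my own assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data: three groups with different means (illustration only)
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 50),
    "value": np.concatenate(
        [rng.normal(loc, 1.0, 50) for loc in (0.0, 1.0, 2.0)]
    ),
})

fig, (ax_violin, ax_heat) = plt.subplots(1, 2, figsize=(9, 3.5))

# Violin plot: distribution of "value" per group
sns.violinplot(data=df, x="group", y="value", ax=ax_violin)
ax_violin.set_title("Violin plot")

# Heat map: correlation matrix of four random columns
corr = pd.DataFrame(
    rng.normal(size=(100, 4)), columns=["w", "x", "y", "z"]
).corr()
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", ax=ax_heat)
ax_heat.set_title("Heat map")

fig.tight_layout()
fig.savefig("seaborn_example.png")  # hypothetical output file name
```

Note that seaborn draws onto regular matplotlib axes, so anything you know about matplotlib still applies.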
The grammar allows users to compose plots by explicitly mapping data to the visual objects that make up the plot. The underlying grammar of graphics is accompanied by a consistent API that allows you to quickly and iteratively create different types of beautiful data visualisations while rarely having to consult the documentation.
Altair is a simple, friendly and consistent declarative statistical visualization library for Python, based on Vega-Lite.
With Altair, you will be able to create meaningful, elegant, and effective visualizations with just a few lines of code and in a very short time.
Bokeh, native to Python, is also based on The Grammar of Graphics, like R’s ggplot2. It supports streaming and real-time data.
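The streaming support is easiest to see through `ColumnDataSource.stream`, which appends new rows to the data behind a plot. The sketch below runs as a plain script and saves a static HTML page; in a real-time setup you would call `stream` periodically from a Bokeh server app, which I am not showing here. Data and file name are assumptions.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, output_file, save

# Initial made-up data
source = ColumnDataSource(data={"x": [0, 1, 2], "y": [1, 3, 2]})

p = figure(title="Bokeh line plot", x_axis_label="x", y_axis_label="y")
p.line("x", "y", source=source, line_width=2)
p.scatter("x", "y", source=source, size=8)

# Append one new point; rollover keeps at most the last 100 rows
source.stream({"x": [3], "y": [5]}, rollover=100)

output_file("bokeh_example.html", title="Bokeh example")  # hypothetical name
save(p)
```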
With plotly you can create some unique charts like dendrograms, 3D charts, and contour plots, which you cannot generate through most of the other tools.
missingno is a small matplotlib-based Python library which helps you show and explore missing data.
It provides built-in visualizations that let you look at missing data from different perspectives: Bar chart (as shown below, it displays a count of the values present per column, ignoring missing values), Matrix, Heatmap and Dendrogram.
VisPy is an OpenGL based plotting library for creating interactive scientific visualizations. It’s designed to be fast, scalable and easy to use.
As it is an open-source library, it can be easily customized to fit your requirements.
NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
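Here is a small sketch covering creation, analysis and drawing of a network. The graph is a made-up friendship network; the node names are hypothetical.

```python
import matplotlib.pyplot as plt
import networkx as nx

# Build a small undirected graph (invented data)
G = nx.Graph()
G.add_edges_from([
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("carol", "dave"),
    ("dave", "erin"),
])

# Study its structure
centrality = nx.degree_centrality(G)
most_central = max(centrality, key=centrality.get)
path = nx.shortest_path(G, source="alice", target="erin")
print("Most connected node:", most_central)
print("Shortest path:", path)

# Draw it with matplotlib; the seed makes the layout reproducible
fig, ax = plt.subplots(figsize=(5, 4))
pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos, ax=ax, with_labels=True, node_color="lightblue", node_size=1200)
fig.savefig("networkx_example.png")  # hypothetical output file name
```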
Dash is a framework for building interactive dashboards using pure Python. It is built on top of Flask, Plotly.js and React.js.
Dash is open source, and its dashboards run in the web browser.
Cartopy is a Python package which provides a set of tools for creating projection-aware geospatial plots using Python’s standard plotting package, matplotlib.
Key features of Cartopy are its object-oriented projection definitions, and its ability to transform points, lines, vectors, polygons and images between those projections.
The bottom line is that Cartopy provides a very easy, cartographically accurate method for producing maps, and pairs well with other Python tools like geopandas.
Folium is another wonderful Python geovisualization library used for plotting maps, which uses the mapping strengths of the Leaflet.js library, enabling interactive map visualizations.
Using Folium you can zoom in and out on your output maps, click and drag them, or even add markers, all of which are super useful features.
✨ And we arrive at the end
Thanks for reading! I hope you enjoyed the post and have discovered at least one new library to use in your data science projects.
“By visualizing information, we turn it into a landscape that you can explore with your eyes, a sort of information map. And when you’re lost in information, an information map is kind of useful.” – David McCandless
- The source code for all the examples is available at this GitHub repository.
Would you like to add any other Python visualization library that is not listed?
What visualization libraries are your favorite?
Share them in a comment below.