Data Science in Python – Matplotlib – Part 4


After providing an introduction to Pandas in the previous article, enabling us to select and manipulate data, this article will focus on data visualization. It is well-known that with appropriate graphics, data can often be understood even better and allow for a different type of interpretation, independent of averages and other metrics.
Which library for data visualization in Python?
In the library jungle of Python, there are countless libraries suitable for visualization. The range extends from a simple scatter plot to the representation of a neural network's structure and even 3D visualizations. The oldest and most mature data visualization library in the Python ecosystem is Matplotlib. With the development starting in 2003, it has continuously evolved and also forms the basis for the seaborn library. It offers a wide range of customization options and is oriented towards MATLAB. The name connection is obvious ;-).
Starting with Matplotlib
Creating graphics with Matplotlib can be done in two ways:
- via
pyplot - Object-oriented approach
It should be noted at this point that pyplot is a sub-library of matplotlib. The first approach is the most widespread, as it provides an easy entry into Matplotlib. In contrast, the second is suitable for very detailed graphics that require many adjustments. In this blog post, the graphics will be created exclusively with pyplot. The creation is oriented towards the following procedure:
- Creation of a figure object including axes objects
- Editing the axes objects
- Output of the object
Consequently, we directly edit an object in Python, rather than layering graphics as with the "Grammar of Graphics."
Initializing a Graphic
Importing matplotlib is typically as follows:
# Import Matplotlib
import matplotlib.pyplot as plt
Now, we need to create a figure object. At this point, one should also consider how many representations to integrate into the object, as this facilitates the workflow later on. In the following code example, a figure with one column and one row is created.
# Initialization of a Figure
fig, ax = plt.subplots(nrows = 1, ncols = 1)
# Ausgabe der Figure
fig
The figure method returns two arguments: the actual figure and an Axes object. This is related to Matplotlib's object-oriented approach. If a figure includes multiple sub-representations, such as histograms of different data, the sub-representations can be individually edited via the Axes object, and the figure object combines all sub-representations. Thus, the figure includes the graphical representation, and the Axes object includes the individual sub-representations. Displaying the created variable fig gives the following representation:

Our graphic currently only contains a coordinate system. To further clarify the object-oriented approach, we will create a figure with a total of two rows and two columns.
# Creating a Figure with 4 separate graphics
fig, ax = plt.subplots(nrows = 2, ncols = 2)
Using index selection, one can now select individual sub-representations from the grid. The object-oriented approach should now be clear.
# Selecting the individual sub-representations
ax[0,0]
ax[0,1]
ax[1,0]
ax[1,1]
When initializing a graphic, adjustments regarding the size of the graphic are certainly conceivable, which can be implemented as follows:
# Setting the figure size
fig, ax = plt.subplots(nrows = 2, ncols = 2, figsize = (10, 12))
The graphic size is specified as a tuple (width, height) in inches.
Representation of Data
After demonstrating how to initialize a graphic in Matplotlib, we will now focus on populating it with data. For the entry, we will take the first figure from the blog post. To represent data, we access the Axes object and initially choose the plot function. A variety of other functions can be found at this link.
# Creating the graphic
fig, ax = plt.subplots(nrows = 1, ncols = 1)
# Creating sample data
sample_data = np.random.randint(low = 0, high = 10, size = 10)
# Displaying the sample data
ax.plot(sample_data)
# Output of the Figure
fig
It should be briefly noted here that Matplotlib has performed two operations in the background:
- The scale was automatically adjusted to the data.
- The data representation was done via the index, i.e., the actual value is on the y-axis.
If one believes, for example, that a scatter plot would be more suitable, the graphic can easily be changed. Instead of plot, one writes scatter. The code shows that, in this case, the data was not automatically represented via the index, but x and y had to be explicitly passed as arguments.
# Creating the graphic
fig, ax = plt.subplots(nrows = 1, ncols = 1)
# Creating sample data
sample_data = np.random.randint(low = 0, high = 10, size = 10)
# Displaying the sample data
ax.scatter(x = range(0, len(sample_data)), y=sample_data)
# Output of the Figure
fig
All typical data visualizations can be implemented with Matplotlib.
Labeling and Saving Graphics
Representing the data is one thing; with an appropriate title, they become more understandable.
As an example, we want to label a histogram with the sample data. Using the set_ functions, labeling can be easily done.
fig, ax = plt.subplots(nrows=1, ncols=1)
sample_data = np.random.randint(low = 0, high = 10, size = 10)
ax.hist(sample_data,width = 0.4)
# Adding labels
ax.set_xlabel('Sample data')
ax.set_ylabel('Frequencies')
ax.set_title('Histogram with Matplotlib')
# Output of the Figure
fig
We get the following graphic:

If one wishes to save their graphic, this can be done using the fig.savefig ('Path of the graphic') function.
Summary and Outlook
This blog post should provide an initial introduction to the Python visualization library Matplotlib and convey basic concepts. The adaptability of the graphics goes far beyond the functions we have presented:
- Customization of colors, gradients,...
- individual axis labeling/scaling
- Annotating graphics with text boxes
- Changing the font
- ...
To consider selecting the appropriate data for the graphics, how to implement it with Pandas can be read here. In the next post in this series, we will then focus on Scikit-Learn, the beginner machine learning library in Python.




