# Data Is More Understandable When Visualized: Data Visualization by Hrithik Sharma

Note: All the data used in this article is fake and was created entirely by me. It exists only for the sake of understanding and discussion.

Hi there! As you know, I am learning Python for the development and deployment of Machine Learning and Artificial Intelligence. I recently completed the Python module along with four essential libraries for Machine Learning, which cover importing data, data cleaning, data preprocessing, data visualization and statistical modeling: NumPy, Pandas, Matplotlib and Seaborn. In today's article, I am going to discuss the library I used for data plotting and data visualization: Matplotlib. The library allows us to visualize data, including data from our own data frames and NumPy ndarrays.

In my previous article, I discussed Pandas and its important functions and features: how to create Series and data frames, how to access data from the rows and columns of a data frame, indexing of Series and data frames, adding a new column to a data frame, calculating the mean, mode and other mathematical functions on numerical data inside data frames, and importing large datasets into our Python environment. That article is titled "Another Step Towards Machine Learning: Pandas".

So, here we go: today I'll discuss the essential data plotting and data visualization library, Matplotlib. To use Matplotlib in a Python environment, we first have to install it. Open the Command Prompt on Windows or the Terminal on a Mac and type the command "pip install matplotlib" (an Internet connection is required). If you use the Anaconda distribution, Matplotlib and most other common Machine Learning and Data Science packages come preinstalled. Before using the libraries, we import them as shown:

```
>>> import numpy as np
>>> import pandas as pd
>>> from matplotlib import pyplot as plt
```
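If you want to confirm that the installation worked, a quick check is to print the installed versions. This is only a small sketch; the version numbers you see will depend on your own environment.

```python
import matplotlib
import numpy as np

# Print the installed versions; the exact numbers depend on your environment.
print("Matplotlib version:", matplotlib.__version__)
print("NumPy version:", np.__version__)
```

If either import fails with a ModuleNotFoundError, the library is not installed in the Python environment you are running.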

Let us consider four lists:

```
>>> year = [2015,2016,2017,2018,2019]
>>> Polo = [234,56,499,24,24]
>>> Vento = [763,82,381,37,183]
>>> Mustang = [34,242,245,24,532]
```

Let us convert these lists into NumPy ndarrays:

```
>>> a1 = np.array(year)
>>> a2 = np.array(Polo)
>>> a3 = np.array(Vento)
>>> a4 = np.array(Mustang)
```

Let us check these ndarrays:

```
>>> a1
array([2015, 2016, 2017, 2018, 2019])
>>> a2
array([234,  56, 499,  24,  24])
>>> a3
array([763,  82, 381,  37, 183])
>>> a4
array([ 34, 242, 245,  24, 532])
```
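One practical reason to convert lists into ndarrays is that arithmetic then works element by element. The following is a minimal sketch using the same fake sales figures; the names `total_per_year` and `polo_total` are my own, chosen only for illustration.

```python
import numpy as np

# Same fake sales figures as in the lists above.
polo = np.array([234, 56, 499, 24, 24])
vento = np.array([763, 82, 381, 37, 183])
mustang = np.array([34, 242, 245, 24, 532])

# Adding arrays works element by element, so this gives the total units per year.
total_per_year = polo + vento + mustang

# sum() adds up all elements of one array: total Polo units over the five years.
polo_total = polo.sum()

print(total_per_year)
print(polo_total)
```

With plain Python lists, `+` would join the lists end to end instead of adding the numbers.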

We now have the years and the number of units sold in each year for three cars: the Polo, the Vento and the Mustang. Let us plot the year against the sales of each car.

```
>>> plt.plot(a1,a2,color='red')
>>> plt.xlabel('Year')
>>> plt.ylabel('Units Sold')
>>> plt.title('POLO CAR SALE')
>>> plt.show()
```

```
>>> plt.plot(a1,a3,color='green')
>>> plt.xlabel('Year')
>>> plt.ylabel('Units Sold')
>>> plt.title('VENTO CAR SALE')
>>> plt.show()
```

```
>>> plt.plot(a1,a4,color='orange')
>>> plt.xlabel('Year')
>>> plt.ylabel('Units Sold')
>>> plt.title('MUSTANG CAR SALE')
>>> plt.show()
```

The plot() function plots the values we pass to it. Here the first argument gives the x values and the second gives the y values, while the color argument sets the color of the line. The xlabel(), ylabel() and title() functions each take a string, which is displayed on the x-axis, on the y-axis and as the title of the plot, respectively.
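To see what these functions actually set, we can read the values back from the plot. This is a small sketch rather than anything required for plotting: it assumes the non-interactive "Agg" backend so that it runs even without a display (drop that line in a notebook), and the variable names `lines` and `ax` are my own.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
from matplotlib import pyplot as plt
import numpy as np

year = np.array([2015, 2016, 2017, 2018, 2019])
polo = np.array([234, 56, 499, 24, 24])

plt.figure()  # start a fresh figure
lines = plt.plot(year, polo, color='red')  # plot() returns a list of line objects
plt.xlabel('Year')
plt.ylabel('Units Sold')
plt.title('POLO CAR SALE')

ax = plt.gca()  # the axes we just drew on
print(lines[0].get_color())
print(ax.get_xlabel(), '|', ax.get_ylabel(), '|', ax.get_title())
```

Reading the labels back like this is handy when a plot does not look the way you expected.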

To plot the sales of all three cars on a single plot, use a legend to tell the lines apart:

```
>>> plt.plot(a1,a2,label='POLO')
>>> plt.plot(a1,a3,label='VENTO')
>>> plt.plot(a1,a4,label='MUSTANG')
>>> plt.xlabel('Year')
>>> plt.ylabel('Units Sold')
>>> plt.title('ALL BRANDS SALE')
>>> plt.legend()
>>> plt.show()
```

The legend() function adds a legend, which is useful when more than one line is present on the same axes. Each legend entry takes its text from the label argument passed to plot(). In this case no color is given, so Matplotlib picks the colors automatically from its default color cycle.
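The automatic colors can be inspected directly. The sketch below assumes Matplotlib's default settings and the non-interactive "Agg" backend (so it runs without a display); the dictionary `sales` and the other variable names are my own for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
from matplotlib import pyplot as plt

year = [2015, 2016, 2017, 2018, 2019]
sales = {
    'POLO': [234, 56, 499, 24, 24],
    'VENTO': [763, 82, 381, 37, 183],
    'MUSTANG': [34, 242, 245, 24, 532],
}

# The colors Matplotlib cycles through when no color argument is given.
default_colors = plt.rcParams['axes.prop_cycle'].by_key()['color']

plt.figure()
used_colors = []
for name, units in sales.items():
    line, = plt.plot(year, units, label=name)  # no color argument
    used_colors.append(line.get_color())

legend = plt.legend()
legend_labels = [text.get_text() for text in legend.get_texts()]

print(used_colors)
print(legend_labels)
```

The three lines take the first three colors of the cycle in the order they are plotted, and the legend shows the labels in that same order.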

```
# FIRST SUBPLOT
>>> plt.subplot(1,3,1)
>>> plt.plot(a1,a2,color='red')
>>> plt.xlabel('Year')
>>> plt.ylabel('Sale')
>>> plt.title('POLO')

# SECOND SUBPLOT
>>> plt.subplot(1,3,2)
>>> plt.plot(a1,a3,color='green')
>>> plt.xlabel('Year')
>>> plt.ylabel('Sale')
>>> plt.title('VENTO')

# THIRD SUBPLOT
>>> plt.subplot(1,3,3)
>>> plt.plot(a1,a4,color='blue')
>>> plt.xlabel('Year')
>>> plt.ylabel('Sale')
>>> plt.title('MUSTANG')

>>> plt.tight_layout()
>>> plt.show()
```

The subplot() function is used to draw different plots on different axes within one figure. It takes three arguments: the first is the number of rows, the second is the number of columns, and the third is the plot number, counted from 1. Here we create a grid of one row and three columns and plot the sales of the three cars on it.

The tight_layout() function adjusts the spacing between the subplots so that their axis labels and titles do not overlap.
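Since the three subplots differ only in the data, title and color, the same figure can also be built with a loop. This is just one possible way to write it, not the only one. The `figsize` value, the list `cars` and the "Agg" backend line are my own assumptions for the sketch.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
from matplotlib import pyplot as plt

year = [2015, 2016, 2017, 2018, 2019]
# (title, units sold, line color) for each car
cars = [
    ('POLO', [234, 56, 499, 24, 24], 'red'),
    ('VENTO', [763, 82, 381, 37, 183], 'green'),
    ('MUSTANG', [34, 242, 245, 24, 532], 'blue'),
]

fig = plt.figure(figsize=(9, 3))  # assumed size: wide enough for three plots side by side
for position, (name, units, color) in enumerate(cars, start=1):
    plt.subplot(1, 3, position)  # 1 row, 3 columns, plot number 1, 2, 3
    plt.plot(year, units, color=color)
    plt.xlabel('Year')
    plt.ylabel('Sale')
    plt.title(name)

plt.tight_layout()  # adjust spacing so labels and titles do not overlap
print(len(fig.axes))
```

Note that the plot number starts at 1, which is why `enumerate` is given `start=1`.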

```
# lw = linewidth
# ls = linestyle
# ms = markersize
# mfc = markerfacecolor
# mec = markeredgecolor
# mew = markeredgewidth

>>> plt.plot(a1,a2,label='POLO',lw=5,ls=':',marker='o',ms=10,mfc='black',mec='red',mew=2.0)
>>> plt.plot(a1,a3,label='VENTO',lw=5,ls='-',marker='o',ms=10,mfc='black',mec='red',mew=2.0)
>>> plt.plot(a1,a4,label='MUSTANG',lw=5,ls='--',marker='o',ms=10,mfc='black',mec='red',mew=2.0)
>>> plt.xlabel('Year')
>>> plt.ylabel('Units Sold')
>>> plt.title('ALL BRANDS SALE')
>>> plt.legend()
>>> plt.show()
```

While typing plt.plot() in a Jupyter notebook, press Shift+Tab to see all the arguments for the line and what type of values they take.
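The short argument names also have long forms, which can be easier to read. The sketch below passes the long names and then reads the values back from the line. The "Agg" backend line is an assumption so that it runs without a display, and the `help()` call is left as a comment because it prints a lot of text.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
from matplotlib import pyplot as plt

year = [2015, 2016, 2017, 2018, 2019]
polo = [234, 56, 499, 24, 24]

plt.figure()
# Same styling as lw=5, ls=':', ms=10, mfc='black', mec='red', mew=2.0
line, = plt.plot(
    year, polo, label='POLO',
    linewidth=5, linestyle=':', marker='o', markersize=10,
    markerfacecolor='black', markeredgecolor='red', markeredgewidth=2.0,
)

# Read the settings back from the line object.
print(line.get_linewidth(), line.get_linestyle(), line.get_markersize())

# Outside a notebook, help() prints the documentation for plot():
# help(plt.plot)
```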

In this article I discussed some important features and functions of the Matplotlib library, but there are many more functions in Matplotlib that I have yet to explore.

That is all for this article on learning Python for Machine Learning and Artificial Intelligence. As I learn different concepts of Python, I'll keep posting. There is more to come as I keep learning.

Bye-bye, see you in my next article. Until then, enjoy Machine Learning and PEACE OUT.
