Some Special Plots: Seaborn by Hrithik Sharma

Hello there, I am back with another article. As you know, I am learning Python for the development and deployment of Artificial Intelligence and Machine Learning. So far I have completed the modules on the Python programming language and the essential mathematics for Machine Learning. In the next module the actual Machine Learning journey begins, and I am going to learn Machine Learning from scratch. Before finishing the mathematics module, I completed the part on data analysis and data visualization. The libraries available for data analysis are NumPy and Pandas. For data visualization there are many libraries, such as Matplotlib, Seaborn for statistical graphics, pandas built-in visualization, and Plotly & Cufflinks.

In my previous article, I discussed some of the basic and most used functions of the library. You can check out that article here:

After analyzing data, it is important to visualize it. As the saying goes, "Data Is More Understandable When Visualized". Matplotlib has some limitations, which the Seaborn library was later built to overcome. So, without any further delay, let's get started.

# importing the essential libraries

>>> import numpy as np

>>> import pandas as pd

>>> import matplotlib.pyplot as plt

>>> %matplotlib inline

>>> import seaborn as sns

Seaborn comes with some built-in datasets.

>>> sns.get_dataset_names()

>>> len(sns.get_dataset_names())


In the version used here, there are 15 built-in datasets in the Seaborn library, which can be used for plotting data and for statistical modeling. The exact number can differ between Seaborn versions.

Distribution Plots

Dist Plot:

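This function combines Matplotlib's histogram function with Seaborn's kdeplot function. Note that distplot has been deprecated since Seaborn 0.11, and histplot and displot are its replacements.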

>>> x = np.random.randn(1000)

>>> sns.distplot(x, color = 'green', kde = False)
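With kde = False only the histogram part is drawn. As a rough sketch of what that histogram counts, the snippet below bins a sample with np.histogram. The sample size, the seed, and the number of bins are my own assumptions for illustration and are not Seaborn's defaults.

```python
import numpy as np

# Assumed sample: 1,000 draws from a standard normal distribution.
rng = np.random.default_rng(0)
sample = rng.standard_normal(1000)

# Count how many values fall into each of 10 equal-width bins.
counts, edges = np.histogram(sample, bins=10)

# Each bar of the histogram is one (left edge, right edge, count) triple.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.2f} to {right:6.2f}: {count}")
```

The heights of the bars in the plot correspond to these counts.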

KDE Plot:

Fit and plot a univariate or bivariate kernel density estimate.

>>> sns.kdeplot(x,color='black')

The more values drawn from the standard normal distribution x contains, the closer the curve gets to a perfect bell shape. The shape of the curve depends on the number and the distribution of the continuous values.
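To see why more values give a better bell shape, here is a minimal sketch of a Gaussian kernel density estimate written with plain NumPy. The helper gaussian_kde_1d is a hypothetical name of my own, and the rule-of-thumb bandwidth (Scott's rule) is an assumption, so treat this as an illustration of the idea rather than Seaborn's exact implementation.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average a Gaussian bump centred on every sample, evaluated on grid."""
    # Shape (len(grid), len(samples)): scaled distance from grid point to sample.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)

rng = np.random.default_rng(0)
grid = np.linspace(-4, 4, 201)
# The exact standard normal density, used only as a reference curve.
true_pdf = np.exp(-0.5 * grid ** 2) / np.sqrt(2 * np.pi)

for n in (50, 5000):
    samples = rng.standard_normal(n)
    # Assumed rule-of-thumb bandwidth for 1-D data: std * n^(-1/5).
    bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    density = gaussian_kde_1d(samples, grid, bandwidth)
    max_error = np.abs(density - true_pdf).max()
    print(f"n={n}: largest gap to the true bell curve = {max_error:.3f}")
```

Running this with a small and a large sample lets you compare how far each estimate is from the true bell curve.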

Let's take a look at the tips dataset:

>>> tips = sns.load_dataset('tips')

>>> tips.head()

        total_bill    tip      sex   smoker   day     time   size
    0        16.99   1.01   Female       No   Sun   Dinner      2
    1        10.34   1.66     Male       No   Sun   Dinner      3
    2        21.01   3.50     Male       No   Sun   Dinner      3
    3        23.68   3.31     Male       No   Sun   Dinner      2
    4        25.59   3.61   Female       No   Sun   Dinner      4

Let's plot a dist plot and a KDE plot for the total_bill column of the tips dataset.

>>> sns.distplot(tips['total_bill'])

>>> sns.kdeplot(tips['total_bill'])

Categorical Plots

Bar Plot:

Show point estimates and confidence intervals as rectangular bars. A bar plot represents an estimate of central tendency for a numeric variable with the height of each rectangle and provides some indication of the uncertainty around that estimate using error bars.

>>> sns.barplot(x = 'day', y = 'total_bill', data = tips, hue = 'sex')
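To make the "point estimate plus error bar" idea concrete, here is a small sketch that computes a mean and a bootstrap 95% interval with NumPy. It uses only the five total_bill values shown in tips.head() above. The number of resamples and the seed are my assumptions, and this is a sketch of the idea, not Seaborn's own code.

```python
import numpy as np

# The five total_bill values shown in tips.head() above.
bills = np.array([16.99, 10.34, 21.01, 23.68, 25.59])

# Bar height: the mean of the group.
bar_height = bills.mean()

# Error bar: percentile bootstrap of the mean (resample with replacement).
rng = np.random.default_rng(0)
n_boot = 1000  # assumed number of resamples
resampled = rng.choice(bills, size=(n_boot, bills.size), replace=True)
boot_means = resampled.mean(axis=1)
low, high = np.percentile(boot_means, [2.5, 97.5])

print(f"bar height = {bar_height:.2f}, 95% interval = ({low:.2f}, {high:.2f})")
```

With only five values the interval is wide, which is exactly the uncertainty the error bar is meant to show.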

Box Plot:

Draws a box plot to show distributions with respect to categories. A box plot or whisker plot shows the distribution of quantitative data in a way that facilitates comparisons between variables or across levels of a categorical variable. The box shows the quartiles of the dataset while the whiskers extend to show the rest of the distribution, except for points that are determined to be "outliers" using a method that is a function of the inter-quartile range.

>>> sns.boxplot(x = 'day', y = 'total_bill', data = tips, hue = 'sex')
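The numbers behind a box plot can be sketched with NumPy as follows. The values are made up for illustration (60.0 is deliberately extreme), and the 1.5 x IQR rule used here is the common convention for flagging outliers.

```python
import numpy as np

# Made-up bill amounts for illustration; 60.0 is deliberately extreme.
values = np.array([10.3, 12.5, 15.0, 16.9, 18.2, 21.0, 23.7, 25.6, 60.0])

# The box spans the first to the third quartile, with a line at the median.
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Common 1.5 * IQR rule: points beyond these fences are drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = values[(values >= lower_fence) & (values <= upper_fence)]
outliers = values[(values < lower_fence) | (values > upper_fence)]

# Whiskers stop at the most extreme points that are still inside the fences.
print("box:", q1, median, q3)
print("whiskers:", inside.min(), inside.max())
print("outliers:", outliers)
```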

Scatter Plot:

Draw a scatter plot with the possibility of several semantic groupings. The relationship between x and y can be shown for different subsets of the data using the hue, size, and style parameters. These parameters control what visual semantics are used to identify the different subsets. It is possible to show up to three dimensions independently by using all three semantic types, but this style of plot can be hard to interpret and is often ineffective.

>>> sns.scatterplot(x = 'total_bill' ,y = 'tip', hue = 'sex' ,data = tips)

Matrix Plots

Let's load the flights dataset:

>>> flights = sns.load_dataset('flights')

>>> flights.head()

         year       month   passengers
    0    1949     January          112
    1    1949    February          118
    2    1949       March          132
    3    1949       April          129
    4    1949         May          121

Let's create a pivot table of the flights dataset, with one row per month and one column per year:

>>> fp = flights.pivot_table(index = 'month', columns = 'year', values = 'passengers')
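To show what the pivot table looks like without downloading anything, here is a small sketch that rebuilds just the five rows from flights.head() above as a DataFrame and pivots them. The names flights_head and fp_head are my own, chosen for this illustration.

```python
import pandas as pd

# Only the five rows shown in flights.head() above.
flights_head = pd.DataFrame({
    "year": [1949] * 5,
    "month": ["January", "February", "March", "April", "May"],
    "passengers": [112, 118, 132, 129, 121],
})

# One row per month, one column per year, cells hold the passenger counts.
# pivot_table aggregates with the mean by default; with one value per cell
# the number is unchanged (it is just converted to float).
fp_head = flights_head.pivot_table(index="month", columns="year", values="passengers")
print(fp_head)
```

In this sketch the months are plain strings, so pandas sorts the rows alphabetically rather than in calendar order.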

Heat Map:

Plot rectangular data as a color-encoded matrix. This is an Axes-level function and will draw the heatmap into the currently active Axes if none is provided to the ax argument.

>>> sns.heatmap(fp)

Cluster Map:

Plot a matrix dataset as a hierarchically-clustered heatmap.

>>> sns.clustermap(fp)

Regression Plots

lm Plot:

Plot data and regression model fits across a FacetGrid. lmplot draws a scatter plot of two variables and fits a linear regression model to them. With the hue parameter, a separate model is fitted for each subset of the data. The fitted line is known as the regression line.

>>> sns.lmplot(x = 'tip', y = 'total_bill', data = tips, hue = 'sex')
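The regression line itself is an ordinary least-squares straight line. As a minimal sketch, the snippet below fits one with np.polyfit, using only the five tip and total_bill pairs shown in tips.head() above, so the numbers will differ from a fit on the full dataset.

```python
import numpy as np

# tip and total_bill from the five rows shown in tips.head() above.
tip = np.array([1.01, 1.66, 3.50, 3.31, 3.61])
total_bill = np.array([16.99, 10.34, 21.01, 23.68, 25.59])

# A degree-1 polynomial fit is an ordinary least-squares straight line.
slope, intercept = np.polyfit(tip, total_bill, deg=1)

print(f"total_bill = {slope:.2f} * tip + {intercept:.2f}")
```

The slope says how much total_bill changes on average for each extra unit of tip in this tiny sample.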

In this article I discussed some important features and functions of the Seaborn library, but there are many more functions in Seaborn still to explore. That is all for this article on learning Python for Machine Learning and Artificial Intelligence. As I learn new Python concepts, I'll keep posting, so there is more to come. Bye-bye, see you in my next article. Until then, happy learning, enjoy Machine Learning, and PEACE OUT ✌✌✌.

© 2020 TheIkigaiLab