Hello there, I am back with another article. As you know, I am learning Python for the development and deployment of Artificial Intelligence and Machine Learning. So far I have completed the modules on the Python programming language and on the essential mathematics for Machine Learning. In the next module the actual journey of Machine Learning begins: I am going to learn Machine Learning from scratch. Before finishing the mathematics module, I completed the part on data analysis and data visualization. The libraries for data analysis are NumPy and Pandas, and for data visualization there are many libraries available, such as Matplotlib, Seaborn (for statistical plotting), pandas built-in visualization, and Plotly with Cufflinks.
In my previous article, I discussed some basic and most-used functions of the library. You can check out that article here:
In today's era, after analyzing the data it is important to visualize it. As the saying goes, "Data Is More Understandable When Visualized". Matplotlib has some limitations, which the Seaborn library later overcame. So, without any further delay, let's get started.
# importing the essential libraries
>>> import numpy as np
>>> import pandas as pd
>>> import matplotlib.pyplot as plt
# the next line is a Jupyter/IPython magic that renders plots inline
>>> %matplotlib inline
>>> import seaborn as sns
Seaborn comes with some built-in sample datasets.
>>> sns.get_dataset_names()
['anscombe', 'attention', 'brain_networks', 'car_crashes', 'diamonds', 'dots', 'exercise', 'flights', 'fmri', 'gammas', 'iris', 'mpg', 'planets', 'tips', 'titanic']
>>> len(sns.get_dataset_names())
15
In the version I am using, there are 15 built-in datasets in the seaborn library, which can be used for plotting data and for statistical modeling.
The distplot function combines matplotlib's hist function with seaborn's kdeplot function.
# draw 1000 samples from the standard normal distribution
>>> x = np.random.randn(1000)
>>> sns.distplot(x, color = 'green', kde = False)
The kdeplot function fits and plots a univariate or bivariate kernel density estimate.
The more values from the standard normal distribution x contains, the closer the curve gets to a perfect bell shape. The shape of the curve depends on the number of values and on how those continuous values are distributed.
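To make this concrete, here is a minimal sketch of what a kernel density estimate computes. It assumes a Gaussian kernel and Scott's rule-of-thumb bandwidth, and it is only an illustration, not seaborn's own implementation. The helper name gaussian_kde_1d and the sample sizes are made up for this example.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Hand-rolled Gaussian KDE (hypothetical helper, for illustration only)."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott's rule of thumb for the bandwidth (an assumption of this sketch).
    bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per sample, evaluated at every grid point.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    # Average the bumps and rescale so the curve integrates to 1.
    return kernels.sum(axis=1) / (n * bandwidth)

rng = np.random.default_rng(0)
grid = np.linspace(-6, 6, 241)
true_pdf = np.exp(-0.5 * grid ** 2) / np.sqrt(2 * np.pi)

# Compare a small and a large sample against the true bell curve.
for n in (50, 5000):
    density = gaussian_kde_1d(rng.standard_normal(n), grid)
    max_error = np.abs(density - true_pdf).max()
    print(f"n={n}: largest gap to the true bell curve = {max_error:.3f}")
```

Running it with different sample sizes shows how the estimated curve behaves as x grows.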
Let's take a look at the tips dataset:
>>> tips = sns.load_dataset('tips')
>>> tips.head()
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       25.59  3.61  Female     No  Sun  Dinner     4
Let's plot a dist plot and a KDE plot for the total_bill column of the tips dataset.
The barplot function shows point estimates and confidence intervals as rectangular bars. A bar plot represents an estimate of central tendency for a numeric variable with the height of each rectangle, and it gives some indication of the uncertainty around that estimate using error bars.
>>> sns.barplot(x = 'day', y = 'total_bill', data = tips, hue = 'sex')
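To see what the bar heights and error bars stand for, the sketch below computes them by hand: the mean of each group and a 95% bootstrap confidence interval, which to my understanding is what seaborn draws by default. The numbers and the helper name bootstrap_ci are made up for illustration.

```python
import numpy as np
import pandas as pd

def bootstrap_ci(values, n_boot=1000, level=95, seed=0):
    """Percentile bootstrap confidence interval for the mean (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement and record the mean of every resample.
    resamples = rng.choice(values, size=(n_boot, values.size), replace=True)
    boot_means = resamples.mean(axis=1)
    tail = (100 - level) / 2
    low, high = np.percentile(boot_means, [tail, 100 - tail])
    return low, high

# Made-up bills, not the real tips data.
df = pd.DataFrame({
    "day": ["Sat"] * 6 + ["Sun"] * 6,
    "total_bill": [18.0, 22.5, 15.0, 30.0, 25.5, 20.0,
                   24.0, 28.5, 19.0, 35.0, 26.5, 21.0],
})

summary = {}
for day, group in df.groupby("day"):
    mean = group["total_bill"].mean()
    low, high = bootstrap_ci(group["total_bill"])
    summary[day] = (mean, low, high)
    print(f"{day}: bar height = {mean:.2f}, error bar = [{low:.2f}, {high:.2f}]")
```

The hue argument in the call above simply repeats this computation for every combination of day and sex.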
The boxplot function draws a box plot to show distributions with respect to categories. A box plot (or box-and-whisker plot) shows the distribution of quantitative data in a way that facilitates comparisons between variables or across levels of a categorical variable. The box shows the quartiles of the dataset, while the whiskers extend to show the rest of the distribution, except for points that are determined to be "outliers" using a method that is a function of the inter-quartile range.
>>> sns.boxplot(x = 'day', y = 'total_bill', data = tips, hue = 'sex')
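The pieces of a box plot can be computed by hand. The sketch below uses made-up bills and the common 1.5 × IQR rule for the whiskers (1.5 is the default of the whis parameter); it illustrates the idea and is not seaborn's own code.

```python
import numpy as np

# Made-up bills with one unusually large value.
bills = np.array([10, 12, 14, 15, 16, 18, 20, 22, 24, 60], dtype=float)

# The box spans the first to the third quartile; the line inside is the median.
q1, median, q3 = np.percentile(bills, [25, 50, 75])
iqr = q3 - q1

# Points further than 1.5 * IQR from the box are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points that are still inside the fences.
inside = bills[(bills >= lower_fence) & (bills <= upper_fence)]
outliers = bills[(bills < lower_fence) | (bills > upper_fence)]

print("box:", q1, median, q3)
print("whiskers:", inside.min(), inside.max())
print("outliers:", outliers)
```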
The scatterplot function draws a scatter plot with the possibility of several semantic groupings. The relationship between x and y can be shown for different subsets of the data using the hue, size, and style parameters. These parameters control what visual semantics are used to identify the different subsets. It is possible to show up to three dimensions independently by using all three semantic types, but this style of plot can be hard to interpret and is often ineffective.
>>> sns.scatterplot(x = 'total_bill', y = 'tip', hue = 'sex', data = tips)
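For comparison, here is a sketch that uses all three semantic parameters at once. The rows are made up and only shaped like the tips data, so the picture is illustrative and not a result about the real dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import pandas as pd
import seaborn as sns

# Made-up rows shaped like the tips data (not the real dataset).
df = pd.DataFrame({
    "total_bill": [12.5, 18.0, 22.4, 30.1, 15.8, 27.3, 40.2, 9.9],
    "tip":        [2.0, 3.0, 3.5, 5.0, 2.5, 4.0, 6.5, 1.5],
    "sex":        ["Female", "Male", "Male", "Female",
                   "Female", "Male", "Male", "Female"],
    "size":       [2, 2, 3, 4, 2, 3, 6, 1],
    "time":       ["Lunch", "Dinner", "Dinner", "Dinner",
                   "Lunch", "Lunch", "Dinner", "Lunch"],
})

ax = sns.scatterplot(
    data=df, x="total_bill", y="tip",
    hue="sex",     # color encodes the sex column
    size="size",   # marker size encodes the party size
    style="time",  # marker shape encodes lunch vs dinner
)
```

Even with eight points the legend gets crowded, which is why one or two semantics are usually enough.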
Let's load the flights dataset:
>>> flights = sns.load_dataset('flights')
>>> flights.head()
   year     month  passengers
0  1949   January         112
1  1949  February         118
2  1949     March         132
3  1949     April         129
4  1949       May         121
Let's create a pivot table from the flights dataset, with one row per month and one column per year:
>>> fp = flights.pivot_table(index = 'month', columns = 'year', values = 'passengers')
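To show what pivot_table does to the shape of the data, here is a small self-contained sketch. The 1949 values come from the head() output above; the 1950 values are placeholders typed in for illustration, and the name long_df is made up.

```python
import pandas as pd

months = ["January", "February", "March"]

# Long format: one row per (year, month) pair.
long_df = pd.DataFrame({
    "year": [1949] * 3 + [1950] * 3,
    # An ordered categorical keeps calendar order instead of alphabetical order.
    "month": pd.Categorical(months * 2, categories=months, ordered=True),
    "passengers": [112, 118, 132, 115, 126, 141],
})

# Wide format: months become the rows, years become the columns.
fp = long_df.pivot_table(index="month", columns="year",
                         values="passengers", observed=True)
print(fp)
```

If several rows share the same month and year, pivot_table averages them by default; the aggfunc parameter changes that.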
The heatmap function plots rectangular data as a color-encoded matrix. This is an Axes-level function and will draw the heatmap into the currently active Axes if none is provided to the ax argument.
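A sketch of a heatmap call on a tiny month-by-year table. The table is typed by hand with illustrative values (not the full flights data), and the name fp_small is made up for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import pandas as pd
import seaborn as sns

# Tiny month-by-year table (illustrative values, not the full flights data).
fp_small = pd.DataFrame(
    {1949: [112, 118, 132], 1950: [115, 126, 141]},
    index=["January", "February", "March"],
)

# annot=True writes each value into its cell; fmt="d" formats them as integers.
ax = sns.heatmap(fp_small, annot=True, fmt="d", cmap="viridis")
```

The same call works on the fp pivot table created above.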
The clustermap function plots a matrix dataset as a hierarchically clustered heatmap.
The lmplot function draws a scatter plot of two variables and fits a linear regression model to them. The fitted line is known as the regression line. The data argument is a pandas DataFrame: two-dimensional, size-mutable, potentially heterogeneous tabular data with labeled axes (rows and columns).
>>> sns.lmplot(x = 'tip', y = 'total_bill', data = tips, hue = 'sex')
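The regression line itself is an ordinary least-squares fit. The sketch below reproduces the idea with np.polyfit on made-up (tip, total_bill) pairs; it is an illustration of the technique, not lmplot's internal code.

```python
import numpy as np

# Made-up (tip, total_bill) pairs, not the real tips data.
tip = np.array([1.0, 1.5, 2.0, 3.0, 3.5, 5.0])
total_bill = np.array([9.0, 12.5, 14.0, 22.0, 24.5, 33.0])

# Degree-1 least-squares fit: total_bill is approximated by slope * tip + intercept.
slope, intercept = np.polyfit(tip, total_bill, deg=1)

# Residuals are the vertical distances between the points and the line.
predicted = slope * tip + intercept
residuals = total_bill - predicted

print(f"regression line: total_bill = {slope:.2f} * tip + {intercept:.2f}")
```

With hue = 'sex', as in the call above, one such line is fitted separately for each group.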
In this article I discussed some important features and functions of the Seaborn library, but there are many more functions in Seaborn left to explore. That is all for this article on learning Python for Machine Learning and Artificial Intelligence. As I learn new concepts, I'll keep posting, so there is more to come. Bye-bye, see you in my next article. Until then, happy learning, enjoy Machine Learning, and PEACE OUT ✌✌✌.