Updated: Apr 25
Data analysis is about understanding and interpreting the vast amounts of data being collected worldwide, so that we get a better view of the patterns in domains such as geology, biology, society, and physics. The data can come from any field or region.
We need data analysis to stay aware of, and responsive to, the trends in our planet's environment. For example, think how much it would help to know that a specific region faces earthquakes or volcanic eruptions at regular intervals. We could save lives, food, places, and much more.
Data visualization simply means viewing our data in a more interactive way, for instance through graphs, charts, or other graphical representations. When you have huge amounts of data, reading each file is not only tedious but also a little foolish.
The picture above is an example of data visualization made with the Tableau tool. It shows a company's sales and profit across Europe.
Python, however, offers a different set of tools and libraries for visualizing data interactively. NumPy, Pandas, SciPy, Matplotlib, Seaborn, and Dash are some good libraries available for Python. I have already discussed NumPy in another blog post.
Now, let's start the data visualization part:
We import the libraries with import pandas as pd, import numpy as np, and import matplotlib.pyplot as plt.
As you can see, matplotlib.pyplot is used to display this graph. It provides a set of plotting functions that work much like MATLAB's.
plt.plot() is for plotting the graph.
plt.grid() adds grid lines to the graph.
plt.legend() displays a legend built from the labels passed to plt.plot(), here label='BMW' and label='Mercedes'.
plt.xlabel() and plt.ylabel() label the x-axis and y-axis respectively.
Take a good look at the code in the image to see the other functions used for plotting the graph.
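For readers who cannot see the image, here is a minimal sketch of such a line plot. The sales numbers, the column names, and the axis titles are invented for illustration; only the 'BMW' and 'Mercedes' labels and the plt functions come from the walkthrough above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical yearly sales (thousands of units), invented for illustration.
sales = pd.DataFrame({
    "year": np.arange(2015, 2021),
    "BMW": [20, 24, 23, 27, 30, 28],
    "Mercedes": [18, 21, 25, 26, 29, 31],
})

plt.figure(figsize=(8, 4))

# One plt.plot() call per line; the label text is picked up by plt.legend().
plt.plot(sales["year"], sales["BMW"], marker="o", label="BMW")
plt.plot(sales["year"], sales["Mercedes"], marker="s", label="Mercedes")

plt.grid(True)                        # draw grid lines behind the data
legend = plt.legend()                 # show the 'BMW' / 'Mercedes' legend
plt.xlabel("Year")                    # x-axis label
plt.ylabel("Units sold (thousands)")  # y-axis label
plt.title("Yearly sales (made-up data)")
plt.show()
```

If the legend comes out empty, the usual cause is a missing label= argument in the plt.plot() calls.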
Now, if you want to plot these graphs separately, you can do it with simple code like that in the following image. This is called sub-plotting.
The same data can also be shown as a bar graph, with code like this:
The data can be viewed in many other forms too, for example as a pie chart:
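A rough sketch of sub-plotting with plt.subplot() could look like this. The data is made up, and the side-by-side 1x2 layout is an assumption; the original image may arrange the panels differently.

```python
import matplotlib.pyplot as plt

# Hypothetical yearly sales (thousands of units), invented for illustration.
years = [2015, 2016, 2017, 2018, 2019, 2020]
bmw = [20, 24, 23, 27, 30, 28]
mercedes = [18, 21, 25, 26, 29, 31]

fig = plt.figure(figsize=(10, 4))

# plt.subplot(rows, columns, index): first panel of a 1x2 layout.
plt.subplot(1, 2, 1)
plt.plot(years, bmw, color="tab:blue")
plt.title("BMW")
plt.xlabel("Year")
plt.ylabel("Units sold (thousands)")
plt.grid(True)

# Second panel of the same layout.
plt.subplot(1, 2, 2)
plt.plot(years, mercedes, color="tab:orange")
plt.title("Mercedes")
plt.xlabel("Year")
plt.grid(True)

plt.tight_layout()  # keep titles and labels from overlapping
plt.show()
```

Each plt.subplot() call makes that panel the active one, so the plt.plot(), plt.title(), and plt.grid() calls that follow apply only to it.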
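As a sketch, a grouped bar graph of the same kind of data might be written as below. The numbers are invented, and shifting each group of bars by half a bar width is just one common way to place two series side by side.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical yearly sales (thousands of units), invented for illustration.
years = np.array([2015, 2016, 2017, 2018, 2019, 2020])
bmw = [20, 24, 23, 27, 30, 28]
mercedes = [18, 21, 25, 26, 29, 31]

width = 0.4  # width of a single bar, in x-axis units

plt.figure(figsize=(8, 4))

# Shift each series by half a bar width so the bars sit next to each other.
bars_bmw = plt.bar(years - width / 2, bmw, width=width, label="BMW")
bars_mercedes = plt.bar(years + width / 2, mercedes, width=width, label="Mercedes")

plt.xticks(years)  # one tick per year, centred between the two bars
plt.xlabel("Year")
plt.ylabel("Units sold (thousands)")
plt.legend()
plt.show()
```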
The first setting in the code keeps the slices at a specific distance from the centre, giving the effect of a piece cut out of the pie (in matplotlib this is the job of the explode argument of plt.pie()). "autopct" stands for automatic percentage and prints how much of the whole pie each slice covers. Again, plt.pie() is the actual plotting function, and plt.show() displays the result.
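A small sketch of such a pie chart follows. The brands beyond BMW and Mercedes, the share values, and the variable names are all made up for illustration; explode and autopct are real parameters of plt.pie().

```python
import matplotlib.pyplot as plt

# Hypothetical market shares in percent, invented for illustration.
labels = ["BMW", "Mercedes", "Audi", "Volvo"]
shares = [35, 30, 20, 15]

# Offset of each slice from the centre, as a fraction of the radius.
# Only the first slice is pulled out, like a piece cut from the pie.
explode = [0.1, 0, 0, 0]

plt.figure(figsize=(5, 5))
wedges, texts, autotexts = plt.pie(
    shares,
    labels=labels,
    explode=explode,
    autopct="%1.1f%%",  # print each slice's percentage with one decimal
)
plt.axis("equal")  # keep the pie circular
plt.show()
```

The format string "%1.1f%%" is ordinary Python string formatting: one decimal place followed by a literal percent sign.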
All of this work was done with matplotlib. But now, we are going to discover a few aspects of Seaborn.
Seaborn is a graphics library for statistical visualization that is built on top of matplotlib and behaves as an extension to it. Data visualization is all about colors, shapes, figures, and so on. Matplotlib gives us the basic building blocks for graphs and charts. Seaborn, on the other hand, adds higher-level functions that produce statistical plots with very little code.
The code in the above picture is very basic, but look at the results. We loaded the "tips" dataset from Seaborn's built-in sample datasets and filtered it to see the statistics of the tips and whether the customer was a smoker or not. Here is another view: the scatter plot in the image gives more insight into the value and time of the tips given.
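The two views described above can be approximated with the sketch below. To keep it runnable offline it uses a small hand-made table with invented values; the column names (total_bill, tip, smoker, time) match those in Seaborn's "tips" sample dataset. The choice of a box plot for the first view is an assumption, since the original image is not available.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Small hand-made stand-in for the "tips" data (values invented).
# With internet access, sns.load_dataset("tips") fetches the real sample dataset.
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26.88],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12],
    "smoker": ["No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"],
    "time": ["Dinner", "Lunch", "Dinner", "Dinner",
             "Lunch", "Lunch", "Dinner", "Dinner"],
})

# Average tip for smokers and non-smokers.
mean_tip = tips.groupby("smoker")["tip"].mean()
print(mean_tip)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# View 1: distribution of tips, split by smoker / non-smoker.
sns.boxplot(data=tips, x="smoker", y="tip", ax=axes[0])

# View 2: tip against bill size, coloured by smoker and marked by meal time.
sns.scatterplot(data=tips, x="total_bill", y="tip",
                hue="smoker", style="time", ax=axes[1])

plt.tight_layout()
plt.show()
```

Because Seaborn draws onto ordinary matplotlib axes, the usual plt functions such as plt.tight_layout() and plt.show() still apply.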
And this could go on forever. The more data there is, the more ways there are to visualize it. Because visualization makes every aspect of the data easy to understand, it has become a necessary skill if you want to be a data analyst or data scientist. This wasn't all that we learned these past few days, but summarizing everything in one article is quite a tough task. I shall soon be back on the platform with my latest blog in "Let's talk Python".