Exploratory Data Analysis in Python: Part 2 (Advanced)

Kartik Aggarwal
Sep 21, 2022

Hey Learners, this is Part 2 of Exploratory Data Analysis. Part 1 covered the theory of EDA; this article goes through the graphs and visualizations used in EDA.

This article uses Seaborn on the Automobile data set. You can get the Automobile dataset from here.

Let’s Get Started

When we deal with data, we often look into how variables are distributed.

Importing the Libraries

import warnings
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
warnings.filterwarnings('ignore')  # silence library warnings in the output

Importing the Data Set

Automobile = pd.read_csv('Automobile.csv')
Automobile.head()
This is how the data set looks. head() shows the first 5 rows, and the data has 26 columns.
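Since head() is only a preview, the size of the full data set comes from shape. Here is a minimal sketch on a tiny made-up frame (the name toy and all of its values are invented for illustration and are not taken from Automobile.csv):

```python
import pandas as pd

# Toy stand-in for the Automobile data; every value here is invented.
toy = pd.DataFrame({
    "body_style": ["sedan", "hatchback", "sedan", "wagon",
                   "sedan", "convertible", "hatchback"],
    "horsepower": [102, 69, 115, 88, 110, 207, 76],
})

preview = toy.head()          # first 5 rows by default
n_rows, n_cols = toy.shape    # dimensions of the whole frame

print(preview.shape)   # size of the preview only
print(n_rows, n_cols)  # size of the full data
```

Running the same two calls on Automobile should tell you how many rows the real data has, which head() alone does not show.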

Moving to the visualization

#1 Which body styles are the most common among the cars?

sns.countplot(x='body_style', data=Automobile)  # We are using Seaborn
A count plot (a bar plot of category counts) gives the right visualization

Sedan has the highest count in the dataset.
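The counts behind a count plot can also be read as numbers with value_counts(). A small sketch, assuming a made-up series of body styles (the values are invented, not the real column):

```python
import pandas as pd

# Invented body styles, for illustration only.
body_style = pd.Series(
    ["sedan", "hatchback", "sedan", "wagon", "sedan", "hatchback"]
)

counts = body_style.value_counts()  # sorted from most to least frequent
most_common = counts.idxmax()       # label with the highest count

print(counts)
print(most_common)
```

On the real data, the same call on Automobile['body_style'] should list the categories in the order the tallest bars appear.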

#2 Which type of car has the highest horsepower, and where is its engine located?

We need two columns, body style and horsepower, compared against each other, with engine location as a third variable.

sns.barplot(x='body_style', y='horsepower', hue='engine_location', data=Automobile)
The bar plot shows the mean horsepower for each body style, split by engine location

Convertible cars with a rear engine have the highest horsepower.
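By default, each bar in a Seaborn bar plot is the mean of its group, so the same comparison can be sketched with a groupby. The frame below is invented purely to show the mechanics:

```python
import pandas as pd

# Invented rows; not taken from the Automobile data.
toy = pd.DataFrame({
    "body_style": ["sedan", "sedan", "hardtop", "hardtop",
                   "convertible", "convertible"],
    "engine_location": ["front", "front", "front", "rear", "front", "rear"],
    "horsepower": [100, 120, 150, 200, 110, 210],
})

# Mean horsepower per (body style, engine location) pair.
mean_hp = toy.groupby(["body_style", "engine_location"])["horsepower"].mean()
top_group = mean_hp.idxmax()  # pair with the highest mean

print(mean_hp)
print(top_group)
```

This is a cross-check on the plot rather than a replacement for it: the bars make the comparison quick, the table makes it exact.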

#3 What is the average city MPG (miles per gallon) of the cars?

sns.displot(Automobile['city_mpg'])
plt.show()
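The distribution plot (a histogram) gives the best representation.

The histogram peaks at around 25, so most cars get roughly 25 MPG in the city. Note that this peak is the most common value, which is not necessarily the same as the average.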
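The peak of a histogram is closest to the mode, while the question asks about the average, so it can help to compute both. A minimal sketch with invented MPG values (not the real city_mpg column):

```python
import pandas as pd

# Invented city MPG values, for illustration only.
city_mpg = pd.Series([19, 24, 24, 25, 25, 25, 27, 31, 38])

mean_mpg = city_mpg.mean()          # arithmetic average
median_mpg = city_mpg.median()      # middle value
mode_mpg = city_mpg.mode().iloc[0]  # most frequent value

print(round(mean_mpg, 2), median_mpg, mode_mpg)
```

In this toy series the long right tail pulls the mean above the mode, which is the kind of gap a histogram alone can hide.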

#4 What is the relationship between horsepower and engine size?

x = Automobile['engine_size']
y = Automobile['horsepower']
sns.jointplot(x=x, y=y)  # pass kind='reg' to also draw a regression line
This joint plot shows the relationship between horsepower and engine size

We can see that horsepower and engine size increase together.
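How strongly two columns move together can be summarized with the Pearson correlation coefficient. Here is a sketch on invented numbers (the frame toy is hypothetical):

```python
import pandas as pd

# Invented engine sizes and horsepower values.
toy = pd.DataFrame({
    "engine_size": [90, 110, 130, 150, 180],
    "horsepower": [70, 95, 115, 140, 175],
})

# Pearson correlation: near +1 means the two rise together almost linearly.
r = toy["engine_size"].corr(toy["horsepower"])

print(round(r, 3))
```

A value close to 1 matches an upward-sloping cloud of points in the joint plot; a value near 0 would mean no clear linear trend.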

#5 What is the relationship between normalized losses, engine size and horsepower?

sns.pairplot(Automobile[['normalized_losses', 'engine_size', 'horsepower']])
Here, each panel is a scatter plot of one pair of variables, with each variable's own distribution on the diagonal

#6 How are engine sizes spread across the fuel types?

sns.stripplot(x='fuel_type', y='engine_size', data=Automobile)
The strip plot shows a numeric variable (engine size) for each level of a categorical variable (fuel type).

Most of the cars run on gas.
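The two things a strip plot shows, how many points fall in each category and where they sit on the numeric axis, can be sketched as numbers too. The frame below is invented for illustration:

```python
import pandas as pd

# Invented rows; not the real fuel_type / engine_size columns.
toy = pd.DataFrame({
    "fuel_type": ["gas", "gas", "gas", "diesel", "gas", "diesel"],
    "engine_size": [97, 109, 122, 134, 141, 152],
})

# Share of cars per fuel type (fractions that sum to 1).
fuel_share = toy["fuel_type"].value_counts(normalize=True)

# Range and middle of engine size within each fuel type.
size_by_fuel = toy.groupby("fuel_type")["engine_size"].agg(["min", "median", "max"])

print(fuel_share)
print(size_by_fuel)
```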

#7 Which fuel type has more horsepower, and how many doors do those cars have?

sns.boxplot(x='number_of_doors', y='horsepower', hue='fuel_type', data=Automobile)
A box plot is a suitable visualization for this.

We can see that two-door gas cars have more horsepower than the others.
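The line inside each box of a box plot is the group's median, so the comparison can be checked with a grouped median. A sketch on invented rows (the values are hypothetical):

```python
import pandas as pd

# Invented rows; not taken from the Automobile data.
toy = pd.DataFrame({
    "number_of_doors": ["two", "two", "two", "four", "four", "four"],
    "fuel_type": ["gas", "gas", "diesel", "gas", "gas", "diesel"],
    "horsepower": [110, 160, 70, 95, 105, 80],
})

# Median horsepower per (doors, fuel type) pair.
median_hp = toy.groupby(["number_of_doors", "fuel_type"])["horsepower"].median()
top_group = median_hp.idxmax()

print(median_hp)
print(top_group)
```

Comparing medians is one reasonable reading of "more horsepower"; comparing the maximum per group would answer a slightly different question.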

#8 What are the correlations in the data?

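Automobile.select_dtypes('number').corr()  # correlate the numeric columns only
Correlation matrix of the numeric columns in the dataset

Representing the correlation as a heatmap

sns.heatmap(Automobile.select_dtypes('number').corr())
Heatmap of the correlation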
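Correlation is only defined for numeric columns, and recent pandas versions raise an error when text columns such as fuel type are passed to corr(). A minimal sketch of selecting the numeric columns first, on an invented frame:

```python
import pandas as pd

# Invented rows mixing one text column with numeric ones.
toy = pd.DataFrame({
    "fuel_type": ["gas", "gas", "diesel", "gas", "diesel"],
    "engine_size": [90, 110, 130, 150, 180],
    "horsepower": [70, 95, 115, 140, 175],
    "city_mpg": [38, 31, 30, 24, 19],
})

numeric = toy.select_dtypes(include="number")  # drops fuel_type
corr = numeric.corr()                          # pairwise Pearson correlations

print(corr.round(2))
```

The resulting square table is what a heatmap colors in: values near +1 or -1 are the strongly related pairs, and the diagonal is always 1.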

To revise the theory, you can refer to my Exploratory Data Analysis in Python: Part 1 (Basic).

Do refer to my profiles for further content!

Github : KartikAggarwal1305 (Kartik Aggarwal) (github.com)

Linkedin : Kartik Aggarwal | LinkedIn

Medium : Kartik Aggarwal — Medium

Written by Kartik Aggarwal

I am a passionate data and business analyst from Christ University. I am well certified in concepts such as Excel, Power Bi, Tableau, SQL and Python.
