#1 Data Cleaning Using Python in VS Code: Detecting Missing Values

Kartik Aggarwal
May 20, 2022

Introduction

Real-world datasets are rarely ideal. In this post, we'll go over a few data cleaning tasks using Python's pandas library, concentrating on missing values, which are often the most difficult part of the data cleaning process.

Missing data is commonly handled in one of three ways: (a) ignore it, (b) drop the records that contain it, or (c) fill it in. In this post, we will use a Cars dataset that contains missing values (2. Cars Data1 — Google Sheets) and fill in the missing data with the pandas library.
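The three strategies can be sketched in pandas as follows. This is a minimal illustration on a small hypothetical DataFrame (not the Cars dataset used in this post); the column names and values are made up for the example, and the median is just one possible fill value.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the Cars dataset from this post
df = pd.DataFrame({
    "Make": ["Audi", "BMW", "Ford", "Honda"],
    "Cylinders": [4.0, np.nan, 6.0, 4.0],
})

# (a) Ignore it: pandas aggregations such as mean() skip NaN by default
avg_cylinders = df["Cylinders"].mean()

# (b) Drop the records that have a missing Cylinders value
dropped = df.dropna(subset=["Cylinders"])

# (c) Fill it in, here with the column median as the fill value
filled = df.fillna({"Cylinders": df["Cylinders"].median()})

print(avg_cylinders)
print(dropped)
print(filled)
```

Dropping is the simplest option but throws away the rest of the row, which is why this post focuses on filling.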

Let's get started.

First, we need to import the pandas library with the command import pandas as pd.

Importing the Library as well as the Dataset

In VS Code, we can drag and drop the CSV file directly into the editor, which helps us insert the pd.read_csv('the file name') command without typing out the file's location and name by hand, saving time.
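A minimal sketch of the import and load step is shown here. The file name "cars.csv" is hypothetical, and to keep the example self-contained an in-memory CSV (via io.StringIO) with made-up rows stands in for the real file; empty fields in the CSV are read as NaN.

```python
import io
import pandas as pd

# With the real file you would write something like:
# car = pd.read_csv("cars.csv")   # "cars.csv" is a hypothetical file name
# Here a small in-memory CSV with made-up rows stands in for that file.
csv_text = """Make,Type,Cylinders
Audi,Sedan,4
BMW,Sedan,
Ford,SUV,6
Honda,SUV,
"""

car = pd.read_csv(io.StringIO(csv_text))  # empty fields become NaN
print(car)
```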

Now you can access the data using the name assigned to it, i.e. car.

I have named the DataFrame car and accessed it.
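Once the data is stored in car, a few standard DataFrame attributes give a quick first look at it. The sketch below uses a small hypothetical DataFrame in place of the real file; head(), shape and dtypes are standard pandas, while the column names and values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame loaded with pd.read_csv
car = pd.DataFrame({
    "Make": ["Audi", "BMW", "Ford", "Honda", "Kia", "Mazda"],
    "Cylinders": [4, None, 6, 4, None, 4],
})

print(car.head())   # first five rows
print(car.shape)    # (number of rows, number of columns)
print(car.dtypes)   # Cylinders is float64 because None is stored as NaN
```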

Now let's find the null values in the dataset using the following command: car.isnull()

The command applied to the dataset

The result has the same shape as the dataset but contains only True or False: True means the value is null, and False means the value is present. Since we will be dealing with much larger datasets in the future, it is impractical to spot null values by scanning each column by eye.
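Here is a minimal sketch of what car.isnull() returns, using a small hypothetical DataFrame. It also shows one way (an addition, not from the original post) to pull out only the rows that contain a null, which is more practical than scanning the whole True/False table on a large dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
car = pd.DataFrame({
    "Make": ["Audi", "BMW", "Ford"],
    "Cylinders": [4.0, np.nan, 6.0],
})

# Boolean DataFrame with the same shape as car: True where a value is missing
mask = car.isnull()
print(mask)

# Keep only the rows where at least one column is null
rows_with_nulls = car[mask.any(axis=1)]
print(rows_with_nulls)
```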

Instead, we can chain the sum() function to count the null values in each column: car.isnull().sum(). This works because True is counted as 1 and False as 0.

We can see that there are 2 null values in the Cylinders column.
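The counting step can be sketched on a small hypothetical DataFrame (made-up values, not the post's dataset). A second sum() collapses the per-column counts into a single total for the whole DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
car = pd.DataFrame({
    "Make": ["Audi", "BMW", "Ford", "Honda"],
    "Cylinders": [4.0, np.nan, 6.0, np.nan],
})

# Summing the boolean mask counts the nulls in each column
null_counts = car.isnull().sum()
print(null_counts)

# A second sum() gives the total number of nulls in the DataFrame
total_nulls = car.isnull().sum().sum()
print(total_nulls)
```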

Now we know where the null values are, but we can't just put arbitrary values into Cylinders, because they might be too high or too low for a given car. This brings us to the next method: filling the missing values in Cylinders with the mean (or median) of a specific group of similar records.

We can fill the missing values using group-level statistics.
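A minimal sketch of group-level filling is shown below, assuming a hypothetical grouping column named "Type" (the post does not say which column the groups are based on) and made-up values. groupby(...).transform("mean") computes the mean of Cylinders within each group and aligns it back to every row, so fillna() can use it row by row.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; "Type" is an assumed grouping column
car = pd.DataFrame({
    "Type": ["Sedan", "Sedan", "Sedan", "SUV", "SUV", "SUV"],
    "Cylinders": [4.0, np.nan, 4.0, 6.0, 6.0, np.nan],
})

# Mean Cylinders per Type (NaN is skipped), repeated for every row in that group
group_mean = car.groupby("Type")["Cylinders"].transform("mean")

# Fill each missing value with the mean of its own group
car["Cylinders"] = car["Cylinders"].fillna(group_mean)

print(car)
print(car["Cylinders"].isnull().sum())  # nulls remaining in Cylinders
```

Passing "median" instead of "mean" to transform() gives the median-based variant. If every value in a group is missing, that group's statistic is NaN and its rows stay unfilled, so a fallback such as a second fillna() with the overall median may be needed.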

Let's look at the output now.

Now we can see there are no more null values in Cylinders.

Finally, we have filled the missing values in pandas based on the given conditions. The choice of filling method depends on the assumptions and the context of the problem. For example, when filling missing horsepower values, we cannot just make up numbers; there should be a reason behind the values we choose.
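As one illustration of why the choice needs a reason, the sketch below compares the mean and the median of a hypothetical horsepower column that contains one unusually powerful car. The values are made up; the point is that the mean is pulled toward the outlier while the median is not, so the median may be the more defensible fill value for skewed data.

```python
import numpy as np
import pandas as pd

# Hypothetical horsepower values with one outlier and one missing entry
hp = pd.Series([100.0, 110.0, 120.0, 130.0, 600.0, np.nan], name="Horsepower")

mean_hp = hp.mean()      # pulled upward by the 600 hp car
median_hp = hp.median()  # middle of the observed values, barely affected

print(mean_hp, median_hp)

# Fill the missing horsepower with the median
filled_with_median = hp.fillna(median_hp)
print(filled_with_median)
```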

Conclusion

In this article, we used the following pandas methods to detect and fill missing values:

  • isnull
  • sum
  • mean
  • fillna

In upcoming posts, we will see how to use other methods with pandas to fill missing values, along with other data cleaning strategies for different datasets.

Stay tuned!

Kartik Aggarwal

Kartik Aggarwal | LinkedIn

Written by Kartik Aggarwal

I am a passionate data and business analyst from Christ University, certified in tools such as Excel, Power BI, Tableau, SQL and Python.