#1 Data Cleaning Using Python in VS Code: Detecting Missing Values
Introduction
When it comes to data sets, nothing is ever ideal. In this post we'll go over a few different data cleaning tasks using Python's Pandas library. We'll concentrate on missing values, which are perhaps the most difficult part of data cleaning.
Missing data is commonly dealt with in one of three ways: (a) ignore it, (b) drop records with missing data, or (c) fill it in. Using a Cars dataset (2. Cars Data1 - Google Sheets) that includes missing data, we will fill in the missing values with the Pandas library in this post.
Let's Get Started
First of all, we need to import the Pandas library, conventionally under the alias pd.
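As a minimal sketch, the import usually looks like the line below. The alias pd is a widespread convention rather than a requirement, and printing the version is only a quick way to confirm that the library is installed.

```python
import pandas as pd  # "pd" is the conventional alias for Pandas

# Printing the version confirms that the import worked.
print(pd.__version__)
```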
In VS Code, we can drag and drop the CSV file directly into the editor. Doing so inserts the pd.read_csv('the file name') command for us, which saves the time of typing out the file's location and name.
Now you can access the file using the name assigned to it, i.e. car.
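The loading step can be sketched as follows. The real Cars file is not reproduced here, so this example reads a small, made-up CSV from memory; the column names Car, Cylinders and Horsepower and all of the values are assumptions for illustration only. With the real file, you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Cars CSV (names and values are made up).
# With the real file: car = pd.read_csv("<path to your CSV file>")
csv_text = """Car,Cylinders,Horsepower
Car A,4,90
Car B,4,
Car C,6,150
Car D,6,
Car E,8,220
"""

car = pd.read_csv(io.StringIO(csv_text))

# Empty cells in the CSV are read in as NaN (missing values).
print(car.head())
```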
Now let's find the null values in the dataset using the following command: car.isnull()
The result has the same shape as the dataset, with every cell shown as True or False: True means the value is null, and False means the value is present. We will be dealing with big data in the future, so it is hard to spot null values by scanning each column by eye.
So we can chain the sum function, car.isnull().sum(), to count the null values in each column.
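Here is a small sketch of what the True/False table looks like. The DataFrame below is a made-up stand-in for the Cars data (the column names and values are assumptions), built in memory so the example runs on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the real Cars dataset has its own columns and values.
car = pd.DataFrame({
    "Car": ["Car A", "Car B", "Car C", "Car D", "Car E"],
    "Cylinders": [4, 4, 6, 6, 8],
    "Horsepower": [90, np.nan, 150, np.nan, 220],
})

# isnull() returns a DataFrame of the same shape:
# True where a value is missing, False where it is present.
mask = car.isnull()
print(mask)
```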
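The counting step can be sketched like this, again on a made-up sample whose column names and values are assumptions. Because True counts as 1 and False as 0, summing the True/False table gives the number of missing values per column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the real Cars dataset has its own columns and values.
car = pd.DataFrame({
    "Car": ["Car A", "Car B", "Car C", "Car D", "Car E"],
    "Cylinders": [4, 4, 6, 6, 8],
    "Horsepower": [90, np.nan, 150, np.nan, 220],
})

# Sum the True/False table column by column to count missing values.
null_counts = car.isnull().sum()
print(null_counts)
```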
Now we know where the null values are, but we can't just put any value in Cylinders, because it may be too high or too low. This brings us to the next method: filling missing values with the mean (or median) within a specific group, i.e. Cylinders.
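To see why a single fill value can be too high or too low, the sketch below compares the overall mean with the per-group means. It assumes, purely for illustration, that the column with gaps is called Horsepower and that rows are grouped by Cylinders; the numbers are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: 4-cylinder cars have low horsepower, 8-cylinder cars high.
car = pd.DataFrame({
    "Cylinders": [4, 4, 4, 8, 8, 8],
    "Horsepower": [90, 110, np.nan, 200, 240, np.nan],
})

# One value for every gap: too high for 4 cylinders, too low for 8.
overall_mean = car["Horsepower"].mean()

# One value per group: closer to what similar cars actually have.
group_means = car.groupby("Cylinders")["Horsepower"].mean()

print("Overall mean:", overall_mean)
print(group_means)
```

In this made-up sample the overall mean sits between the two groups, so it would misrepresent both of them. Swapping mean() for median() gives the median-based variant.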
We can fill the missing values using group-level statistics: compute the statistic for each group, then use it to fill the gaps within that group.
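One common way to do this in Pandas combines groupby, transform and fillna. This is a sketch under assumptions rather than the exact code from the original post: the column names Cylinders and Horsepower and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the real Cars dataset has its own columns and values.
car = pd.DataFrame({
    "Cylinders": [4, 4, 4, 8, 8, 8],
    "Horsepower": [90, 110, np.nan, 200, 240, np.nan],
})

# transform("mean") returns a Series aligned with the original rows,
# where each row holds the mean Horsepower of its own Cylinders group.
group_mean = car.groupby("Cylinders")["Horsepower"].transform("mean")

# fillna only touches the missing cells; existing values stay unchanged.
car["Horsepower"] = car["Horsepower"].fillna(group_mean)

print(car)
```

Note that if every value in a group is missing, that group's mean is itself NaN, so those cells stay empty and need a different strategy. Replacing "mean" with "median" fills with the group median instead.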
Let's look at the output now.
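A quick way to check the result is to count the missing values before and after the fill and to look at the rows that were changed. The sketch below uses the same kind of made-up sample as before; the column names and values are assumptions, not the real Cars data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the real Cars dataset has its own columns and values.
car = pd.DataFrame({
    "Cylinders": [4, 4, 4, 8, 8, 8],
    "Horsepower": [90, 110, np.nan, 200, 240, np.nan],
})

# Remember which rows had gaps, and count the gaps before filling.
was_missing = car["Horsepower"].isnull()
missing_before = car.isnull().sum()

# Fill each gap with the mean Horsepower of its Cylinders group.
car["Horsepower"] = car["Horsepower"].fillna(
    car.groupby("Cylinders")["Horsepower"].transform("mean")
)

missing_after = car.isnull().sum()
filled_rows = car[was_missing]

print("Missing before:", missing_before["Horsepower"])
print("Missing after:", missing_after["Horsepower"])
print(filled_rows)
```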
Finally, we have filled the missing values based on the given conditions in Pandas. The choice of filling method depends on the assumptions and the context of the problem. For example, when filling the missing values of horsepower, we cannot just make up the data on our own; there should be some reasoning behind each value.
Conclusion
In this article we examined the following Pandas methods for detecting and filling missing values.
- Fillna
- mean
- sum
We will also see how to use other methods in conjunction with Pandas to fill missing values, along with other data cleaning strategies for different datasets.
Stay Tuned
Kartik Aggarwal