#1 Salary Prediction Project in Linear Regression using Machine Learning!

Kartik Aggarwal
Oct 14, 2022


In this article, I show how the number of years of experience affects the salary received. I will train a linear regression model to predict salary from years of experience.
Let's get started!

You can get the dataset, Salary.csv, from my GitHub profile.

Importing the Necessary Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings("ignore")

Importing the Dataset

Salary = pd.read_csv('Salary.csv')
Salary.head()  # Get the first 5 rows of the dataset
Salary.head() shows the first 5 rows by default.
Salary.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35 entries, 0 to 34
Data columns (total 2 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   YearsExperience  35 non-null     float64
 1   Salary           35 non-null     int64
dtypes: float64(1), int64(1)
memory usage: 688.0 bytes

Performing the Basic EDA

Salary.describe()
Statistics of the Data

Looking for the Null Values

Salary.isnull().any()  # False for both columns
Output:
YearsExperience    False
Salary             False
dtype: bool
Salary.isnull().sum()  # any() was False for both columns, so the null counts should both be 0
Output:
YearsExperience    0
Salary             0
dtype: int64
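To see what these two checks report when a value really is missing, here is a minimal sketch on a small made-up frame. The numbers are invented for illustration and are not taken from Salary.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing salary; not the article's dataset
toy = pd.DataFrame({
    "YearsExperience": [1.0, 2.5, 4.0],
    "Salary": [40000.0, np.nan, 61000.0],
})

has_null = toy.isnull().any()     # True for any column that contains a NaN
null_counts = toy.isnull().sum()  # number of NaN values in each column

print(has_null)
print(null_counts)
```

Here any() flags only the Salary column, and sum() reports one missing value in it.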

Plotting the Visualization of Linear Regression

plt.plot(Salary['Salary'], Salary['YearsExperience'])
Salary plotted against years of experience (raw data, before any model is fitted)
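A scatter plot with the predictor on the x-axis is a common alternative view of the same relationship. The sketch below assumes a small made-up frame with the same two column names as Salary.csv; the values and the axis labels are illustrative choices, not the article's.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for Salary.csv with the same two column names
demo = pd.DataFrame({
    "YearsExperience": [1.1, 2.0, 3.2, 4.5, 5.9, 7.1],
    "Salary": [39000, 43500, 54000, 61000, 81000, 98000],
})

fig, ax = plt.subplots()
ax.scatter(demo["YearsExperience"], demo["Salary"])  # one point per row
ax.set_xlabel("Years of experience")
ax.set_ylabel("Salary")
ax.set_title("Salary vs. years of experience")
# In a notebook the figure renders when the cell finishes; in a script, call plt.show()
```

If the points fall roughly along a straight line, a linear model is a reasonable first choice.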

Building the Model

Step — 1 Extracting the X and Y variables

X = Salary.drop('Salary', axis = 1)  # Dropping the Salary column leaves X as a DataFrame holding only YearsExperience
Y = Salary['Salary']  # The target: the Salary column

Step — 2 Splitting the Data set

In machine learning, we never use 100% of the data to fit the model. A common convention is the 80/20 split: 80% of the data is used to train the model and 20% is used to test it. train_test_split assigns the rows to the two sets at random (random_state fixes the seed so the split is reproducible), and the model's predictions are then evaluated on the testing data.

Xtrain, Xtest, Ytrain, Ytest = train_test_split(X , Y, test_size = 0.2 , random_state = 40)

You can now inspect Xtrain, Xtest, Ytrain and Ytest individually.
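As a rough illustration of what the split produces, the sketch below applies the same call to a made-up 35-row frame (the row count Salary.info() reports). The generated values are hypothetical; only the shapes and the effect of random_state matter here.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in with 35 rows, the same size as the article's dataset
years = np.linspace(1.0, 10.5, 35)
demo = pd.DataFrame({"YearsExperience": years, "Salary": 30000 + 9000 * years})

X = demo.drop("Salary", axis=1)
Y = demo["Salary"]
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, Y, test_size=0.2, random_state=40)

print(Xtrain.shape, Xtest.shape)  # 28 training rows and 7 testing rows, one feature column
print(Ytrain.shape, Ytest.shape)

# Re-running with the same random_state selects the same test rows
_, Xtest_again, _, _ = train_test_split(X, Y, test_size=0.2, random_state=40)
print(Xtest.index.equals(Xtest_again.index))
```

With 35 rows and test_size = 0.2, seven rows are held out for testing and the remaining 28 are used for training.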

You can refer to the Jupyter Notebook on GitHub to view the same.

Step — 3 Using the Sklearn Library to create the Linear Regression model

from sklearn.linear_model import LinearRegression
LR = LinearRegression()

Step — 4 Fitting our training dataset

LR.fit(Xtrain, Ytrain)
Output: LinearRegression()
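Fitting finds the straight line Salary = intercept + slope * YearsExperience that best matches the training rows in the least-squares sense. The sketch below shows how to read those two numbers back from a fitted model through coef_ and intercept_. It uses made-up, noise-free data generated from a known line, so these are not the coefficients of the article's model.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data generated from Salary = 30000 + 9000 * YearsExperience
X = pd.DataFrame({"YearsExperience": [1.0, 2.0, 3.0, 4.0, 5.0]})
Y = 30000 + 9000 * X["YearsExperience"]

LR = LinearRegression()
LR.fit(X, Y)

slope = LR.coef_[0]        # change in salary per extra year of experience
intercept = LR.intercept_  # predicted salary at zero years of experience
print(slope, intercept)
```

Because the toy data lies exactly on a line, the fit recovers a slope of about 9000 and an intercept of about 30000, up to floating-point rounding.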

Step — 5 Predicting our Testing Dataset based on training

Y_pred = LR.predict(Xtest)

Step — 6 Getting the Scores of our Model

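This shows how well the model, fitted on the training dataset, performs on the testing dataset.

print(LR.score(Xtest, Ytest))
Output: 0.9765059258552102
# For a regression model, score() returns R² (the coefficient of determination), not classification accuracy. Here the model explains about 97.7% of the variance in the test-set salaries.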
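For a regressor, score() computes R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the true values from their mean. The sketch below checks this by hand on made-up data. For brevity it scores on the same rows it was fitted on, whereas the article scores on the held-out test set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: a linear trend plus fixed, made-up noise
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([41000.0, 47500.0, 58000.0, 64000.0, 77000.0, 83500.0])

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

ss_res = np.sum((y - y_pred) ** 2)    # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares around the mean
r2_manual = 1 - ss_res / ss_tot

r2_sklearn = model.score(X, y)
print(r2_manual, r2_sklearn)  # the two values agree up to rounding
```

A value close to 1 means the line accounts for most of the variation in salary. It does not mean that a given percentage of individual predictions are correct.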

Now our model is trained and built successfully.

We can enter any number of years of experience and our model will tell us the predicted salary.
For example:

YearsExperience = 5
own_pred = LR.predict([[YearsExperience]])  # predict() expects 2D input: one row, one feature
print("No of Experience = {}".format(YearsExperience))
print("Predicted Salary = {}".format(own_pred[0]))
Output:
No of Experience = 5
Predicted Salary = 72955.24706342639
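To predict for several people at once, one option is to pass a DataFrame with the same column name the model was fitted on. In recent scikit-learn versions this also avoids the feature-name warning that a plain list triggers on a model fitted with a DataFrame. The sketch below fits on made-up, noise-free data, so its predictions differ from the article's output above.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free training data; the article's model is fitted on Salary.csv instead
X = pd.DataFrame({"YearsExperience": [1.0, 2.0, 3.0, 4.0]})
Y = 30000 + 9000 * X["YearsExperience"]
LR = LinearRegression().fit(X, Y)

# One row per prediction, using the same column name the model was fitted with
new_people = pd.DataFrame({"YearsExperience": [5, 7.5, 10]})
predictions = LR.predict(new_people)

for years, salary in zip(new_people["YearsExperience"], predictions):
    print("Experience = {} years, predicted salary = {:.2f}".format(years, salary))
```

predict() returns one value per input row, in the same order as the rows of the input.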

Here, I am signing off with my predictions.

Keep following for more content.

Regards

Kartik Aggarwal

Do refer to my profiles for further content!

Github : KartikAggarwal1305 (Kartik Aggarwal) (github.com)

Tableau : Profile — kartik.aggarwal6547 | Tableau Public

Linkedin : Kartik Aggarwal | LinkedIn

Medium : Kartik Aggarwal — Medium


Written by Kartik Aggarwal

I am a passionate data and business analyst from Christ University. I am well certified in concepts such as Excel, Power Bi, Tableau, SQL and Python.
