#1 Salary Prediction Project Using Linear Regression in Machine Learning!

Oct 14, 2022

In this article, I show how the number of years of experience affects the salary received, and I train a linear regression model to predict the salary from the years of experience.
Let’s Get Started!

You can get the dataset from my GitHub profile: Salary.csv

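Importing the Necessary Libraries

import pandas as pd  # loading and manipulating the dataset
import numpy as np
import matplotlib.pyplot as plt  # plotting
import seaborn as sns
from sklearn.linear_model import LinearRegression  # the regression model
from sklearn.model_selection import train_test_split  # train/test splitting
import warnings
warnings.filterwarnings("ignore")  # hide warning messages in the notebook output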

Importing the Dataset

Salary = pd.read_csv('Salary.csv')
Salary.head()  # first 5 rows of the dataset

By default, head() returns the first 5 rows; pass a number, e.g. Salary.head(10), to see a different number of rows.

Salary.info()

Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35 entries, 0 to 34
Data columns (total 2 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   YearsExperience  35 non-null     float64
 1   Salary           35 non-null     int64  
dtypes: float64(1), int64(1)
memory usage: 688.0 bytes

Performing Basic EDA (Exploratory Data Analysis)

Salary.describe()
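For reference, here is a small sketch of what describe() reports for each numeric column. It does not use the real Salary.csv: it builds a synthetic 35-row DataFrame with the same column names, where the intercept 25000, the slope 9500 and the noise level are made-up values chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for Salary.csv (35 rows, same column names).
# The formula 25000 + 9500 * years and the noise are invented, not the real data.
rng = np.random.default_rng(0)
years = np.round(np.linspace(1.1, 10.5, 35), 1)
salary = (25000 + 9500 * years + rng.normal(0, 4000, size=35)).astype(int)
Salary = pd.DataFrame({"YearsExperience": years, "Salary": salary})

summary = Salary.describe()
print(summary)
# Each numeric column gets: count, mean, std, min, the 25%/50%/75% quartiles, and max.
```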

Checking for Null Values

Salary.isnull().any()  # True for any column that contains at least one null value

Output:
YearsExperience    False
Salary             False
dtype: bool

It is False for both columns. As a double check, we can also count the null values in each column, which should be 0:

Salary.isnull().sum()  # number of null values per column

Output:
YearsExperience    0
Salary             0
dtype: int64

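Visualizing Salary Against Years of Experience

plt.scatter(Salary['YearsExperience'], Salary['Salary'])  # years of experience on the x-axis, salary on the y-axis
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

Salary vs. Years of Experience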

Building the Model

Step 1: Extracting the X and Y Variables

X = Salary.drop('Salary', axis=1)  # drop the Salary column so X is a DataFrame holding only YearsExperience
Y = Salary['Salary']  # target variable: the Salary column
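scikit-learn expects the features as a two-dimensional table of shape (n_samples, n_features) and the target as a one-dimensional array, which is why X is built with drop() (a DataFrame) while Y is a single column (a Series). The sketch below checks those shapes on a synthetic stand-in for Salary.csv; the salary formula in it is an invented assumption, not the real data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for Salary.csv: 35 rows, same column names, made-up values.
years = np.round(np.linspace(1.1, 10.5, 35), 1)
Salary = pd.DataFrame({"YearsExperience": years,
                       "Salary": (25000 + 9500 * years).astype(int)})

X = Salary.drop('Salary', axis=1)     # 2-D DataFrame: (n_samples, n_features)
Y = Salary['Salary']                  # 1-D Series: (n_samples,)
X_alt = Salary[['YearsExperience']]   # double brackets also keep X two-dimensional

print(X.shape, Y.shape)  # (35, 1) (35,)
```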

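Step 2: Splitting the Dataset

In machine learning, we usually do not fit the model on 100% of the data. A common convention is the 80–20 rule: 80% of the rows are used to train the model and the remaining 20% are held back to test it. train_test_split shuffles the rows randomly before splitting (random_state fixes the shuffle so the split is reproducible), and we then judge the model by its predictions on the test set, which it never saw during training.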

Xtrain, Xtest, Ytrain, Ytest = train_test_split(X , Y, test_size = 0.2 , random_state = 40)

You can now inspect Xtrain, Xtest, Ytrain, and Ytest individually.

You can refer to the Jupyter Notebook on GitHub to see them.
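As a quick sanity check of the split, the sketch below repeats the same train_test_split call on a synthetic 35-row stand-in for Salary.csv (the values are invented). scikit-learn rounds the test fraction up, so test_size=0.2 on 35 rows gives 7 test rows and 28 training rows, and repeating the call with the same random_state reproduces the same split.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for Salary.csv (35 rows); the salary formula is made up.
years = np.round(np.linspace(1.1, 10.5, 35), 1)
Salary = pd.DataFrame({"YearsExperience": years,
                       "Salary": (25000 + 9500 * years).astype(int)})
X = Salary.drop('Salary', axis=1)
Y = Salary['Salary']

Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, Y, test_size=0.2, random_state=40)

# test rows = ceil(0.2 * 35) = 7, training rows = 35 - 7 = 28
print(len(Xtrain), len(Xtest))  # 28 7

# The same random_state gives the same shuffle, so the split is reproducible.
Xtrain2, Xtest2, _, _ = train_test_split(X, Y, test_size=0.2, random_state=40)
print(Xtest.index.equals(Xtest2.index))  # True
```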

Step 3: Creating the Linear Regression Model with scikit-learn

from sklearn.linear_model import LinearRegression
LR = LinearRegression()  # ordinary least squares linear regression

Step 4: Fitting the Model on the Training Dataset

LR.fit(Xtrain, Ytrain)

Output:
LinearRegression()
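Fitting performs ordinary least squares: it finds the intercept and slope of the line Salary = intercept + slope × YearsExperience that minimize the squared errors on the training set, and stores them in LR.intercept_ and LR.coef_. The sketch below illustrates this on hypothetical, noise-free data generated with an intercept of 25000 and a slope of 9500 (made-up numbers), so the fit should recover them, and predict() should match the line evaluated by hand.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free training data: Salary = 25000 + 9500 * YearsExperience exactly.
years = np.linspace(1.0, 10.0, 28)
Xtrain = pd.DataFrame({"YearsExperience": years})
Ytrain = pd.Series(25000 + 9500 * years, name="Salary")

LR = LinearRegression()
LR.fit(Xtrain, Ytrain)

# Fitted line: Salary = intercept_ + coef_[0] * YearsExperience
print(LR.intercept_)  # close to 25000
print(LR.coef_)       # close to [9500.]

# predict() evaluates that same line at each input value
manual = LR.intercept_ + LR.coef_[0] * Xtrain["YearsExperience"].to_numpy()
print(np.allclose(LR.predict(Xtrain), manual))  # True
```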

Step 5: Predicting on the Testing Dataset

Y_pred = LR.predict(Xtest)
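To see how the predictions compare with the real salaries, Ytest and Y_pred can be placed side by side. This is a sketch on a synthetic stand-in dataset (the salary formula and noise are invented), and the column names Actual, Predicted and Residual are hypothetical labels chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for Salary.csv (35 rows, invented values with random noise).
rng = np.random.default_rng(0)
years = np.round(np.linspace(1.1, 10.5, 35), 1)
Salary = pd.DataFrame({
    "YearsExperience": years,
    "Salary": (25000 + 9500 * years + rng.normal(0, 4000, size=35)).astype(int),
})
X = Salary.drop('Salary', axis=1)
Y = Salary['Salary']
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, Y, test_size=0.2, random_state=40)

LR = LinearRegression().fit(Xtrain, Ytrain)
Y_pred = LR.predict(Xtest)  # NumPy array, one prediction per test row, in Xtest order

# Line up actual and predicted salaries; Residual = actual - predicted.
comparison = pd.DataFrame({
    "YearsExperience": Xtest["YearsExperience"],
    "Actual": Ytest,
    "Predicted": Y_pred,
}, index=Xtest.index)
comparison["Residual"] = comparison["Actual"] - comparison["Predicted"]
print(comparison.sort_values("YearsExperience"))
```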

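Step 6: Scoring the Model

How well does the model, trained on the training dataset, perform on the testing dataset?

print(LR.score(Xtest, Ytest))

Output:
0.9765059258552102

LR.score returns R², the coefficient of determination, not an accuracy percentage. An R² of about 0.977 means the model explains roughly 97.7% of the variance in salary on the test set.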
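R² compares the model's squared errors with those of a baseline that always predicts the mean salary: R² = 1 - SS_res / SS_tot, where SS_res = Σ(yᵢ - ŷᵢ)² and SS_tot = Σ(yᵢ - ȳ)². A value of 1 is a perfect fit, 0 is no better than predicting the mean, and it can even be negative. The sketch below computes it by hand and checks it against sklearn.metrics.r2_score and LR.score, using synthetic data (invented formula and noise) in place of Salary.csv.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (made up); the article itself uses Salary.csv.
rng = np.random.default_rng(0)
years = np.round(np.linspace(1.1, 10.5, 35), 1)
X = pd.DataFrame({"YearsExperience": years})
Y = pd.Series(25000 + 9500 * years + rng.normal(0, 4000, size=35), name="Salary")
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, Y, test_size=0.2, random_state=40)
LR = LinearRegression().fit(Xtrain, Ytrain)
Y_pred = LR.predict(Xtest)

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((Ytest - Y_pred) ** 2)        # squared errors of the model
ss_tot = np.sum((Ytest - Ytest.mean()) ** 2)  # squared errors of always predicting the mean
r2_manual = 1 - ss_res / ss_tot

# All three values should agree.
print(r2_manual, r2_score(Ytest, Y_pred), LR.score(Xtest, Ytest))
```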

Our model is now trained and built successfully.

We can now enter any number of years of experience, and the model will predict the corresponding salary.
For example:

YearsExperience = 5
own_pred = LR.predict(pd.DataFrame({'YearsExperience': [YearsExperience]}))  # same column name as the training data
print("No of Experience = {}".format(YearsExperience))
print("Predicted Salary = {}".format(own_pred[0]))

Output:
No of Experience = 5
Predicted Salary = 72955.24706342639
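Several experience values can be predicted in a single call by passing a DataFrame whose column is named YearsExperience, like the training data. With a model fitted on a DataFrame, recent scikit-learn versions emit an "X does not have valid feature names" warning when predict() receives a plain list or array instead (the warnings.filterwarnings("ignore") line at the top hides it). This sketch trains on hypothetical noise-free data (intercept 25000 and slope 9500, both invented), so its numbers will differ from the 72955.25 predicted above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical training data standing in for Salary.csv (values are invented).
years = np.round(np.linspace(1.1, 10.5, 35), 1)
X = pd.DataFrame({"YearsExperience": years})
Y = pd.Series(25000 + 9500 * years, name="Salary")
LR = LinearRegression().fit(X, Y)

# A DataFrame with the training column name keeps the feature names consistent,
# so scikit-learn does not warn about missing feature names.
new_years = pd.DataFrame({"YearsExperience": [2, 5, 8.5]})
predictions = LR.predict(new_years)

for years_exp, salary in zip(new_years["YearsExperience"], predictions):
    print("No of Experience = {}, Predicted Salary = {:.2f}".format(years_exp, salary))
```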

Here, I am signing off with my predictions.

Keep Following for more Content.

Regards
Kartik Aggarwal
Do refer to my profiles for further content!
GitHub: KartikAggarwal1305 (Kartik Aggarwal)
Tableau: kartik.aggarwal6547 (Tableau Public)
LinkedIn: Kartik Aggarwal
Medium: Kartik Aggarwal