#1 Salary Prediction Project in Linear Regression using Machine Learning!
In this article, I show how the number of years of experience affects the salary received. I will train a machine learning model to predict salary from the number of years of experience.
Let’s Get Started!
You can get the dataset from my GitHub profile :- Salary.csv
Importing necessary Libraries :-
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings("ignore")
Importing the Dataset
Salary = pd.read_csv('Salary.csv')
Salary.head()  # Getting the first 5 rows of the dataset
Salary.info()
Output :-
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35 entries, 0 to 34
Data columns (total 2 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   YearsExperience  35 non-null     float64
 1   Salary           35 non-null     int64
dtypes: float64(1), int64(1)
memory usage: 688.0 bytes
Performing the Basic EDA
Salary.describe()
Looking for the Null Values
Salary.isnull().any()  # It's False for both columns
Output :-
YearsExperience    False
Salary             False
dtype: bool
Salary.isnull().sum()  # Both columns were False above, so the count of null values should be 0
Output :-
YearsExperience    0
Salary             0
dtype: int64
Visualizing the Linear Relationship between Experience and Salary
plt.plot(Salary['YearsExperience'], Salary['Salary'])  # feature on the x-axis, target on the y-axis
Building the Model
Step — 1 Extracting the X and Y variables
X = Salary.drop('Salary', axis = 1)
# Dropping the Salary column to get X = YearsExperience as a DataFrame
Y = Salary['Salary']
Step — 2 Splitting the Data set
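In machine learning, we usually do not fit the model on 100% of the data. A common convention is the 80-20 split: 80% of the data is used to train the model and the remaining 20% is held out to test it. train_test_split shuffles the rows and divides them randomly in that ratio (test_size = 0.2), and random_state fixes the shuffle so the split is reproducible. We then get predictions on the testing data.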
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X , Y, test_size = 0.2 , random_state = 40)
You can now inspect Xtrain, Xtest, Ytrain and Ytest individually as well.
You can refer to the Jupyter Notebook on GitHub here to view the same.
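To make the 80-20 idea concrete, here is a minimal plain-Python sketch of a random split. It only illustrates the concept and is not how train_test_split is implemented internally. The helper name split_rows and the placeholder row ids are assumptions made for this example, and the shuffle will not select the same rows that scikit-learn picks for random_state = 40.

```python
import math
import random


def split_rows(rows, test_size=0.2, seed=40):
    """Shuffle rows and split them into (train, test) lists.

    Hypothetical helper for illustration only; it does not reproduce
    scikit-learn's exact shuffling for the same seed.
    """
    rng = random.Random(seed)   # seeded generator, so the split is repeatable
    shuffled = list(rows)       # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = math.ceil(len(shuffled) * test_size)  # 20% of the rows, rounded up
    return shuffled[n_test:], shuffled[:n_test]


# 35 placeholder row ids, matching the 35 entries reported by Salary.info()
rows = list(range(35))
train, test = split_rows(rows)
print(len(train), len(test))  # 28 7
```

With 35 rows, an 80-20 split leaves 28 rows for training and 7 for testing, so the score reported later is based on only 7 test rows.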
Step — 3 Using the Sklearn Library to Create the Linear Regression Model
from sklearn.linear_model import LinearRegression
LR = LinearRegression()
Step — 4 Fitting our training dataset
LR.fit(Xtrain,Ytrain)
Output :- LinearRegression()
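For a single feature, fitting a linear regression means finding the slope and intercept of the line Salary = intercept + slope × YearsExperience that minimizes the squared error on the training data. The sketch below shows that calculation in plain Python using the closed-form least-squares formulas. The function name fit_line and the toy numbers are made up for illustration; they are not taken from Salary.csv, and scikit-learn uses a more general solver internally.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Toy data (hypothetical, not from Salary.csv): salary = 30000 + 9000 * years
years = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [39000.0, 48000.0, 57000.0, 66000.0, 75000.0]

slope, intercept = fit_line(years, salary)
print(slope, intercept)  # 9000.0 30000.0
```

After LR.fit, scikit-learn exposes the corresponding fitted values as LR.coef_ and LR.intercept_.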
Step — 5 Predicting on our Testing Dataset based on training
Y_pred = LR.predict(Xtest)
Step — 6 Getting the Scores of our Model
This tells us how well the model, fitted on the training dataset, performs on the testing dataset.
print(LR.score(Xtest, Ytest))
Output :- 0.9765059258552102
#Here the score is R² (the coefficient of determination), not accuracy: the model explains about 97.7% of the variance in salary on the testing dataset
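For regressors such as LinearRegression, score returns the coefficient of determination R², defined as 1 − SS_res / SS_tot. The following sketch computes it by hand in plain Python so the number above is easier to interpret. The function name r2 and the toy values are assumptions for illustration only and are not taken from Salary.csv.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)             # total variation
    return 1 - ss_res / ss_tot


# Toy values (hypothetical, not from Salary.csv)
actual = [40000.0, 50000.0, 60000.0, 70000.0]
predicted = [42000.0, 49000.0, 61000.0, 68000.0]

print(r2(actual, actual))          # perfect predictions give 1.0
print(r2(actual, [55000.0] * 4))   # always predicting the mean gives 0.0
print(r2(actual, predicted))       # close to 1 when predictions track the data
```

A score near 1 means the predictions track the test salaries closely; a score near 0 means the model does no better than always predicting the mean salary.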
Now our model is trained and built successfully.
We can enter any number of years of experience and our model will tell us the predicted salary.
For example:
YearsExperience = 5
own_pred = LR.predict([[YearsExperience]])  # one row, one feature
print("No of Experience = {}".format(YearsExperience))
print("Predicted Salary = {}".format(own_pred[0]))
Output :-
No of Experience = 5
Predicted Salary = 72955.24706342639
Here, I am signing off with my predictions.
Keep Following for more Content.
Regards
Kartik Aggarwal
Do refer to my profiles for further content!
Github : KartikAggarwal1305 (Kartik Aggarwal) (github.com)
Tableau : Profile — kartik.aggarwal6547 | Tableau Public
Linkedin : Kartik Aggarwal | LinkedIn
Medium : Kartik Aggarwal — Medium