Technical Articles
Priyanka Sadana

Linear Regression in Machine Learning

This blog explains the Linear Regression algorithm, a way to perform data modeling (the fourth step in the CRISP-DM model).

CRISP-DM (Cross Industry Standard Process for Data Mining) provides a structured approach to planning a data mining project. The model is an idealized sequence of the following steps:

  1. Business Understanding
  2. Data Understanding
  3. Data Preparation
  4. Data Modeling
  5. Model Evaluation
  6. Model Deployment

Data Modeling uses machine learning algorithms, in which the machine learns from the data, much like the way humans learn from experience.

Machine Learning models are classified in two categories:

  1. Supervised learning methods: the historical data comes with labels. Regression and Classification algorithms fall under this category.
  2. Unsupervised learning methods: no pre-defined labels are assigned to the historical data. Clustering algorithms fall under this category.

For example, predicting a company's performance in terms of revenue based on historical data is a regression problem, while classifying whether a person is likely to default on a loan or not is a classification problem.

How does regression work?

Let’s consider an example: a company could predict its sales based on the money it spends on advertising.

Previous data on advertising spend and actual sales:

Advertising expenditure (in thousands)    Sales (in lakhs)
20                                        11
30                                        23
11                                        6
14                                        7
45                                        44.4


You would like to know: if you spend amount X on advertising, what will your sales be?

Always remember that domain expertise helps in arriving at the right predictions. The domain expertise of the company’s advertising team can give a rough idea of how a change in advertising expenditure affects sales. But to find exactly how much sales would be generated, and to know whether a relationship between advertising expenditure and sales even exists, you can use a regression algorithm to build a model and make a prediction.

Let’s try to plot a graph of Advertising Expenditure versus Sales.
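As a minimal sketch (assuming the numpy and matplotlib libraries are available), the table above can be plotted as a scatter plot like this:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# Sample data from the table above
advertising = [20, 30, 11, 14, 45]   # expenditure, in thousands
sales = [11, 23, 6, 7, 44.4]         # sales, in lakhs

fig, ax = plt.subplots()
ax.scatter(advertising, sales)
ax.set_xlabel("Advertising expenditure (in thousands)")
ax.set_ylabel("Sales (in lakhs)")
ax.set_title("Advertising Expenditure vs Sales")
fig.savefig("advertising_vs_sales.png")
```

Eyeballing such a plot already suggests that sales rise roughly linearly with advertising spend, which motivates fitting a straight line.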

Independent Variable: the variable on the X-axis, which is used for prediction, is the independent variable.

Dependent Variable: the variable on the Y-axis, which we want to predict, is the dependent variable.

Equation of a straight line: y = mx + c, where m is the slope of the line and c is the intercept.

What is the significance of m and c in the equation of a straight line?

‘m’ (the slope) signifies the strength of the relationship between X and Y — how much Y changes for a unit change in X.

‘c’ in the above example is the amount of Sales when no money is spent on Advertising, i.e. when X = 0.

Best Fit Line: the line that best fits the scatter plot. What does best fit mean, and how do we determine whether a line is the best fit?

Residual: the residual is used to find the best fit line. Every data point has a residual value, which is the difference between the actual value and the predicted value (the value of the point on the line). Let’s denote this by E (error).

E = Actual – Predicted (for every data point)

Minimize the total squared error, i.e. minimize e1² + e2² + …… + en².

This is also called the Residual Sum of Squares (RSS). So, choose the values of m and c in such a way that they minimize the RSS.

Let’s write E in terms of m and c.

ei = yi (actual) – yi (predicted)

ei = yi – mxi – c
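To make this concrete, here is a minimal sketch in plain Python that computes each residual ei = yi – (m·xi + c) and the RSS for one candidate line on the sample data above; the values m = 1 and c = –5 are arbitrary choices just for illustration:

```python
# Sample data from the table above
x = [20, 30, 11, 14, 45]      # advertising expenditure (thousands)
y = [11, 23, 6, 7, 44.4]      # sales (lakhs)

# Arbitrary candidate line y = m*x + c (for illustration only)
m, c = 1.0, -5.0

# Residual for each point: actual minus predicted
residuals = [yi - (m * xi + c) for xi, yi in zip(x, y)]

# Residual Sum of Squares: e1^2 + e2^2 + ... + en^2
rss = sum(e ** 2 for e in residuals)
print(residuals)
print(rss)
```

A different choice of m and c would give a different RSS; the best fit line is the one whose m and c make this number as small as possible.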

In Machine Learning models, a cost function is defined for a problem and then it is either minimized or maximized according to the requirement. In the case of the regression described above, the cost function is the Residual Sum of Squares.

How to minimize a cost function?

  • Differentiate the cost function and set the derivative equal to zero.
  • Gradient descent: start with some values of ‘m’ and ‘c’, then iteratively move to better ‘m’ and ‘c’ that reduce the cost function.
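The gradient descent option can be sketched as follows on the sample data (the learning rate and iteration count here are illustrative choices, not tuned values):

```python
import numpy as np

x = np.array([20, 30, 11, 14, 45], dtype=float)  # advertising (thousands)
y = np.array([11, 23, 6, 7, 44.4])               # sales (lakhs)

def rss(m, c):
    """Residual Sum of Squares for the line y = m*x + c."""
    return np.sum((y - (m * x + c)) ** 2)

m, c = 0.0, 0.0            # start with some values of m and c
lr = 1e-4                  # learning rate (illustrative choice)
initial_rss = rss(m, c)

for _ in range(50_000):
    pred = m * x + c
    # Partial derivatives of RSS with respect to m and c
    grad_m = -2 * np.sum(x * (y - pred))
    grad_c = -2 * np.sum(y - pred)
    # Step downhill against the gradient
    m -= lr * grad_m
    c -= lr * grad_c

print(m, c, rss(m, c))
```

After enough iterations, m and c settle close to the values a closed-form least-squares solution would give, and the final RSS is far below the starting one.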

RSS is an absolute quantity, so if the units in the data set change, the value of RSS changes too. There exists another measure, TSS, which is relative rather than absolute. TSS is the Total Sum of Squares.

How to calculate TSS?

TSS is the sum of the squared differences between each data point and the mean of all the values of the target variable (y).

TSS = (Y1 – Ymean)² + (Y2 – Ymean)² + ……. + (Yn – Ymean)²

Here, the baseline is the horizontal line with intercept (‘c’ in y = mx + c) equal to Ymean; this line does not include any influence of the independent variable. It is a very basic model, and therefore any model built using the independent variable should be better than this baseline.

RSS/TSS is a normalized quantity.

R² = 1 – RSS/TSS

The higher the value of R², the better the model explains the data.

Let’s say the value of R² is 0.87; it means the model explains 87% of the variance in the data.

If the predicted line explains every data point exactly, then the difference between actual and predicted is 0 for each point, which means RSS is 0 and hence R² is 1.
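Putting it together on the sample data, here is a sketch that uses numpy’s `polyfit` to get the least-squares m and c, then computes RSS, TSS, and R² from the definitions above:

```python
import numpy as np

x = np.array([20, 30, 11, 14, 45], dtype=float)  # advertising (thousands)
y = np.array([11, 23, 6, 7, 44.4])               # sales (lakhs)

# Least-squares fit of y = m*x + c (polyfit returns [slope, intercept])
m, c = np.polyfit(x, y, deg=1)
pred = m * x + c

rss = np.sum((y - pred) ** 2)          # Residual Sum of Squares
tss = np.sum((y - y.mean()) ** 2)      # Total Sum of Squares
r2 = 1 - rss / tss

print(m, c, r2)
```

On this tiny dataset R² comes out close to 1, i.e. most of the variance in sales is explained by advertising spend.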


The next post will cover implementing Linear Regression in Python.

Douglas Cezar Kuchler

It is an amazing post. It is a sign of changing times to have it on the SAP blogosphere.


      You have written a very detailed and well-explained thought process here, congratulations on that.


When we see the current surge in machine learning use, we see it is based on statistical thinking and models that have existed for a long time. What is new is having programming languages (like Python and R) and enough computing power and internet bandwidth to apply them to really huge datasets. Do you agree with this statement?



      Best regards,


Priyanka Sadana (Blog Post Author)

      Hi Douglas,

      Thank you!


The statement you have mentioned is correct in a way: statistical thinking and model-building algorithms have existed for many years and are now coming into use and generating business, and this is happening thanks to the capabilities of programming languages like Python and R together with sufficient computing power.

But Machine Learning is a wider topic that includes a lot of research, where again statistical thinking comes into the picture. For example, when basic algorithms like linear and logistic regression were first conceived, there was no code or thinking available on neural networks. It was the result of research and a lot of thought put together that showed artificial neural networks can also exist and can perform tasks like image recognition.

So, both go hand in hand:

      1. research using statistical thinking to build new ideas and algorithms.
      2. using languages and high computing power to provide solutions that generate more business or solve critical problems.

Moreover, I feel the very purpose of Machine Learning is served with ease when we have lots of informational data available from the past. So, one more thing that matters is data.


      Thanks and regards,

      Priyanka Sadana

Divya Munnuru

Hi Priyanka,

      You have written so well in such a way that beginners can also understand.



Lavanya K

      Hi Priyanka,

      It’s a very nice blog, you have explained the content in layman terms.