
Linear Regression

Understanding the Linear Regression Basics, Formula, and Applications with Examples

Linear regression is one of the most fundamental and widely used techniques in statistics and machine learning. It serves as the foundation for many complex algorithms and provides valuable insights into relationships between variables. This guide covers everything you need to know about linear regression, including its formula, examples, assumptions, types, and more.


What is Linear Regression?

Linear regression models the relationship between two variables by fitting a linear equation to observed data. One variable is the independent (predictor) variable, and the other is the dependent (outcome) variable. Linear regression is commonly used for predictive analysis. The main idea of regression is to examine two things. First, does a set of predictor variables do a good job of predicting an outcome (dependent) variable? Second, which variables are significant predictors of the outcome variable?


Figure: Linear regression with two variables


Linear Regression Example

Example 1: Linear regression can predict house prices based on size. 

For example, if the formula is:

Price = 50,000 + 100 × Size (sq. ft),

a 2,000 sq. ft. house would cost:

Price = 50,000 + 100 × 2,000 = 250,000.

It helps find relationships and make predictions.


Example 2: Linear regression can predict sales based on advertising spend. For example, if the formula is:

Sales = 5,000 + 20 × Ad Spend (in $1,000s),

and a company spends $50,000 on ads:

Sales = 5,000 + 20 × 50 = 6,000.

It shows how advertising impacts sales.
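
The arithmetic in both examples can be checked with a few lines of Python. This is only a sketch: the helper name `predict` is chosen here for illustration, and the intercepts and slopes are the ones quoted in the examples above.

```python
def predict(intercept, slope, x):
    """Evaluate the straight line y = intercept + slope * x."""
    return intercept + slope * x

# Example 1: Price = 50,000 + 100 * Size, for a 2,000 sq. ft. house
house_price = predict(50_000, 100, 2_000)

# Example 2: Sales = 5,000 + 20 * Ad Spend, with ad spend in $1,000s,
# so a $50,000 budget is entered as 50
sales = predict(5_000, 20, 50)

print(house_price)  # 250000
print(sales)        # 6000
```

Note that the ad spend must be entered in the unit the formula expects (thousands of dollars), otherwise the prediction is off by a factor of 1,000.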


Linear Regression Equation

The strength of the relationship between two variables is measured by the correlation coefficient, which ranges from -1 to +1. It shows how strongly the observed values of the two variables are associated.

 

Linear Regression Equation is given below:

 

Y=a+bX

 

where X is the independent variable and it is plotted along the x-axis

 

Y is the dependent variable and it is plotted along the y-axis

 

Here, the slope of the line is b, and a is the intercept (the value of y when x = 0).

 

Linear Regression Formula

As we know, linear regression shows the linear relationship between two variables. The equation of linear regression has the same form as the slope-intercept equation of a straight line, which appears in earlier classes under linear equations in two variables. The linear regression formula is given by the equation

 

Y= a + bX

We will find the value of a and b by using the below formula

a = \[\dfrac{\left(\sum Y\right)\left(\sum X^{2}\right)-\left(\sum X\right)\left(\sum XY\right)}{n\left(\sum X^{2}\right)-\left(\sum X\right)^{2}}\]


b = \[\dfrac{n\left(\sum XY\right)-\left(\sum X\right)\left(\sum Y\right)}{n\left(\sum X^{2}\right)-\left(\sum X\right)^{2}}\]

Here n is the number of data points.
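
The two formulas translate directly into code. The sketch below is one possible rendering, not a reference implementation; the function name `fit_line` and the sample points are made up for illustration, with the points chosen to lie exactly on Y = 1 + 2X so the result is easy to check by eye.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Shared denominator of both formulas: n * sum(X^2) - (sum X)^2
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all X values are identical, so the slope is undefined")

    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom  # intercept
    b = (n * sum_xy - sum_x * sum_y) / denom       # slope
    return a, b

# Illustrative points lying exactly on Y = 1 + 2X
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```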


How Does Linear Regression Work?

Linear regression works by modelling the relationship between two variables, x (independent variable) and y (dependent variable), using a straight line. The independent variable, x, is represented on the horizontal axis, while the dependent variable, y, is plotted on the vertical axis. The goal is to find a line that best fits the data points and explains the relationship between the variables.


Figure: Linear regression line fitted to a set of data points on a graph


Steps in Linear Regression

Using the simplest form of the equation for a straight line, $y = c \cdot x + m$, where $c$ is the slope and $m$ is the y-intercept, linear regression follows these steps:


  1. Plot the Data Points: Start by plotting the given data points, such as (1,5), (2,8), and (3,11).

  2. Adjust the Line: Draw a straight line and iteratively adjust its direction to minimise the distance (error) between the line and the data points.

  3. Determine the Equation: Once the line fits the data, identify the equation of the line. For the given dataset, the equation becomes $y = 3 \cdot x + 2$.

  4. Make Predictions: Use the equation to predict values. For example, when x=4, substitute it into the equation to find $y = 3 \cdot 4 + 2 = 14$.


This process enables linear regression to identify trends and make predictions based on existing data.
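
The "adjust the line" step can be sketched as gradient descent on the mean squared error. This is one common way to adjust a line iteratively, assumed here for illustration rather than taken from the steps themselves; the function name, learning rate, and step count are arbitrary choices.

```python
def fit_by_gradient_descent(points, learning_rate=0.05, steps=5000):
    """Iteratively adjust slope and intercept to reduce mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(points)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to each parameter
        grad_slope = sum(2 * (slope * x + intercept - y) * x for x, y in points) / n
        grad_intercept = sum(2 * (slope * x + intercept - y) for x, y in points) / n
        # Move each parameter a small step against its gradient
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept

# The three data points from step 1
points = [(1, 5), (2, 8), (3, 11)]
slope, intercept = fit_by_gradient_descent(points)

# Step 4: predict y for x = 4
prediction = slope * 4 + intercept
print(round(slope, 3), round(intercept, 3), round(prediction, 3))
```

With these settings the parameters settle very close to slope 3 and intercept 2, matching the equation found in step 3. For a single predictor, the closed-form formulas for a and b give the same line without any iteration.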

 

Properties of Linear Regression

For the regression line where the regression parameters b0 and b1 are defined, the following properties are applicable:

  • The regression line minimises the sum of squared differences between observed values and predicted values.

  • The regression line passes through the mean of X and Y variable values.

  • The regression constant b0 is equal to the y-intercept of the linear regression.

  • The regression coefficient b1 is the slope of the regression line. Its value is equal to the average change in the dependent variable (Y) for a unit change in the independent variable (X)


Key Ideas of Linear Regression 

  • Correlation explains the interrelation between variables within the data.

  • Variance is the degree of the spread of the data. 

  • Standard deviation measures how far the data are spread around the mean. It is the square root of the variance.

  • Residual (error term) is the actual value found within the dataset minus the expected value that is predicted in linear regression. 
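
As a rough illustration of these four ideas, the sketch below computes each of them for the small data set used in the solved example of this article (x = 2, 4, 6, 8 and y = 3, 7, 5, 10). It uses population formulas (dividing by n), which is an assumption made here; sample formulas divide by n - 1 instead.

```python
import math

# Data from the solved example in this article
xs = [2, 4, 6, 8]
ys = [3, 7, 5, 10]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Variance: average squared distance from the mean (population form)
var_x = sum((x - mean_x) ** 2 for x in xs) / n
var_y = sum((y - mean_y) ** 2 for y in ys) / n

# Standard deviation: square root of the variance
std_x = math.sqrt(var_x)
std_y = math.sqrt(var_y)

# Correlation: covariance scaled by both standard deviations
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
r = cov_xy / (std_x * std_y)

# Residual of the first observation, using the fitted line y = 1.5 + 0.95x
residual_first = ys[0] - (1.5 + 0.95 * xs[0])

print(var_x, round(std_x, 3), round(r, 3), round(residual_first, 2))
```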


Types of Linear Regression

There are three main types of linear regression:


  1. Simple Linear Regression

  2. Multiple Linear Regression

  3. Polynomial Linear Regression


Simple Linear Regression

  • Involves one independent variable and one dependent variable.

  • Example: Predicting house price based on its size.


Multiple Linear Regression

  • Involves two or more independent variables and one dependent variable.

  • Example: Predicting house price based on size, location, and age of the house.


Polynomial Regression

  • Models a non-linear relationship by fitting a polynomial equation to the data.

  • Example: Predicting sales growth trends over time.


Regression Coefficient

The regression line is written as:

Y= B0+B1X

Where

B0 is a constant

B1 is the regression coefficient

Given below is the formula to find the value of the regression coefficient.

\[B_1 = \dfrac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sum (x_i-\bar{x})^{2}}\]

where \(x_i\) and \(y_i\) are the observed values,

and \(\bar{x}\) and \(\bar{y}\) are the mean values of x and y.
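
A short sketch of this mean-deviation formula follows. The function name is hypothetical, and the intercept line (B0 = mean of y minus B1 times mean of x) relies on the property that the regression line passes through the point of means. The data are the four points from the solved example in this article.

```python
def coefficients_from_means(xs, ys):
    """Return (B0, B1) using deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Numerator: sum of products of deviations; denominator: sum of squared x deviations
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)

    b1 = s_xy / s_xx
    # The line passes through (mean_x, mean_y), which fixes the constant
    b0 = mean_y - b1 * mean_x
    return b0, b1

b0, b1 = coefficients_from_means([2, 4, 6, 8], [3, 7, 5, 10])
print(round(b0, 4), round(b1, 4))  # 1.5 0.95
```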


Importance of Regression Line 

A regression line describes the behaviour of a set of data and gives a systematic way to study the relationship between two continuous variables. It is applied in machine learning models, mathematical analysis, statistics, forecasting, and other quantitative fields. In the financial sector, for example, analysts use linear regression to predict stock and commodity prices and to value different securities. Many companies also use linear regression to predict sales, inventories, and similar quantities.


Important Properties of Regression Line 

  • Regression coefficients are unaffected by a change of origin but are affected by a change of scale. If x and y are transformed to u = (x - a)/p and v = (y - c)/q, where a, c, p and q are constants, then byx = (q/p) × bvu and bxy = (p/q) × buv.

  • The two lines of regression (y on x, and x on y) intersect at the point (x̄, ȳ), the means of x and y. This point is the common solution of the two regression equations.

  • The correlation coefficient between x and y is the geometric mean of the two regression coefficients, and it carries their common sign. If the regression coefficients are byx = b and bxy = b', then r = ±sqrt(byx × bxy). When both coefficients are negative, r is negative; when both are positive, r is positive.

  • The regression constant (a0) is equal to the y-intercept of the regression line; a0 and a1 are the regression parameters.
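
The geometric-mean property can be checked numerically. The sketch below computes both regression coefficients for the data of the solved example in this article and compares the square root of their product, with the common sign attached, against the correlation coefficient computed directly. The function name is chosen here for illustration.

```python
import math

def deviation_sums(xs, ys):
    """Return (S_xy, S_xx, S_yy), the sums of products of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy, s_xx, s_yy

s_xy, s_xx, s_yy = deviation_sums([2, 4, 6, 8], [3, 7, 5, 10])

b_yx = s_xy / s_xx  # regression coefficient of y on x
b_xy = s_xy / s_yy  # regression coefficient of x on y

# Geometric mean of the two coefficients, carrying their common sign
r_from_coefficients = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Correlation coefficient computed directly
r_direct = s_xy / math.sqrt(s_xx * s_yy)

print(round(r_from_coefficients, 4), round(r_direct, 4))
```

Both values agree (about 0.82 for this data), which is what the property predicts.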

 

Regression Line Formula: 

A linear regression line equation is written as:
Y = a + bX

where X is plotted on the x-axis and Y is plotted on the y-axis. X is an independent variable and Y is the dependent variable. Here, b is the slope of the line and a is the intercept, i.e. value of y when x=0. 

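Multiple Regression Line Formula: y = a + b1x1 + b2x2 + b3x3 + … + btxt + u, where x1, …, xt are the independent variables, b1, …, bt are their coefficients, and u is the error term.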
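
For more than one predictor, the coefficients are usually found by solving the normal equations. The sketch below does this in plain Python for two predictors. All names and the data are hypothetical: the y values are generated from y = 1 + 2·x1 + 3·x2 with no error term, so the fit should recover those coefficients. In practice a numerical library would normally be used instead of hand-written elimination.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot
        pivot = max(range(col, n), key=lambda row_idx: abs(aug[row_idx][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row
        for row_idx in range(n):
            if row_idx != col:
                factor = aug[row_idx][col] / aug[col][col]
                aug[row_idx] = [v - factor * p for v, p in zip(aug[row_idx], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_multiple(rows, ys):
    """Return [a, b1, b2, ...] minimising the sum of squared errors."""
    # Prepend a constant 1 to each row so the intercept a is estimated too
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (no error term)
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

coeffs = fit_multiple(rows, ys)
print([round(c, 6) for c in coeffs])
```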


Assumptions made in Linear Regression

  • The dependent/target variable is continuous.

  • The independent variables are not strongly related to one another (no multicollinearity).

  • There should be a linear relationship between the dependent and explanatory variables.

  • Residuals should follow a normal distribution.

  • Residuals should have constant variance.

  • Residuals should be independently distributed/no autocorrelation.


Solved Examples

1. Find a linear regression equation for the following two sets of data:

| x | y |
|---|---|
| 2 | 3 |
| 4 | 7 |
| 6 | 5 |
| 8 | 10 |

Sol: To find the linear regression equation, we need the values of Σx, Σy, Σx² and Σxy.

Construct the table and find these values:

| x | y | x² | xy |
|---|---|---|---|
| 2 | 3 | 4 | 6 |
| 4 | 7 | 16 | 28 |
| 6 | 5 | 36 | 30 |
| 8 | 10 | 64 | 80 |
| Σx = 20 | Σy = 25 | Σx² = 120 | Σxy = 144 |

The formula of the linear equation is y = a + bx. Using the formulas below, with n = 4 data points, we will find the values of a and b.

a = \[\frac{\left(\sum y\right)\left(\sum x^{2}\right)-\left(\sum x\right)\left(\sum xy\right)}{n\left(\sum x^{2}\right)-\left(\sum x\right)^{2}}\]


Now put the values in the equation

\[a=\frac{25\times 120-20\times 144}{4\times 120-400}\]


a= \[\frac{120}{80}\]


a=1.5


b = \[\frac{n\left(\sum xy\right)-\left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^{2}\right)-\left(\sum x\right)^{2}}\]

Put the values in the equation


\[b=\frac{4\times 144-20\times 25}{4\times 120-400}\]


b=\[\frac{76}{80}\]


b=0.95

Hence we got the value of a = 1.5 and b = 0.95

The linear equation is given by

Y = a + bx

Now put the value of a and b in the equation

Hence equation of linear regression is y = 1.5 + 0.95x
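
One way to sanity-check this result is to compute the predictions and residuals for the four data points. For a least-squares line the residuals should sum to zero and the line should pass through the point of means; the sketch below checks both, allowing for small floating-point rounding.

```python
xs = [2, 4, 6, 8]
ys = [3, 7, 5, 10]
a, b = 1.5, 0.95  # intercept and slope found above

# Predicted y for each x, and the residual (actual minus predicted)
predicted = [a + b * x for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]

mean_x = sum(xs) / len(xs)  # 5.0
mean_y = sum(ys) / len(ys)  # 6.25

print([round(p, 2) for p in predicted])  # [3.4, 5.3, 7.2, 9.1]
print([round(e, 2) for e in residuals])  # [-0.4, 1.7, -2.2, 0.9]
print(round(sum(residuals), 9))          # sums to zero up to rounding
print(round(a + b * mean_x, 9), mean_y)  # the line passes through (5, 6.25)
```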


FAQs on Linear Regression

1. Why are regression lines considered to be important?

Regression lines summarise the relationship between two variables and allow predictions to be made from it. In the financial sector, for example, analysts use linear regression to predict stock prices and commodity prices and to value many different securities.

2. How do you define Slope?

Slope tells you how much the target variable changes when the independent variable increases by one unit.


In the line equation y = mx + b, m is the slope and b is the y-intercept.

3. How will you explain the difference between Linear regression and multiple regression?

The main difference between simple linear regression and multiple linear regression is that simple linear regression contains only one independent variable, whereas multiple regression contains two or more independent variables.

4. How will you define cost function in linear regression?

The cost function measures the error between the predicted values and the actual values and expresses it as a single number. A common choice in linear regression is the mean squared error.

5. What are some examples of linear regression?

Examples include predicting the total number of sales, estimating the effect of fertilizer on total crop yield (as agricultural scientists do), and estimating the effect of drug dosage on blood pressure.

6. What are the Types of Linear Regression?

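The types of linear regression are:

  • Simple linear regression

  • Multiple linear regression

Related techniques that are often listed alongside them, but which model categorical rather than continuous outcomes, are:

  • Logistic regression

  • Ordinal regression

  • Multinomial regression

  • Discriminant Analysis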

7. What are the Differences Between Linear and Logistic Regression?

Linear regression is used to predict the value of a continuous dependent variable with the help of independent variables. Logistic regression is used to predict a categorical dependent variable with the help of independent variables.

8. How Does a Linear Regression Work?

Linear regression is the process of finding the line that best fits the data points available on the plot. The line is then used to predict output values for inputs that are not present in the data set. The predicted outputs lie on the line.

9. What are the Assumptions of Linear Regression?

Linear regression relies on several assumptions to work effectively:

  • The relationship between independent and dependent variables is linear.

  • The residuals (errors) are normally distributed.

  • There is homoscedasticity, meaning the variance of errors is consistent across all levels of the independent variable(s).

  • There is no multicollinearity among independent variables in multiple regression.

10. What is the Difference Between Simple and Polynomial Regression?

Simple linear regression models a straight-line relationship between two variables, while polynomial regression models a non-linear relationship using polynomial equations. For example, predicting growth trends might require a polynomial regression due to its curved nature.

11. How Do You Measure the Accuracy of a Linear Regression Model?

The accuracy of a linear regression model is typically measured using metrics such as:

  • R-squared (R²): Proportion of variance in the dependent variable explained by the independent variable(s).

  • Mean Squared Error (MSE): Average of squared differences between actual and predicted values.

  • Root Mean Squared Error (RMSE): Square root of MSE, representing the error in the same units as the dependent variable.
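
A minimal sketch of these three metrics is shown below. The function name is hypothetical, and the data and fitted line (y = 1.5 + 0.95x) are taken from the solved example in this article. The function assumes the actual values are not all identical, since R² is undefined in that case.

```python
import math

def regression_metrics(actual, predicted):
    """Return (r_squared, mse, rmse) for paired actual and predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n

    # Residual sum of squares: error left after fitting the line
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    # Total sum of squares: variation of the actual values around their mean
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)

    mse = ss_res / n
    rmse = math.sqrt(mse)
    r_squared = 1 - ss_res / ss_tot
    return r_squared, mse, rmse

xs = [2, 4, 6, 8]
ys = [3, 7, 5, 10]
predicted = [1.5 + 0.95 * x for x in xs]

r_squared, mse, rmse = regression_metrics(ys, predicted)
print(round(r_squared, 4), round(mse, 4), round(rmse, 4))
```

For this data the line explains roughly two thirds of the variance in y (R² ≈ 0.67), with an RMSE of about 1.47 in the units of y.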

12. Explain Linear Regression With Example

Linear regression models the relationship between a dependent variable (y) and an independent variable (x) using a straight line. For example, a company predicts sales based on advertising spend. If the equation is Sales = 5 + 5 × Ad Spend, with both sales and ad spend measured in $1,000s, then spending $5,000 gives Sales = 5 + 5 × 5 = 30, that is, $30,000. It helps make predictions and identify trends.