

Understanding Linear Regression: Basics, Formula, and Applications with Examples
Linear regression is one of the most fundamental and widely used techniques in statistics and machine learning. It serves as the foundation for many complex algorithms and provides valuable insights into relationships between variables. This guide covers everything you need to know about linear regression, including its formula, examples, assumptions, types, and more.
What is Linear Regression?
Linear regression models the relationship between two variables by fitting a linear equation to observed data. One of the two variables is called the independent variable, and the other is the dependent variable. Linear regression is commonly used for predictive analysis. The main idea of regression is to examine two things: first, does a set of predictor variables do a good job of predicting an outcome (dependent) variable? Second, which variables are significant predictors of the outcome variable?

Linear Regression Example
Example 1: Linear regression can predict house prices based on size.
For example, if the formula is:
Price = 50,000 + 100 × Size (sq. ft),
a 2,000 sq. ft. house would cost:
Price = 50,000 + 100 × 2,000 = 250,000.
It helps find relationships and make predictions.
Example 2: Linear regression can predict sales based on advertising spend. For example, if the formula is:
Sales = 5,000 + 20 × Ad Spend (in $1,000s),
and a company spends $50,000 on ads:
Sales = 5,000 + 20 × 50 = 6,000.
It shows how advertising impacts sales.
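For readers who want to try these calculations in code, here is a minimal Python sketch that simply evaluates the two example equations above; the coefficients are the illustrative values from the examples, not values fitted from real data.

# Illustrative prediction functions using the example coefficients above
def predict_price(size_sqft):
    # Price = 50,000 + 100 * Size (sq. ft)
    return 50_000 + 100 * size_sqft

def predict_sales(ad_spend_thousands):
    # Sales = 5,000 + 20 * Ad Spend (in $1,000s)
    return 5_000 + 20 * ad_spend_thousands

print(predict_price(2_000))   # 250000
print(predict_sales(50))      # 6000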
Linear Regression Equation
The strength of the relationship between two variables is measured by the correlation coefficient. The coefficient ranges from -1 to +1 and shows how strongly the observed values of the two variables are associated.
Linear Regression Equation is given below:
Y = a + bX
where X is the independent variable and it is plotted along the x-axis
Y is the dependent variable and it is plotted along the y-axis
Here, the slope of the line is b, and a is the intercept (the value of y when x = 0).
Linear Regression Formula
As we know, linear regression describes the linear relationship between two variables. The equation of linear regression is similar to the slope formula we learned in earlier classes as the equation of a straight line (a linear equation in two variables). The linear regression formula is given by the equation
Y= a + bX
We will find the values of a and b by using the formulas below:
a = \[\dfrac{\left ( \sum Y \right )\left ( \sum X^{2} \right )-\left ( \sum X \right )\left ( \sum XY \right )}{n\left ( \sum X^{2} \right )-\left ( \sum X \right )^{2}}\]
b = \[\dfrac{n\left ( \sum XY \right )-\left ( \sum X \right )\left ( \sum Y \right )}{n\left ( \sum X^{2} \right )-\left ( \sum X \right )^{2}}\]
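These formulas can be translated directly into code. The function below is a minimal Python sketch (the function and variable names are just illustrative) that computes a and b from paired observations using the summations above.

# Minimal sketch: compute the intercept a and slope b from the summation formulas above
def fit_simple_regression(x_values, y_values):
    n = len(x_values)
    sum_x = sum(x_values)
    sum_y = sum(y_values)
    sum_x2 = sum(x * x for x in x_values)
    sum_xy = sum(x * y for x, y in zip(x_values, y_values))
    denom = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom   # intercept
    b = (n * sum_xy - sum_x * sum_y) / denom        # slope
    return a, b

# The points (1, 5), (2, 8), (3, 11) used later in this article give a = 2, b = 3
print(fit_simple_regression([1, 2, 3], [5, 8, 11]))   # (2.0, 3.0), i.e. y = 2 + 3x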
How Does Linear Regression Work?
Linear regression works by modelling the relationship between two variables, x (independent variable) and y (dependent variable), using a straight line. The independent variable, x, is represented on the horizontal axis, while the dependent variable, y, is plotted on the vertical axis. The goal is to find a line that best fits the data points and explains the relationship between the variables.

Steps in Linear Regression
Using the simplest form of the equation for a straight line, $y = c \cdot x + m$, where $c$ is the slope and $m$ is the y-intercept, linear regression follows these steps:
Plot the Data Points: Start by plotting the given data points, such as (1,5), (2,8), and (3,11).
Adjust the Line: Draw a straight line and iteratively adjust its direction to minimise the distance (error) between the line and the data points.
Determine the Equation: Once the line fits the data, identify the equation of the line. For the given dataset, the equation becomes $y = 3 \cdot x + 2$.
Make Predictions: Use the equation to predict values. For example, when x=4, substitute it into the equation to find $y = 3 \cdot 4 + 2 = 14$.
This process enables linear regression to identify trends and make predictions based on existing data.
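The same steps can be reproduced with a short Python sketch; NumPy's polyfit is used here as one convenient way to perform the line-fitting step, and the data points are the ones listed above.

import numpy as np

# Data points from the steps above
x = np.array([1, 2, 3])
y = np.array([5, 8, 11])

# Fit a degree-1 polynomial (a straight line); polyfit returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)          # approximately 3.0 and 2.0

# Make a prediction for x = 4
print(slope * 4 + intercept)     # approximately 14.0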
Properties of Linear Regression
For a regression line with regression parameters b0 and b1, the following properties hold:
The regression line minimises the sum of squared differences between observed values and predicted values.
The regression line passes through the means of the X and Y values.
The regression constant b0 is equal to the y-intercept of the linear regression.
The regression coefficient b1 is the slope of the regression line. Its value equals the average change in the dependent variable (Y) for a unit change in the independent variable (X).
Key Ideas of Linear Regression
Correlation explains the interrelation between variables within the data.
Variance is the degree of the spread of the data.
Standard deviation measures the dispersion of the data around the mean and is obtained as the square root of the variance.
Residual (error term) is the actual value found within the dataset minus the expected value that is predicted in linear regression.
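These quantities can be computed numerically. The snippet below is a small sketch using NumPy on the same three points from the fitting steps above; because those points lie exactly on the line y = 3x + 2, the correlation is 1 and every residual is zero.

import numpy as np

x = np.array([1, 2, 3])
y = np.array([5, 8, 11])

print(np.corrcoef(x, y)[0, 1])   # correlation between x and y (exactly 1.0 here)
print(np.var(y))                 # variance: spread of y around its mean
print(np.std(y))                 # standard deviation: square root of the variance
print(y - (3 * x + 2))           # residuals for the fitted line y = 3x + 2 (all zero here)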
Types of Linear Regression
There are three main types of linear regression:
Simple Linear Regression
Multiple Linear Regression
Polynomial Regression
Simple Linear Regression
Involves one independent variable and one dependent variable.
Example: Predicting house price based on its size.
Multiple Linear Regression
Involves two or more independent variables and one dependent variable.
Example: Predicting house price based on size, location, and age of the house.
Polynomial Regression
Models a non-linear relationship by fitting a polynomial equation to the data.
Example: Predicting sales growth trends over time.
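To make the multiple linear regression case concrete, here is a minimal sketch using scikit-learn; the feature values (size, location score, age) and prices are made up purely for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: [size in sq. ft, location score, age in years]
X = np.array([
    [1500, 7, 10],
    [2000, 8, 5],
    [1200, 5, 20],
    [1800, 6, 15],
])
y = np.array([200_000, 280_000, 140_000, 210_000])  # made-up prices

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)    # a and (b1, b2, b3)
print(model.predict([[1600, 7, 12]]))   # predicted price for a new house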
Regression Coefficient
The regression coefficient is given by the equation:
Y = B0 + B1X
Where
B0 is a constant
B1 is the regression coefficient
Given below is the formula to find the value of the regression coefficient.
B1 = \[\dfrac{\sum \left ( x_{i}-\bar{x} \right )\left ( y_{i}-\bar{y} \right )}{\sum \left ( x_{i}-\bar{x} \right )^{2}}\]
where $x_{i}$ and $y_{i}$ are the observed data values, and $\bar{x}$ and $\bar{y}$ are their means.
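This mean-centred form of B1 can also be coded directly; the sketch below is illustrative and uses the standard identity B0 = ȳ - B1·x̄, which follows from the fact that the regression line passes through the means.

# Sketch: B1 = sum((xi - x_mean)*(yi - y_mean)) / sum((xi - x_mean)**2), then B0 = y_mean - B1*x_mean
def regression_coefficient(x_values, y_values):
    x_mean = sum(x_values) / len(x_values)
    y_mean = sum(y_values) / len(y_values)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(x_values, y_values))
    denominator = sum((x - x_mean) ** 2 for x in x_values)
    b1 = numerator / denominator
    b0 = y_mean - b1 * x_mean
    return b0, b1

print(regression_coefficient([1, 2, 3], [5, 8, 11]))   # (2.0, 3.0)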
Importance of Regression Line
A regression line describes the behaviour of a set of data and provides a logical approach for studying and analysing the relationship between two continuous variables. It is used in machine learning models, mathematical analysis, statistics, forecasting, and other quantitative applications. In the financial sector, analysts use linear regression to predict stock and commodity prices and to perform valuations of different securities. Several well-known companies use linear regression to forecast sales, inventories, and similar quantities.
Important Properties of Regression Line
Regression coefficients are independent of a change of origin but not of a change of scale. If the variables x and y are transformed to u = (x - a)/p and v = (y - c)/q, where p and q are constants, then $b_{yx} = \frac{q}{p}\,b_{vu}$ and $b_{xy} = \frac{p}{q}\,b_{uv}$.
The two regression lines (of y on x and of x on y) intersect at the point $(\bar{x}, \bar{y})$, i.e. at the means of the two variables; this point satisfies both regression equations.
The correlation coefficient between the two variables x and y is the geometric mean of the two regression coefficients, and its sign is the common sign of those coefficients. If the regression coefficients are $b_{yx}$ and $b_{xy}$, then $r = \pm\sqrt{b_{yx} \cdot b_{xy}}$; if both coefficients are negative, r is negative, and if both are positive, r is positive.
The regression constant a0 is equal to the y-intercept of the regression line; a0 and a1 are the regression parameters.
Regression Line Formula:
A linear regression line equation is written as-
Y = a + bX
where X is plotted on the x-axis and Y is plotted on the y-axis. X is an independent variable and Y is the dependent variable. Here, b is the slope of the line and a is the intercept, i.e. value of y when x=0.
Multiple Regression Line Formula: $y = a + b_{1}x_{1} + b_{2}x_{2} + b_{3}x_{3} + \ldots + b_{t}x_{t} + u$
Assumptions made in Linear Regression
The dependent/target variable is continuous.
The independent variables should not be strongly correlated with one another (no multicollinearity).
There should be a linear relationship between the dependent and explanatory variables.
Residuals should follow a normal distribution.
Residuals should have constant variance.
Residuals should be independently distributed/no autocorrelation.
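In practice, these assumptions are usually checked on the residuals of a fitted model. The sketch below uses statsmodels and randomly generated data purely for illustration; it is one common way to inspect the residuals, not the only one.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Made-up data: a linear signal plus normally distributed noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 3 * x + 2 + rng.normal(0, 1, 50)

X = sm.add_constant(x)              # add the intercept column
model = sm.OLS(y, X).fit()

print(model.params)                 # estimated intercept and slope
print(durbin_watson(model.resid))   # values near 2 suggest no autocorrelation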
Solved Examples
1. Find the linear regression equation for a data set of n = 4 paired observations with Σx = 20, Σy = 25, Σx² = 120, and Σxy = 144.
Sol: To find the linear regression equation we need the values of Σx, Σy, Σx², and Σxy. These are obtained by constructing a table of the data and summing each column.
The formula of the linear equation is y = a + bx. Using the formulas below, we will find the values of a and b.
a = \[\dfrac{\left ( \sum Y \right )\left ( \sum X^{2} \right )-\left ( \sum X \right )\left ( \sum XY \right )}{n\left ( \sum X^{2} \right )-\left ( \sum X \right )^{2}}\]
Now put the values in the equation
\[a=\frac{25\times 120-20\times 144}{4\times 120-400}\]
a= \[\frac{120}{80}\]
a=1.5
b = \[\dfrac{n\left ( \sum XY \right )-\left ( \sum X \right )\left ( \sum Y \right )}{n\left ( \sum X^{2} \right )-\left ( \sum X \right )^{2}}\]
Put the values in the equation
\[b=\frac{4\times 144-20\times 25}{4\times 120-400}\]
b=\[\frac{76}{80}\]
b=0.95
Hence we get a = 1.5 and b = 0.95.
The linear equation is given by
Y = a + bx
Now put the value of a and b in the equation
Hence the equation of the linear regression line is y = 1.5 + 0.95x.
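As a quick check, the arithmetic of this example can be reproduced in a few lines of Python from the given sums.

# Verify the solved example from the given sums
n, sum_x, sum_y, sum_x2, sum_xy = 4, 20, 25, 120, 144

denom = n * sum_x2 - sum_x ** 2                   # 4 * 120 - 400 = 80
a = (sum_y * sum_x2 - sum_x * sum_xy) / denom     # 120 / 80 = 1.5
b = (n * sum_xy - sum_x * sum_y) / denom          # 76 / 80 = 0.95
print(a, b)                                       # 1.5 0.95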
FAQs on Linear Regression
1. What is linear regression in simple terms?
Linear regression is a statistical method used to model the relationship between two variables by fitting a straight line to the observed data. One variable is considered to be an independent variable (the cause), and the other is a dependent variable (the effect). The primary goal is to predict the value of the dependent variable based on the value of the independent variable.
2. What is the basic equation for a linear regression line?
The equation for a simple linear regression line is similar to the slope-intercept form of a line in geometry. It is expressed as:
Y = a + bX
Where:
- Y is the dependent variable (the value you want to predict).
- X is the independent variable (the predictor).
- b is the slope of the line, which represents the change in Y for a one-unit change in X.
- a is the y-intercept, which is the value of Y when X is 0.
3. What are some real-world examples of linear regression?
Linear regression is used in many fields to make predictions and understand relationships. Some common examples include:
- Business: Predicting a company's sales based on its advertising expenditure.
- Economics: Estimating the impact of inflation on house prices.
- Education: Predicting a student's final exam score based on the number of hours they studied.
- Health: Analysing the relationship between a person's weight and their blood pressure.
4. What are the main types of linear regression?
The main types of linear regression are:
- Simple Linear Regression: Involves a single independent variable to predict a single dependent variable. Example: Predicting house price based only on its size.
- Multiple Linear Regression: Involves two or more independent variables to predict a single dependent variable. Example: Predicting house price based on its size, location, and age.
- Polynomial Regression: Models a non-linear relationship by fitting a polynomial equation to the data, even though the model itself is still considered linear in terms of its parameters.
5. How does linear regression differ from correlation?
While both linear regression and correlation describe the relationship between two variables, they serve different purposes. Correlation measures the strength and direction (positive or negative) of the association, resulting in a single value between -1 and +1. In contrast, linear regression goes a step further by defining the exact mathematical relationship with an equation (the line of best fit), allowing you to make predictions.
6. What is the 'line of best fit' in linear regression and how is it determined?
The 'line of best fit' is the straight line that passes through a scatter plot of data points in a way that minimizes the overall distance from the line to all the points. It represents the best possible description of the relationship. It is typically determined using the Least Squares Method, which calculates the line that minimizes the sum of the squared vertical distances (called residuals) between each actual data point and the line itself.
7. What key assumptions must be met for a linear regression model to be reliable?
For a linear regression model to produce accurate and reliable results, several assumptions about the data must be true:
- Linearity: The relationship between the independent and dependent variables must be linear.
- Independence: The observations (or their errors) must be independent of one another.
- Homoscedasticity: The variance of the errors should be constant across all levels of the independent variable.
- Normality: The errors (residuals) should be normally distributed.
8. How is linear regression used in the field of machine learning?
In machine learning, linear regression is a fundamental supervised learning algorithm used for prediction and forecasting tasks. It serves as a baseline model for more complex algorithms. Its simplicity, speed, and high interpretability make it ideal for quickly understanding the relationship between variables and making initial predictions, such as forecasting sales, predicting stock prices, or estimating customer lifetime value.
9. What is the difference between simple linear regression and multiple linear regression?
The key difference lies in the number of independent variables used. Simple linear regression uses only one independent variable to predict a dependent variable (e.g., predicting crop yield from rainfall amount). Multiple linear regression uses two or more independent variables to predict a dependent variable, which often provides a more realistic and accurate model (e.g., predicting crop yield from rainfall, fertilizer amount, and sunlight exposure).
10. Can a strong linear regression model prove that one variable causes another?
No, this is a common misconception. A strong linear regression model can only show that two variables are strongly associated and move together. It cannot prove causation. For example, ice cream sales and the number of swimming accidents are strongly correlated, but one does not cause the other. A hidden third variable, such as hot weather, is the actual cause of both. Establishing causation requires controlled experiments, not just observational data analysis.