Linear Regression
Supervised learning · Equations · Metrics
A supervised ML algorithm that models the relationship between one or more inputs and a continuous output by fitting a straight line (or, with several inputs, a hyperplane). The coefficients are chosen by the Least Squares Method, which minimizes the squared error between actual and predicted outputs. Input = Independent variable (X) · Output = Dependent variable (Y).
Types of linear regression
Simple Linear Regression
Input: One independent variable (X)
e.g. Hours studied → Marks scored
Y = β₁X + β₀
One predictor, one straight line
β₁: Slope — change in Y per unit change in X
β₀: Intercept — value of Y when X = 0
Y: Predicted (target) value
X: Input feature value
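As a minimal sketch of how β₁ and β₀ can be obtained for one predictor, the snippet below uses the standard closed-form least squares formulas with NumPy. The data values and the helper name `fit_simple` are made up for illustration and are not part of the original notes.

```python
import numpy as np

# Hypothetical data: hours studied (X) and marks scored (Y).
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
marks = np.array([52.0, 55.0, 61.0, 64.0, 68.0])


def fit_simple(x, y):
    """Return (slope, intercept) of the least squares line y = slope * x + intercept."""
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x.
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: forces the line through the point (x_mean, y_mean).
    intercept = y_mean - slope * x_mean
    return slope, intercept


beta1, beta0 = fit_simple(hours, marks)   # slope is about 4.1, intercept about 47.7
predicted = beta1 * hours + beta0         # Y = β₁X + β₀ for each observation
```

With this toy data, each extra hour studied is associated with roughly 4.1 more marks, and the intercept is the predicted mark at zero hours.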
Multiple Linear Regression
Input: Two or more independent variables
e.g. Hours studied + Attendance → Marks
Y = β₀ + β₁X₁ + β₂X₂ + … + βₚXₚ
p predictors, hyperplane fit
β₀: Intercept — baseline value of Y when all Xᵢ = 0
βᵢ: Coefficient — change in Y per unit change in Xᵢ, holding the other features fixed
Xᵢ: Each individual input feature
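The multiple-predictor equation can be fitted in a few lines with `numpy.linalg.lstsq`. This is a sketch under assumptions: the feature values are invented, and the target is generated from a known rule (intercept 10, coefficients 4 and 0.5) only so that the recovered coefficients are easy to check. The helper `predict` is hypothetical.

```python
import numpy as np

# Hypothetical features: column 0 = hours studied, column 1 = attendance (%).
X = np.array([
    [2.0, 60.0],
    [3.0, 80.0],
    [5.0, 70.0],
    [6.0, 90.0],
    [8.0, 85.0],
])
# Synthetic target built from a known rule: marks = 10 + 4*hours + 0.5*attendance.
y = 10.0 + 4.0 * X[:, 0] + 0.5 * X[:, 1]

# Prepend a column of ones so the first fitted coefficient plays the role of β₀.
X_design = np.column_stack([np.ones(len(X)), X])

# Least squares solution of X_design @ coeffs ≈ y.
coeffs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, slopes = coeffs[0], coeffs[1:]


def predict(features):
    """Apply Y = β₀ + β₁X₁ + ... + βₚXₚ to one row or a 2-D array of rows."""
    return intercept + np.asarray(features, dtype=float) @ slopes


new_student = predict([4.0, 75.0])
```

Because the target has no noise here, the fit recovers the generating rule up to floating-point error; with real data the coefficients would only be estimates.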
Least Squares Regression Line (LSRL) — minimizes the sum of squared vertical distances between the actual data points and the fitted line. This makes it the best-fit line in the least squares sense: no other straight line through the data has a smaller sum of squared residuals Σ(yᵢ − ŷᵢ)².
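One way to see the minimizing property is to fit a line and then nudge it. The sketch below uses `numpy.polyfit` for the fit and compares the sum of squared residuals against eight slightly shifted lines. The data and the step size 0.1 are arbitrary choices for illustration.

```python
import numpy as np

# Hypothetical data that is roughly linear.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])


def sse(slope, intercept):
    """Sum of squared residuals for the line y_hat = slope * x + intercept."""
    residuals = y - (slope * x + intercept)
    return np.sum(residuals ** 2)


# polyfit returns coefficients from highest degree to lowest: (slope, intercept).
slope, intercept = np.polyfit(x, y, deg=1)
best = sse(slope, intercept)

# SSE for every combination of small shifts in slope and intercept,
# excluding the unshifted least squares line itself.
steps = (-0.1, 0.0, 0.1)
perturbed = [
    sse(slope + ds, intercept + di)
    for ds in steps
    for di in steps
    if (ds, di) != (0.0, 0.0)
]

every_shift_is_worse = all(p > best for p in perturbed)
residual_sum = np.sum(y - (slope * x + intercept))  # close to zero for a fit with an intercept
```

Every shifted line has a larger sum of squared residuals, and the residuals of the fitted line sum to approximately zero.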
Key evaluation metrics
MAE (Mean Absolute Error)
Average of absolute differences — each error counts in proportion to its size. More robust to outliers than MSE or RMSE.
MAE = (1/n) Σ |yᵢ − ŷᵢ|
Same units as Y. Lower = better.
MSE (Mean Squared Error)
Average of squared errors. Penalizes large errors more heavily than MAE.
MSE = (1/n) Σ (yᵢ − ŷᵢ)²
Units = Y². Sensitive to outliers.
RMSE (Root Mean Squared Error)
Square root of MSE — restores original units. One of the most commonly reported metrics.
RMSE = √[(1/n) Σ (yᵢ − ŷᵢ)²]
Same units as Y. Lower = better.
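R² (Coefficient of Determination)
Proportion of variance in Y explained by the model. Scale-free; between 0 and 1 for a least squares fit with an intercept scored on its own training data, but it can be negative on new data.
R² = 1 − (SS_res / SS_tot)
R² = 1 → perfect fit. R² = 0 → no better than always predicting the mean of Y.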
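The four formulas above translate directly into NumPy. The function name `regression_metrics` and the sample values are assumptions made for this sketch.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R² from actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mae = np.mean(np.abs(residuals))        # (1/n) Σ |yᵢ − ŷᵢ|
    mse = np.mean(residuals ** 2)           # (1/n) Σ (yᵢ − ŷᵢ)²
    rmse = np.sqrt(mse)                     # back in the units of Y

    ss_res = np.sum(residuals ** 2)                   # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total variation around the mean
    r2 = 1.0 - ss_res / ss_tot

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical actual and predicted values (residuals: 1, 0, -1, 0).
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
```

For these values MAE and MSE are both 0.5, RMSE is about 0.707, and R² is 0.9. Note that R² is undefined when every actual value is identical, because SS_tot is then zero.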
Metric comparison & when to use
| Metric | Formula | Units | Strength | Weakness |
|---|---|---|---|---|
| MAE | Σ\|yᵢ−ŷᵢ\| / n | Same as Y | Interpretable, outlier-robust | Gives large errors no extra weight |
| MSE | Σ(yᵢ−ŷᵢ)² / n | Y squared | Penalizes large errors | Hard to interpret directly |
| RMSE | √MSE | Same as Y | Interpretable + penalizes large errors | Still sensitive to outliers |
| R² | 1 − SS_res/SS_tot | Unitless (typically 0–1) | Scale-free model quality measure | Never decreases when features are added, so it can mislead with many features |
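The outlier behaviour in the table can be made concrete with a small experiment. In this sketch (invented numbers, hypothetical helper names `mae` and `rmse`), every prediction is first off by 1, then a single prediction is made to be off by 11.

```python
import numpy as np


def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))


def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))


y_true = np.array([10.0, 12.0, 14.0, 16.0, 18.0])

# Case 1: every prediction is off by exactly 1.
y_pred_clean = y_true + 1.0

# Case 2: same as case 1, except the last prediction is off by 11.
y_pred_outlier = y_pred_clean.copy()
y_pred_outlier[-1] = y_true[-1] + 11.0

clean_scores = (mae(y_true, y_pred_clean), rmse(y_true, y_pred_clean))
outlier_scores = (mae(y_true, y_pred_outlier), rmse(y_true, y_pred_outlier))
```

In the clean case both metrics equal 1. With the single large error, MAE rises to 3 while RMSE rises to 5, which shows how squaring lets one bad prediction dominate.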
Key assumptions & concepts
Assumptions of Linear Regression
Linearity — relationship between X and Y must be linear
Independence — observations must be independent of each other
Homoscedasticity — constant variance of residuals across all X
Normality — residuals should be normally distributed (matters mainly for confidence intervals and hypothesis tests)
No multicollinearity — input features should not be highly correlated with each other (MLR)
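Of the assumptions above, multicollinearity is the easiest to check numerically. One common diagnostic, not named in the original notes, is the variance inflation factor (VIF): regress each feature on the others and compute 1 / (1 − R²). The sketch below implements it with plain NumPy on synthetic data; the helper `vif` and the feature names are illustrative, and cutoffs such as 5 or 10 are only rules of thumb.

```python
import numpy as np


def vif(X):
    """Variance inflation factor per column of X (rows = observations)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress feature j on the remaining features plus an intercept.
        design = np.column_stack([np.ones(n), others])
        coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coeffs
        r2 = 1.0 - np.sum(resid ** 2) / np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)


# Synthetic features: the third column nearly duplicates the first.
rng = np.random.default_rng(0)
hours = rng.normal(size=200)
attendance = rng.normal(size=200)
hours_copy = hours + 0.05 * rng.normal(size=200)

features = np.column_stack([hours, attendance, hours_copy])
vifs = vif(features)
```

Here the first and third columns get very large VIFs because each is almost fully predictable from the other, while the independent `attendance` column stays close to 1. The other assumptions are usually checked with residual plots rather than a single number.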
Key terms to remember
Residual = yᵢ − ŷᵢ — difference between actual and predicted value
SS_res = Σ(yᵢ − ŷᵢ)² — sum of squared residuals (variation the model leaves unexplained)
SS_tot = Σ(yᵢ − ȳ)² — total sum of squares (total variation of Y around its mean ȳ)
Overfitting — model fits training data too well, poor on new data
Regularization — Ridge (L2) or Lasso (L1) to reduce overfitting in MLR
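To illustrate the regularization entry, here is a hedged sketch of Ridge (L2) regression using its closed-form solution β = (XᵀX + λI)⁻¹Xᵀy on centered data, so the intercept is left unpenalized. The function name `ridge_fit`, the synthetic data, and the penalty value λ = 100 are all assumptions for this example. Lasso (L1) generally has no closed form and is not shown.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression; returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center X and y so the intercept is not penalized.
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    # Solve (XcᵀXc + λI) β = Xcᵀyc instead of forming an explicit inverse.
    coefs = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ coefs
    return intercept, coefs


# Synthetic data from a known linear rule plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + 1.0 + rng.normal(scale=0.5, size=50)

b0_ols, coefs_ols = ridge_fit(X, y, lam=0.0)        # λ = 0 is ordinary least squares
b0_ridge, coefs_ridge = ridge_fit(X, y, lam=100.0)  # λ > 0 shrinks the coefficients

shrinkage = np.linalg.norm(coefs_ridge) / np.linalg.norm(coefs_ols)
```

With λ = 0 the result matches ordinary least squares. With λ > 0 the coefficient vector is pulled toward zero (`shrinkage` is below 1), trading a little bias for lower variance. In practice features are usually standardized first and λ is chosen by cross-validation.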
with ♥ by sv