Linear Regression

Supervised learning · Equations · Metrics

A supervised ML algorithm that models the relationship between one or more inputs and a continuous output by fitting a straight line (or, with several inputs, a hyperplane). It predicts values by choosing the coefficients that minimize the squared error between actual and predicted outputs, known as the Least Squares Method. Input = Independent variable (X) · Output = Dependent variable (Y).

Types of linear regression

Simple

Simple Linear Regression

Input: One independent variable (X)

e.g. Hours studied → Marks scored

Y = β₁X + β₀

One predictor, one straight line

β₁: Slope, the change in Y per unit change in X

β₀: Intercept, the value of Y when X = 0

Y: Predicted (target) value

X: Input feature value
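
As a minimal sketch of how β₁ and β₀ can be estimated, the snippet below applies the standard closed-form least-squares formulas in plain Python. The function name `fit_simple_linear_regression` and the hours/marks numbers are made up for illustration and do not come from a real dataset.

```python
def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope = sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical data: hours studied -> marks scored
hours = [1, 2, 3, 4, 5]
marks = [52, 58, 61, 70, 74]

beta1, beta0 = fit_simple_linear_regression(hours, marks)
predicted_marks = beta1 * 6 + beta0  # prediction for 6 hours of study
print(round(beta1, 2), round(beta0, 2), round(predicted_marks, 2))
```

For this toy data the slope is 5.6 marks per extra hour and the intercept is about 46.2.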

Multiple

Multiple Linear Regression

Input: Two or more independent variables

e.g. Hours studied + Attendance → Marks

Y = β₀ + β₁X₁ + β₂X₂ + … + βₚXₚ

p predictors, hyperplane fit

β₀: Intercept, the baseline value of Y when every Xᵢ = 0

βᵢ: Coefficient, the change in Y per unit change in Xᵢ with the other features held fixed

Xᵢ: Each individual input feature
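
One way to fit the multiple-regression equation is to add a column of ones for β₀ and solve the least-squares problem with NumPy. The sketch below assumes NumPy is installed. The data is synthetic: Y is generated exactly from β₀ = 10, β₁ = 3, β₂ = 0.5, so the fit should recover those values.

```python
import numpy as np

# Synthetic features: column 0 = hours studied, column 1 = attendance (%)
X = np.array([[2, 60], [4, 70], [6, 65], [8, 90], [10, 85]], dtype=float)

# Synthetic target built from known coefficients (no noise), for illustration only
y = 10 + 3 * X[:, 0] + 0.5 * X[:, 1]

# Prepend a column of ones so the first coefficient acts as the intercept
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution of X_design @ beta ≈ y
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(np.round(beta, 4))  # intercept first, then one coefficient per feature

# Predict marks for a hypothetical student: 7 hours, 80% attendance
new_student = np.array([1.0, 7.0, 80.0])
prediction = new_student @ beta
```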

Least Squares Regression Line (LSRL): the line that minimizes the sum of squared vertical distances between the actual data points and the predicted line. It is the best-fit line in the least-squares sense, meaning no other straight line gives a smaller total of squared residuals (yᵢ − ŷᵢ).
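
For the simple (one-predictor) case, the minimization can be written out explicitly. The derivation below is the standard textbook one, using the same symbols as above, with x̄ and ȳ denoting the sample means:

```latex
\[
\mathrm{SS}_{\mathrm{res}}(\beta_0, \beta_1)
  = \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2
  = \sum_{i=1}^{n} \bigl(y_i - \beta_0 - \beta_1 x_i\bigr)^2
\]
Setting both partial derivatives to zero,
\[
\frac{\partial\, \mathrm{SS}_{\mathrm{res}}}{\partial \beta_0}
  = -2 \sum_{i=1}^{n} \bigl(y_i - \beta_0 - \beta_1 x_i\bigr) = 0,
\qquad
\frac{\partial\, \mathrm{SS}_{\mathrm{res}}}{\partial \beta_1}
  = -2 \sum_{i=1}^{n} x_i \bigl(y_i - \beta_0 - \beta_1 x_i\bigr) = 0,
\]
and solving the two equations gives
\[
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
                     {\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.
\]
```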

Key evaluation metrics

MAE

Mean Absolute Error

Average of absolute differences. Each error counts in proportion to its size, so MAE is less sensitive to outliers than MSE.

1/n Σ |yᵢ − ŷᵢ|

Same units as Y. Lower = better.
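
A minimal plain-Python sketch of the MAE formula follows. The function name and the four actual/predicted values are invented for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted| over all samples."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical values; absolute errors are 0.5, 0, 1 and 2
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 8.0, 7.0]

mae = mean_absolute_error(actual, predicted)
print(mae)  # (0.5 + 0 + 1 + 2) / 4 = 0.875
```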

MSE

Mean Squared Error

Average of squared errors. Penalizes large errors more heavily than MAE.

1/n Σ (yᵢ − ŷᵢ)²

Units = Y². Sensitive to outliers.
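
The sketch below illustrates why squaring penalizes large errors. Two hypothetical sets of predictions have the same total absolute error, but the one with a single large miss gets a much higher MSE. All names and numbers are made up for the example.

```python
def mean_squared_error(y_true, y_pred):
    """Average of (actual - predicted)^2 over all samples."""
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)


actual = [10.0, 10.0, 10.0, 10.0]
even_misses = [12.0, 12.0, 12.0, 12.0]   # four errors of size 2
one_big_miss = [10.0, 10.0, 10.0, 18.0]  # one error of size 8, rest exact

mse_even = mean_squared_error(actual, even_misses)  # (4 + 4 + 4 + 4) / 4 = 4.0
mse_big = mean_squared_error(actual, one_big_miss)  # (0 + 0 + 0 + 64) / 4 = 16.0
print(mse_even, mse_big)
```

Both prediction sets have an average absolute error of 2, yet the MSE of the second is four times larger.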

RMSE

Root Mean Squared Error

Square root of MSE, which restores the original units. Commonly reported because it reads in the same units as Y.

√ [1/n Σ (yᵢ − ŷᵢ)²]

Same units as Y. Lower = better.
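
A short sketch of RMSE, computed as the square root of the MSE using only the standard library. The function name and data are illustrative.

```python
import math


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared error, in the same units as y."""
    mse = sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical values; squared errors are 0.25, 0, 1 and 4, so MSE = 1.3125
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 8.0, 7.0]

rmse = root_mean_squared_error(actual, predicted)
print(round(rmse, 3))  # about 1.146
```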

R²

Coefficient of Determination

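Proportion of variance in Y explained by the model. Scale-free. Between 0 and 1 for a least-squares fit with an intercept scored on its own training data, but it can be negative on new data.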

1 − [SS_res / SS_tot]

R² = 1 → perfect fit. R² = 0 → no better than always predicting the mean of Y.
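
The formula can be sketched directly from SS_res and SS_tot. The example below uses invented numbers and scores three hypothetical sets of predictions: a reasonable one, one that always predicts the mean, and one that is worse than the mean (which yields a negative R²).

```python
def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)             # total variation
    return 1 - ss_res / ss_tot


actual = [3.0, 5.0, 7.0, 9.0]          # mean is 6, SS_tot = 20

r2_good = r2_score(actual, [2.5, 5.0, 8.0, 7.0])   # SS_res = 5.25 -> about 0.74
r2_mean = r2_score(actual, [6.0, 6.0, 6.0, 6.0])   # SS_res = 20   -> 0.0
r2_bad = r2_score(actual, [9.0, 7.0, 5.0, 3.0])    # SS_res = 80   -> -3.0
print(r2_good, r2_mean, r2_bad)
```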

Metric comparison & when to use

Metric	Formula	Units	Strength	Weakness
MAE	Σ|yᵢ−ŷᵢ| / n	Same as Y	Interpretable, outlier-robust	Does not single out large errors
MSE	Σ(yᵢ−ŷᵢ)² / n	Y squared	Penalizes large errors	Hard to interpret directly
RMSE	√MSE	Same as Y	Interpretable + penalizes large errors	Still sensitive to outliers
R²	1 − SS_res/SS_tot	Unitless (at most 1)	Scale-free model quality measure	Never decreases when features are added, so it can mislead with many features
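
One common remedy for the R² weakness noted in the table is adjusted R², which discounts R² by the number of features. It is not covered elsewhere in this sheet, so treat the sketch below as a supplementary illustration. The function name and the example R² values are hypothetical.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Hypothetical comparison on 30 samples:
# a 2-feature model with R^2 = 0.80 vs a 10-feature model with R^2 = 0.81
small_model = adjusted_r2(0.80, n_samples=30, n_features=2)
large_model = adjusted_r2(0.81, n_samples=30, n_features=10)
print(round(small_model, 3), round(large_model, 3))
```

Here the larger model has the higher raw R² but the lower adjusted R², which suggests the extra features add little.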

Key assumptions & concepts

Assumptions of Linear Regression

Linearity — relationship between X and Y must be linear

Independence — observations must be independent of each other

Homoscedasticity — constant variance of residuals across all X

Normality — residuals should be normally distributed

No multicollinearity — input features should not be highly correlated (MLR)
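
As a rough sketch of how the multicollinearity assumption might be screened, the snippet below computes pairwise correlations between features with NumPy and flags pairs above a cutoff. The features are synthetic, and the 0.9 threshold is an assumed rule of thumb, not a fixed standard.

```python
import numpy as np

# Synthetic features: x2 is nearly 2 * x1, x3 is unrelated to both
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = 2 * x1 + np.array([0.1, -0.1, 0.05, -0.05, 0.1, -0.1])
x3 = np.array([5.0, 1.0, 4.0, 2.0, 6.0, 3.0])
X = np.column_stack([x1, x2, x3])

# Correlation matrix between columns (rowvar=False treats columns as variables)
corr = np.corrcoef(X, rowvar=False)

threshold = 0.9  # assumed cutoff for "highly correlated"
n_features = X.shape[1]
flagged_pairs = [
    (i, j)
    for i in range(n_features)
    for j in range(i + 1, n_features)
    if abs(corr[i, j]) > threshold
]
print(flagged_pairs)  # column index pairs that may be redundant
```

With this data only the pair (0, 1) is flagged, i.e. x1 and x2. Pairwise correlation is a quick screen only; it can miss a feature that is a combination of several others.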

Key terms to remember

Residual = yᵢ − ŷᵢ — difference between actual and predicted value

SS_res = Σ (yᵢ − ŷᵢ)²: sum of squared residuals (variation in Y the model leaves unexplained)

SS_tot = Σ (yᵢ − ȳ)²: total sum of squares (total variation of Y around its mean ȳ)

Overfitting — model fits training data too well, poor on new data

Regularization — Ridge (L2) or Lasso (L1) to reduce overfitting in MLR
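
To make the Ridge (L2) idea concrete, here is a minimal NumPy sketch of its closed-form solution, β = (XᵀX + αI)⁻¹Xᵀy, applied to centered data so the intercept is not penalized. The function name `ridge_fit`, the penalty value α = 10, and the data are assumptions chosen for illustration. Lasso has no closed form and is not shown.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression; returns (intercept, coefficients)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean  # centering keeps the intercept out of the penalty
    yc = y - y_mean
    n_features = X.shape[1]
    # Solve (Xc^T Xc + alpha * I) coef = Xc^T yc
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return intercept, coef


# Synthetic data: y = 10 + 3 * hours + 0.5 * attendance (no noise)
X = np.array([[2, 60], [4, 70], [6, 65], [8, 90], [10, 85]], dtype=float)
y = 10 + 3 * X[:, 0] + 0.5 * X[:, 1]

_, coef_ols = ridge_fit(X, y, alpha=0.0)     # alpha = 0 is ordinary least squares
_, coef_ridge = ridge_fit(X, y, alpha=10.0)  # alpha > 0 shrinks the coefficients
print(np.round(coef_ols, 3), np.round(coef_ridge, 3))
```

Increasing α shrinks the overall size of the coefficient vector, trading a little bias for lower variance.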

with ♥ by sv