Linear Regression
Supervised learning · Equations · Metrics

A supervised ML algorithm that models the relationship between one or more inputs and a continuous output by fitting a straight line (or, with several inputs, a hyperplane). The coefficients are chosen by the Least Squares Method, which minimizes the squared error between actual and predicted outputs; the fitted model is then used to predict new values. Input = Independent variable (X) · Output = Dependent variable (Y).

Simple Linear Regression
Input: One independent variable (X)
e.g. Hours studied → Marks scored
Y = β₁X + β₀
One predictor, one straight line
β₁: Slope, the change in Y per unit change in X
β₀: Intercept, the value of Y when X = 0
Y: Predicted (target) value
X: Input feature value
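
The simple model above can be tried out in a few lines. This is a minimal sketch, assuming NumPy is available. The data values are made up for illustration, and `np.polyfit` with degree 1 is used here as one convenient way to obtain the least-squares slope and intercept.

```python
import numpy as np

# Hypothetical data, made up for illustration: hours studied (X) and marks scored (Y).
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
marks = np.array([52.0, 57.0, 61.0, 68.0, 72.0])

# A degree-1 polynomial fit is a least-squares straight line.
# np.polyfit returns coefficients from highest power to lowest: [slope, intercept].
beta1, beta0 = np.polyfit(hours, marks, deg=1)


def predict(x):
    """Evaluate Y = beta1 * X + beta0 with the fitted coefficients."""
    return beta1 * x + beta0


print(f"slope = {beta1:.2f}, intercept = {beta0:.2f}")
print(f"predicted marks for 6 hours = {predict(6.0):.2f}")
```

For this toy data the fit comes out at roughly β₁ = 5.1 and β₀ = 46.7, so each extra hour of study is associated with about 5 more marks.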
Multiple Linear Regression
Input: Two or more independent variables
e.g. Hours studied + Attendance → Marks
Y = β₀ + β₁X₁ + β₂X₂ + … + βₚXₚ
p predictors, hyperplane fit
β₀: Intercept, the baseline value of Y when all Xᵢ = 0
βᵢ: Coefficient, the change in Y per unit change in Xᵢ with the other features held fixed
Xᵢ: Each individual input feature
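
The multiple-regression equation can be fitted the same way once the inputs are stacked into a matrix. The sketch below assumes NumPy and uses made-up, noise-free data, so the recovered coefficients are known in advance. The variable names and the "true" values 10, 5 and 0.3 are illustrative assumptions, not values from any real dataset.

```python
import numpy as np

# Hypothetical data, made up for illustration.
hours = np.array([2.0, 4.0, 6.0, 8.0, 3.0, 7.0])
attendance = np.array([60.0, 70.0, 90.0, 80.0, 85.0, 65.0])  # percent

# Noise-free target built from known coefficients, so the fit should recover them.
marks = 10.0 + 5.0 * hours + 0.3 * attendance

# Design matrix: the leading column of ones carries the intercept beta_0.
X = np.column_stack([np.ones_like(hours), hours, attendance])

# Least-squares solution of X @ beta ≈ marks.
beta, _, rank, _ = np.linalg.lstsq(X, marks, rcond=None)
beta0, beta1, beta2 = beta

predicted = X @ beta
print(f"beta0 = {beta0:.3f}, beta1 = {beta1:.3f}, beta2 = {beta2:.3f}")
```

With real, noisy data the coefficients would only approximate the underlying relationship. A rank below the number of columns in `X` would indicate perfectly collinear features.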

Least Squares Regression Line (LSRL): the line that minimizes the sum of squared vertical distances between the actual data points and the line. It is the best-fit line in the least-squares sense: no other straight line gives a smaller sum of squared residuals (yᵢ − ŷᵢ).
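
For the simple (one-predictor) case, the minimization can be written out explicitly. The following is the standard derivation, sketched in LaTeX; x̄ and ȳ denote the sample means of X and Y, and the hats mark fitted values, notation introduced here for convenience.

```latex
% Objective: sum of squared residuals as a function of the two coefficients
S(\beta_0, \beta_1) = \sum_{i=1}^{n} \bigl( y_i - \beta_0 - \beta_1 x_i \bigr)^2

% Setting both partial derivatives to zero gives the normal equations
\frac{\partial S}{\partial \beta_0} = -2 \sum_{i=1}^{n} \bigl( y_i - \beta_0 - \beta_1 x_i \bigr) = 0
\qquad
\frac{\partial S}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i \bigl( y_i - \beta_0 - \beta_1 x_i \bigr) = 0

% Solving them yields the slope and intercept of the LSRL
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```

The second result shows that the fitted line always passes through the point (x̄, ȳ).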

1. MAE (Mean Absolute Error)
Average of absolute differences; every error counts in proportion to its size. Less sensitive to outliers than MSE or RMSE.
MAE = (1/n) Σ |yᵢ − ŷᵢ|
Same units as Y. Lower = better.
2. MSE (Mean Squared Error)
Average of squared errors. Penalizes large errors more heavily than MAE.
MSE = (1/n) Σ (yᵢ − ŷᵢ)²
Units = Y². Sensitive to outliers.
3. RMSE (Root Mean Squared Error)
Square root of MSE, which restores the original units of Y. Widely reported.
RMSE = √[(1/n) Σ (yᵢ − ŷᵢ)²]
Same units as Y. Lower = better.
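4. R² (Coefficient of Determination)
Proportion of variance in Y explained by the model. Scale-free; usually between 0 and 1, but negative when the model fits worse than always predicting the mean of Y.
R² = 1 − [SS_res / SS_tot]
R² = 1 → perfect fit. R² = 0 → no better than predicting the mean of Y. Higher = better.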
Metric  Formula            Units                   Strength                                   Weakness
MAE     Σ|yᵢ−ŷᵢ| / n       Same as Y               Interpretable, less sensitive to outliers  Gives large errors no extra weight
MSE     Σ(yᵢ−ŷᵢ)² / n      Y squared               Penalizes large errors                     Hard to interpret directly
RMSE    √MSE               Same as Y               Interpretable + penalizes large errors     Still sensitive to outliers
R²      1 − SS_res/SS_tot  Unitless (usually 0–1)  Scale-free model quality measure           Can be misleading with many features (never decreases when features are added)
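
The four metrics above can be computed directly from their formulas. This is a minimal sketch assuming NumPy; the function name `regression_metrics` and the sample values are illustrative, not part of any library.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R² for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mae = np.mean(np.abs(residuals))
    mse = np.mean(residuals ** 2)
    rmse = np.sqrt(mse)

    ss_res = np.sum(residuals ** 2)                 # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation in Y
    # R² is undefined when every value of y_true is identical (ss_tot == 0).
    r2 = 1.0 - ss_res / ss_tot

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Made-up values for illustration; residuals are 1, 0, -1, -2.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

Here MAE is 1.0 and MSE is 1.5: the single residual of size 2 contributes 4 of the 6 squared-error units, which is the extra weight on large errors that the table describes. R² works out to 1 − 6/20 = 0.7.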
Assumptions of Linear Regression
Linearity — relationship between X and Y must be linear
Independence — observations must be independent of each other
Homoscedasticity — constant variance of residuals across all X
Normality — residuals should be normally distributed
No multicollinearity — input features should not be highly correlated (MLR)
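
One common way to check the multicollinearity assumption is the variance inflation factor (VIF), a diagnostic not named in the list above and brought in here purely as an illustration. The sketch assumes NumPy and made-up feature values; the function name is hypothetical.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF for each column of X (rows = observations, columns = features)."""
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)  # feature-by-feature correlation matrix
    # The diagonal of the inverse correlation matrix equals 1 / (1 - R_j^2),
    # where R_j^2 comes from regressing feature j on the remaining features.
    return np.diag(np.linalg.inv(corr))


# Made-up features for illustration; their correlation is 0.8.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
vifs = variance_inflation_factors(np.column_stack([x1, x2]))
print(vifs)
```

A VIF of 1 means a feature is uncorrelated with the others. Values far above 1 suggest the feature is largely explained by the rest; cut-offs such as 5 or 10 are often quoted as rules of thumb rather than strict limits.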
Key terms to remember
Residual: yᵢ − ŷᵢ, the difference between the actual and predicted value
SS_res: Σ (yᵢ − ŷᵢ)², the sum of squared residuals (unexplained variation)
SS_tot: Σ (yᵢ − ȳ)², the total sum of squares (total variation in Y around its mean ȳ)
Overfitting: the model fits the training data too closely and performs poorly on new data
Regularization: Ridge (L2) or Lasso (L1) penalties that shrink coefficients to reduce overfitting in multiple linear regression (MLR)
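
To make the regularization entry concrete, here is a minimal sketch of Ridge (L2) regression using its closed-form solution. It assumes NumPy, leaves the intercept unpenalized by centering the data first, and uses made-up, noise-free values; `ridge_fit` and `lam` are hypothetical names. Lasso has no closed form and is not shown.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Return (intercept, coefficients) minimizing SS_res + lam * sum(coef**2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center X and y so the intercept is recovered separately and not penalized.
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    # Solve (Xc^T Xc + lam * I) coef = Xc^T yc.
    penalty = lam * np.eye(X.shape[1])
    coef = np.linalg.solve(Xc.T @ Xc + penalty, Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return intercept, coef


# Made-up data for illustration: hours studied and attendance (%) as features.
X = np.array([[2.0, 60.0], [4.0, 70.0], [6.0, 90.0],
              [8.0, 80.0], [3.0, 85.0], [7.0, 65.0]])
y = 10.0 + 5.0 * X[:, 0] + 0.3 * X[:, 1]

ols_intercept, ols_coef = ridge_fit(X, y, lam=0.0)       # lam = 0 is ordinary least squares
ridge_intercept, ridge_coef = ridge_fit(X, y, lam=10.0)  # lam > 0 shrinks the coefficients
print(ols_coef, ridge_coef)
```

Increasing `lam` pulls the coefficient vector toward zero, trading a little bias for lower variance. In practice features are usually put on a common scale before applying the penalty, since it acts on the raw coefficient sizes.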
