Supervised Learning Overview

Core concepts & algorithm overview

Supervised learning trains a model on labeled data (input → known output). The model learns patterns that let it predict or classify unseen data. It works best when historical data exists and outcomes are clearly defined. Two core tasks: Regression (continuous output) & Classification (discrete labels).

Core algorithms at a glance

Linear Regression

Regression

Predicts continuous values

y = mx + b — fits line minimizing squared error

Sales · Prices · Temp
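As a minimal sketch of the line fit described in this card, the slope and intercept that minimize squared error have a closed form for a single feature. The helper name `fit_line` and the sample data are illustrative assumptions, not part of the original sheet.

```python
def fit_line(xs, ys):
    """Return slope m and intercept b of the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx            # undefined if all x values are identical
    b = mean_y - m * mean_x
    return m, b

# Illustrative data lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, b = fit_line(xs, ys)
print(m, b)
```

With more than one feature the same idea is usually solved with matrix methods or gradient descent rather than these scalar sums.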

Logistic Regression

Classifier

Predicts binary outcomes

σ(z) = 1/(1+e⁻ᶻ) sigmoid maps to probability

Churn · Spam

Naive Bayes

Classifier

Classifies via probabilities

Assumes feature independence — fast & simple

NLP · Sentiment
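The independence assumption in this card can be sketched with a tiny word-count classifier: each word contributes its own log-probability, and the contributions are simply added. This is a rough illustration, assuming a multinomial model with add-one smoothing; the function names and toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Count documents per class and words per class; docs are token lists."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the class with the highest log prior + summed log likelihoods."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            # Add-one smoothing (an assumption here) avoids zero probabilities
            count = word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy sentiment data, purely illustrative
docs = [["great", "movie"], ["loved", "it"], ["terrible", "movie"], ["hated", "it"]]
labels = ["pos", "pos", "neg", "neg"]
model = train_nb(docs, labels)
print(predict_nb(model, ["great", "it"]))
```

Working in log space keeps the product of many small probabilities from underflowing.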

KNN

Classifier (also regression)

Predicts from nearby points

k neighbors vote — similar data → similar label

Reco · Pattern
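The neighbor vote in this card can be sketched in a few lines. This assumes Euclidean distance and an unweighted majority vote; `knn_predict` and the sample points are illustrative names and data.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    # Pair each training point's distance to the query with its label
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    top_k_labels = [label for _, label in distances[:k]]
    return Counter(top_k_labels).most_common(1)[0][0]

# Two well-separated clusters, purely illustrative
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (2, 2)))
```

Because every prediction scans all training points, cost grows with dataset size, which matches the "slow on large data" weakness listed in the comparison table.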

Decision Tree

Classifier (also regression)

Rules-based splits

Splits on best feature using Gini/entropy at each node

Credit · Rules

Random Forest

Ensemble

Many trees, majority vote

Bagging reduces variance & overfitting vs single tree

Fraud · Robust

SVM

Classifier (also regression)

Maximizes class margin

Hyperplane with max margin; kernel trick for nonlinear boundaries

Image · Text

Linear Regression — Loss

MSE = (1/n) Σ(yᵢ - ŷᵢ)²

Minimize sum of squared residuals
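The MSE formula above translates directly into code. A minimal sketch, with made-up targets and predictions:

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals (y_i - y_hat_i)."""
    n = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n

y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]
loss = mse(y_true, y_pred)  # residuals are 1, 0, -2, so (1 + 0 + 4) / 3
print(loss)
```

Dividing by n does not change which line is best, so minimizing the mean and minimizing the sum of squared residuals pick the same parameters.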

Logistic — Sigmoid

P(y=1) = 1 / (1 + e^(-z))

z = β₀ + β₁x₁ + ... · Output ∈ (0,1)
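A small sketch of the sigmoid and the linear score z it is applied to. The coefficients and inputs are arbitrary illustration values, and the two-branch form of `sigmoid` is one common way to avoid overflow for large negative z.

```python
import math

def sigmoid(z):
    """Map any real z to a probability; branches avoid overflow in exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(x, beta0, betas):
    """P(y=1) for one sample: sigmoid of beta0 + sum(beta_j * x_j)."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return sigmoid(z)

# Illustrative coefficients chosen so that z = -1 + 0.5*2 + 0*1 = 0
p = predict_proba([2.0, 1.0], beta0=-1.0, betas=[0.5, 0.0])
print(p)
```

A score of z = 0 sits exactly on the decision boundary, which is why the probability comes out at one half.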

Decision Tree — Split

Gini = 1 − Σ pᵢ²

Lower Gini = purer node. Also: Entropy = −Σ pᵢ·log₂ pᵢ
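Both impurity measures above can be computed from the class proportions at a node. A minimal sketch; the helper names and label lists are illustrative.

```python
import math
from collections import Counter

def class_proportions(labels):
    """Fraction of samples in each class present at the node."""
    counts = Counter(labels)
    n = len(labels)
    return [c / n for c in counts.values()]

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p ** 2 for p in class_proportions(labels))

def entropy(labels):
    """Shannon entropy in bits; only classes that occur are summed."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))

mixed = ["yes", "yes", "no", "no"]   # evenly split node
pure = ["yes", "yes", "yes", "yes"]  # single-class node
print(gini(mixed), entropy(mixed), gini(pure), entropy(pure))
```

A tree builder would evaluate candidate splits by the weighted impurity of the resulting child nodes and keep the split with the lowest value.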

Bias–variance tradeoff

High Bias (Underfitting) — model too simple, misses patterns. Fix: more complexity, more features.

High Variance (Overfitting) — model too complex, memorizes noise. Fix: regularization, more data, pruning.

Sweet Spot — balances both. Use cross-validation to find it.

Total Error = Bias² + Variance + Irreducible Noise
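Cross-validation, named above as the way to find the sweet spot, rests on splitting the data into k folds and holding each one out in turn. The sketch below only produces the index splits; it assumes contiguous folds with no shuffling, and `k_fold_indices` is a hypothetical helper name.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # Spread the remainder so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, val_idx in folds:
    print(len(train_idx), val_idx)
```

In practice a model is trained on each training split and scored on the matching validation split; averaging those scores across folds gives the estimate used to compare model complexities.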

Model evaluation metrics

Accuracy = (TP+TN) / Total — misleading on imbalanced data

Precision = TP / (TP+FP) · Recall = TP / (TP+FN)

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
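The classification metrics above can be computed from the four confusion counts. This sketch also illustrates why accuracy misleads on imbalanced data. Returning 0.0 when a denominator is zero is a convention assumed here, and the labels are made up.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn

def classification_metrics(y_true, y_pred, positive=1):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    # Assumed convention: a metric with a zero denominator is reported as 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Imbalanced illustration: 9 negatives, 1 positive, model always predicts negative
y_true = [0] * 9 + [1]
y_pred = [0] * 10
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

The always-negative model scores high on accuracy while its recall and F1 are zero, because it never finds the single positive case.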

R² (Regression) — proportion of variance explained by model
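R² compares the model's squared error with that of always predicting the mean. A minimal sketch, assuming the usual definition 1 − SS_res / SS_tot and using illustrative numbers:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination; undefined when y_true is constant."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(y_true, y_pred)  # ss_res = 0.10, ss_tot = 5.0
print(r2)
```

Predicting the mean for every sample gives R² = 0, and a model worse than that gives a negative value.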

Algorithm comparison

Algorithm	Type	Strength	Weakness	Best when…
Linear Regression	Regression	Fast, interpretable	Captures only linear relationships	Continuous target, linear data
Logistic Regression	Classification	Probabilistic output	Poor with nonlinear boundaries	Binary outcome, baseline
Naive Bayes	Classification	Very fast, works with few samples	Feature independence assumption	Text/NLP, small datasets
KNN	Both	No training phase	Slow prediction on large data	Small datasets, local patterns
Decision Tree	Both	Interpretable, no feature scaling needed	Prone to overfitting	Rule-based decisions, explainability
Random Forest	Both	High accuracy, robust	Less interpretable, slower	Complex data, general use
SVM	Both	Effective in high dimensions	Slow on large n, needs kernel tuning	High-dimensional data, clear margin

with ♥ by sv