Supervised Learning Overview
Core concepts & algorithm overview

Supervised Learning — trains on labeled data (input → known output). Learns patterns to predict or classify unseen data. Works best when historical data exists and outcomes are clearly defined. Two core tasks: Regression (continuous output) & Classification (discrete labels).

Linear Regression
Regression
Predicts continuous values
y = mx + b — fits line minimizing squared error
Sales · Prices · Temp
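A minimal sketch of the line fit, assuming a single input feature and plain Python lists. The helper name `fit_line` is hypothetical. It uses the closed-form least-squares solution: the slope is covariance(x, y) / variance(x), and the intercept makes the line pass through the point of means.

```python
def fit_line(xs, ys):
    """Return (m, b) minimizing squared error for y = m*x + b (ordinary least squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept makes the fitted line pass through (mean_x, mean_y)
    b = mean_y - m * mean_x
    return m, b

# Hypothetical toy data lying exactly on y = 2x + 1
m, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(m, b)  # → 2.0 1.0
```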
Logistic Regression
Classifier
Predicts binary outcomes
σ(z) = 1/(1+e⁻ᶻ) sigmoid maps to probability
Churn · Spam
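A minimal sketch of fitting the sigmoid model by batch gradient descent on log loss. It assumes one feature and a fixed learning rate. `fit_logistic`, `lr` and `epochs` are illustrative names and untuned settings. Libraries usually rely on more robust solvers, so read this as a picture of the mechanics rather than a reference implementation.

```python
import math

def sigmoid(z):
    # Maps any real z to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=1000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by batch gradient descent on the mean log loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Log-loss gradient: (predicted probability - label) times the input (1 for the intercept)
        errors = [sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        b0 -= lr * sum(errors) / n
        b1 -= lr * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1

# Hypothetical toy data: x is a centered feature, y = 1 means the customer churned
xs = [-3, -2, -1, 1, 2, 3]
ys = [0, 0, 0, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print(sigmoid(b0 + b1 * -2), sigmoid(b0 + b1 * 2))  # low vs. high churn probability
```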
Naive Bayes
Classifier
Applies Bayes' theorem to pick the most probable class
Assumes features are conditionally independent given the class; fast & simple
NLP · Sentiment
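A sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. It assumes whitespace tokenization and a tiny hypothetical sentiment corpus. The independence assumption is what allows each class score to be a plain sum of per-word log probabilities.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Count documents per class and words per class for multinomial Naive Bayes."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    vocab = set()
    for doc, label in zip(docs, labels):
        words = doc.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict_nb(model, doc):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # log P(class) + sum of log P(word | class)
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in doc.lower().split():
            # Add-one smoothing keeps unseen words from producing log(0)
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = ["great movie loved it", "wonderful great acting", "terrible boring movie", "awful plot boring"]
labels = ["pos", "pos", "neg", "neg"]
model = train_nb(docs, labels)
print(predict_nb(model, "great acting"))  # → pos
print(predict_nb(model, "boring plot"))   # → neg
```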
KNN
Classifier
Predicts from nearby points
k neighbors vote — similar data → similar label
Recommendation · Pattern
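A brute-force k-nearest-neighbors classifier, sketched under two assumptions: Euclidean distance and small in-memory data. It shows why there is no training phase, since the stored data is the model. It also shows why prediction slows as data grows, since every query scans all points. `knn_predict` is a hypothetical helper name.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points (Euclidean distance)."""
    # Sort every training point by distance to the query and keep the k closest
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common(1) gives the top label; equal counts keep first-encountered order
    return votes.most_common(1)[0][0]

train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, (2, 2)))  # → A
print(knn_predict(train_X, train_y, (7, 8)))  # → B
```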
Decision Tree
Classifier
Rule-based splits
Splits on the best feature at each node, using Gini or entropy
Credit · Rules
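A sketch of how a single tree node could choose its split. It tries every (feature, threshold) pair and keeps the one with the lowest weighted Gini impurity across the two children. The data and feature meanings are hypothetical. A full tree would repeat this search recursively on each child.

```python
from collections import Counter

def gini(labels):
    # Gini = 1 - sum(p_i^2); 0 means the node is pure
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature index, threshold, weighted Gini) for the purest split found."""
    best = (None, None, float("inf"))
    n = len(rows)
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # a split must send samples to both children
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if score < best[2]:
                best = (f, threshold, score)
    return best

# Hypothetical credit data: (income in $k, number of existing debts)
rows = [(20, 3), (25, 2), (30, 4), (60, 1), (70, 0), (80, 2)]
labels = ["deny", "deny", "deny", "approve", "approve", "approve"]
print(best_split(rows, labels))  # → (0, 30, 0.0)
```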
Random Forest
Ensemble
Many trees, majority vote (averaging for regression)
Bagging + random feature subsets reduce variance & overfitting vs. a single tree
Fraud · Robust
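A deliberately simplified sketch of the random-forest idea. Each learner is trained on a bootstrap sample and restricted to one randomly chosen feature, and the learners' predictions are combined by majority vote. Two assumptions to note. First, the learners here are one-split stumps, whereas a real random forest grows deeper trees and samples candidate features at every split. Second, the fraud-style data and all helper names are hypothetical.

```python
import random
from collections import Counter

def train_stump(rows, labels, feature):
    """Weak learner: best single threshold on one feature, majority label on each side."""
    best = None  # (errors, threshold, left label, right label)
    for t in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= t]
        right = [y for r, y in zip(rows, labels) if r[feature] > t]
        if not right:
            continue  # threshold at the maximum value puts everything on the left
        l_lab = Counter(left).most_common(1)[0][0]
        r_lab = Counter(right).most_common(1)[0][0]
        errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    if best is None:  # feature is constant in this sample: fall back to the majority label
        majority = Counter(labels).most_common(1)[0][0]
        return lambda r: majority
    _, t, l_lab, r_lab = best
    return lambda r: l_lab if r[feature] <= t else r_lab

def train_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        # Bagging: draw n row indices with replacement (a bootstrap sample)
        idx = [rng.randrange(n) for _ in range(n)]
        # Random feature choice decorrelates the learners
        feature = rng.randrange(n_features)
        forest.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest

def predict_forest(forest, row):
    # Majority vote across all learners
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]

# Hypothetical transactions: (amount score, transactions in the last hour)
rows = [(1, 1), (2, 2), (1, 2), (2, 1), (9, 9), (8, 9), (9, 8), (8, 8)]
labels = ["ok"] * 4 + ["fraud"] * 4
forest = train_forest(rows, labels)
print(predict_forest(forest, (1, 1)), predict_forest(forest, (9, 9)))
```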
SVM
Classifier
Maximizes class margin
Hyperplane with max margin; kernel trick handles nonlinear boundaries
Image · Text
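A sketch of a linear soft-margin SVM trained by full-batch subgradient descent on the hinge loss. It assumes labels in {-1, +1} and hand-picked values for `lam`, `lr` and `epochs`. It covers only the linear case; the kernel trick is not implemented here.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w.x_i + b))) by subgradient descent.

    Labels in y must be -1 or +1.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the L2 regularizer
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge term is active
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def svm_predict(w, b, x):
    # Which side of the hyperplane w.x + b = 0 the point falls on
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical 2-D points forming two linearly separable classes
X = [(-2, -1), (-1, -2), (-2, -2), (2, 1), (1, 2), (2, 2)]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
print(svm_predict(w, b, (-3, -3)), svm_predict(w, b, (3, 3)))  # → -1 1
```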
Linear Regression — Loss
MSE = (1/n) Σ(yᵢ - ŷᵢ)²
Minimize sum of squared residuals
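A direct transcription of the MSE formula, with a small made-up example that can be checked by hand.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of squared residuals (y_i - y_hat_i)^2."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

# Residuals are 1, 0, -2, so MSE = (1 + 0 + 4) / 3
print(mse([3, 5, 7], [2, 5, 9]))  # ≈ 1.667
```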
Logistic — Sigmoid
P(y=1) = 1 / (1 + e^(-z))
z = β₀ + β₁x₁ + ... · Output ∈ (0,1)
Decision Tree — Split
Gini = 1 − Σ pᵢ²
Lower Gini = purer node. Also: Entropy = −Σ p·log₂p
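A worked example on a hypothetical node with 8 samples (6 of class A, 2 of class B, so p_A = 0.75 and p_B = 0.25). It computes both impurity measures for the same node:

```latex
\begin{aligned}
\text{Gini} &= 1 - (0.75^2 + 0.25^2) = 1 - (0.5625 + 0.0625) = 0.375 \\
\text{Entropy} &= -(0.75\log_2 0.75 + 0.25\log_2 0.25) \approx 0.311 + 0.5 = 0.811
\end{aligned}
```

For reference, a pure node scores 0 on both measures. A 50/50 two-class node scores Gini = 0.5 and Entropy = 1, the two-class maxima.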
High Bias (Underfitting) — model too simple, misses patterns. Fix: more complexity, more features.
High Variance (Overfitting) — model too complex, memorizes noise. Fix: regularization, more data, pruning.
Sweet Spot — balances both. Use cross-validation to find it.
Expected Error (squared loss) = Bias² + Variance + Irreducible Noise
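A sketch of k-fold cross-validation for estimating generalization error, assuming a regression setting scored by MSE. `fit_constant` is a hypothetical high-bias baseline that always predicts the training mean. Any function that takes training data and returns a predictor can be passed in its place. The folds are contiguous and unshuffled for readability; in practice the data is usually shuffled first.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs; each of k contiguous folds is the validation set once."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx

def cross_val_mse(xs, ys, fit, k=5):
    """Average validation MSE across folds; `fit(xs, ys)` must return a predictor function."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        predict = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        errs = [(ys[i] - predict(xs[i])) ** 2 for i in val_idx]
        scores.append(sum(errs) / len(errs))
    return sum(scores) / len(scores)

def fit_constant(xs, ys):
    # High-bias baseline: ignores x and always predicts the training mean
    mean = sum(ys) / len(ys)
    return lambda x: mean

xs = list(range(10))
ys = [2 * x + 1 for x in xs]  # hypothetical noise-free linear data
print(cross_val_mse(xs, ys, fit_constant))  # → 51.0
```

The large validation error reflects the baseline's bias: a constant cannot follow the linear trend. A model that could would score near zero here.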
Accuracy = (TP+TN) / Total — misleading on imbalanced data
Precision = TP / (TP+FP) · Recall = TP / (TP+FN)
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
R² (Regression) — proportion of variance explained by model
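A sketch that computes the metrics above from confusion-matrix counts, plus R² via 1 − SS_res / SS_tot. The counts are hypothetical. They were chosen to show how accuracy can look excellent on imbalanced data while precision and recall stay mediocre.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

def r_squared(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot: share of the target's variance explained by the predictions."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical imbalanced test set: 10 positives, 990 negatives;
# the model catches 5 positives and raises 5 false alarms
print(classification_metrics(tp=5, fp=5, fn=5, tn=985))  # → (0.99, 0.5, 0.5, 0.5)
```

Accuracy reaches 0.99 here even though the model misses half of the positives.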
Algorithm | Type | Strength | Weakness | Best when…
Linear Regression | Regression | Fast, interpretable | Only linear relationships | Continuous target, linear data
Logistic Regression | Classification | Probabilistic output | Poor with nonlinear boundaries | Binary outcome, baseline
Naive Bayes | Classification | Very fast, works with few samples | Feature independence assumption | Text/NLP, small datasets
KNN | Both | No training phase | Slow predictions on large data | Small datasets, local patterns
Decision Tree | Both | Interpretable, no feature scaling needed | Prone to overfitting | Rule-based, explainability
Random Forest | Both | High accuracy, robust | Less interpretable, slower | Complex data, general use
SVM | Both | Effective in high dimensions | Slow on large n; needs kernel tuning | High-dim data, clear margin
