Supervised Learning Overview
Core concepts & algorithm overview
Supervised Learning — trains on labeled data (input → known output). Learns patterns to predict or classify unseen data. Works best when historical data exists and outcomes are clearly defined. Two core tasks: Regression (continuous output) & Classification (discrete labels).
Core algorithms at a glance
Linear Regression
Regression
Predicts continuous values
y = mx + b — fits line minimizing squared error
Sales · Prices · Temp
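A minimal sketch of the least-squares fit behind y = mx + b, using the closed-form solution for a single feature. The function name `fit_line` and the sample data are illustrative assumptions, not part of the original sheet.

```python
def fit_line(xs, ys):
    """Return slope m and intercept b of the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); this minimizes the squared error.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


# Illustrative data lying exactly on y = 2x + 1
m, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(m, b)
```

With more than one feature the same idea applies, but the fit is usually done with matrix methods or gradient descent rather than this one-variable formula.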
Logistic Regression
Classifier
Predicts binary outcomes
σ(z) = 1/(1+e⁻ᶻ): the sigmoid maps a linear score z to a probability in (0, 1)
Churn · Spam
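The sigmoid step can be sketched as below. The weights, bias, and the 0.5 decision threshold are hypothetical values chosen for illustration; in practice the weights are learned from labeled data.

```python
import math


def sigmoid(z):
    """Numerically stable sigmoid: 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """Probability of the positive class for one example."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical weights and features, for illustration only
p = predict_proba([0.8, -0.5], 0.1, [2.0, 1.0])
label = 1 if p >= 0.5 else 0
print(round(p, 3), label)
```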
Naive Bayes
Classifier
Classifies via probabilities
Assumes feature independence — fast & simple
NLP · Sentiment
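A rough sketch of a word-count (multinomial) Naive Bayes classifier with add-one smoothing. The function names, the tiny sentiment dataset, and the choice of the multinomial variant are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """docs: list of (list_of_words, label). Returns the counts needed to predict."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab


def predict_nb(model, words):
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        # log P(label) + sum of log P(word | label), treating words as independent
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for w in words:
            # Add-one (Laplace) smoothing avoids zero probabilities
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training examples
docs = [
    (["great", "fun", "movie"], "pos"),
    (["loved", "great", "acting"], "pos"),
    (["boring", "bad", "movie"], "neg"),
    (["bad", "awful", "plot"], "neg"),
]
model = train_nb(docs)
print(predict_nb(model, ["great", "acting"]))
```

Working in log space keeps the product of many small probabilities from underflowing.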
KNN
Classifier (also used for regression)
Predicts from nearby points
k neighbors vote — similar data → similar label
Reco · Pattern
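The neighbor vote can be sketched in a few lines. Euclidean distance, k = 3, and the sample points are illustrative assumptions; other distance measures and values of k are common.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train: list of (point, label); point and query are equal-length tuples."""
    # Sort training points by Euclidean distance to the query and keep the k closest
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up points forming two clusters
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]
print(knn_predict(train, (1.1, 1.0), k=3))
```

Every prediction scans the full training set, which is why the method slows down as the data grows.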
Decision Tree
Classifier (also used for regression)
Rules-based splits
Splits on best feature using Gini/entropy at each node
Credit · Rules
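The two split criteria can be sketched as follows. The helper names are illustrative, and the functions assume non-empty label lists. A tree would evaluate `split_impurity` for each candidate split and keep the one with the lowest value.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_impurity(left, right, impurity=gini):
    """Size-weighted impurity of the two child nodes produced by a split."""
    n = len(left) + len(right)
    return len(left) / n * impurity(left) + len(right) / n * impurity(right)


mixed = ["yes", "yes", "no", "no"]
print(gini(mixed), entropy(mixed))
print(split_impurity(["yes", "yes"], ["no", "no"]))  # a perfectly separating split
```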
Random Forest
Ensemble
Many trees, majority vote
Bagging reduces variance & overfitting vs single tree
Fraud · Robust
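A simplified sketch of bagging plus majority vote. To stay short it uses one-threshold "stumps" on a single feature as a stand-in for full decision trees, and it omits the random feature selection a real random forest also uses. All names and data here are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) rows with replacement (the bagging step)."""
    return [rng.choice(data) for _ in data]


def fit_stump(data):
    """Stand-in for a tree: the single threshold with the fewest training errors."""
    best = None
    for threshold, _ in data:
        for left_label, right_label in ((0, 1), (1, 0)):
            errors = sum(
                (left_label if x <= threshold else right_label) != y for x, y in data
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def fit_forest(data, n_trees=25, seed=0):
    """Train each stump on its own bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(data, rng)) for _ in range(n_trees)]


def forest_predict(forest, x):
    """Majority vote across all stumps."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]


# Made-up 1-D data: class 0 for small x, class 1 for large x
data = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0), (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]
forest = fit_forest(data)
print(forest_predict(forest, 1.5), forest_predict(forest, 8.5))
```

Individual stumps vary with their bootstrap sample, but the vote averages those differences out, which is the variance reduction the card refers to.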
SVM
Classifier (also used for regression)
Maximizes class margin
Hyperplane with max margin; kernel trick for nonlinear boundaries
Image · Text
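The quantities involved can be sketched as below, assuming a linear SVM with labels coded as +1 and -1. The weights `w`, bias `b`, and sample points are hypothetical rather than the result of training, and the RBF kernel is shown as one common choice for the kernel trick.

```python
import math


def decision(w, b, x):
    """Signed score w·x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def margin_width(w):
    """Distance between the two margin boundaries: 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def hinge_loss(w, b, points):
    """Mean hinge loss; zero when every point is on the correct side, outside the margin."""
    return sum(max(0.0, 1.0 - y * decision(w, b, x)) for x, y in points) / len(points)


def rbf_kernel(a, b, gamma=1.0):
    """Similarity that stands in for the dot product in a nonlinear SVM."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


# Hypothetical hyperplane x1 = 3 and two well-separated points
w, b = [1.0, 0.0], -3.0
points = [((1.0, 0.0), -1), ((5.0, 0.0), 1)]
print(margin_width(w), hinge_loss(w, b, points))
```

Training an SVM amounts to making ||w|| small (a wide margin) while keeping the hinge loss low.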
Bias–variance tradeoff
High Bias (Underfitting) — model too simple, misses patterns. Fix: more complexity, more features.
High Variance (Overfitting) — model too complex, memorizes noise. Fix: regularization, more data, pruning.
Sweet Spot — balances both. Use cross-validation to find it.
Total Error = Bias² + Variance + Irreducible Noise
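Written out for squared error, the decomposition above takes the following form. It assumes data generated as y = f(x) + ε with zero-mean noise of variance σ², and expectations taken over training sets and noise. The symbols f, f̂, ε, and σ² are notation introduced here for illustration.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible noise}}
```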
Model evaluation metrics
Accuracy = (TP+TN) / Total — misleading on imbalanced data
Precision = TP / (TP+FP) · Recall = TP / (TP+FN)
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
R² (Regression) — proportion of variance explained by model
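The formulas above can be sketched directly from confusion-matrix counts. The function names and the imbalanced example counts are assumptions for illustration; returning 0.0 when a denominator is zero is one convention among several.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def r_squared(y_true, y_pred):
    """R² = 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up imbalanced case: 10 positives among 1000 examples, only 2 of them found
metrics = classification_metrics(tp=2, fp=0, fn=8, tn=990)
print(metrics)
```

In this example accuracy comes out above 99% while recall is only 20%, which is the sense in which accuracy misleads on imbalanced data.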
Algorithm comparison
| Algorithm | Type | Strength | Weakness | Best when… |
| --- | --- | --- | --- | --- |
| Linear Regression | Regression | Fast, interpretable | Only linear relationships | Continuous target, linear data |
| Logistic Regression | Classification | Probabilistic output | Poor with nonlinear boundaries | Binary outcome, baseline |
| Naive Bayes | Classification | Very fast, works with few samples | Feature independence assumption | Text/NLP, small datasets |
| KNN | Both | No training phase | Slow prediction on large data | Small datasets, local patterns |
| Decision Tree | Both | Interpretable, no scaling | Prone to overfitting | Rule-based, explainability |
| Random Forest | Both | High accuracy, robust | Less interpretable, slower | Complex data, general use |
| SVM | Both | Effective in high dims | Slow on large n, needs kernel tuning | High-dim data, clear margin |
with ♥ by sv