Logistic Regression

Supervised learning · Classification · Sigmoid · Metrics

A supervised ML algorithm for classification: the output is categorical (usually binary). Unlike Linear Regression, it models the probability of a class label by passing a linear combination of the inputs through the Sigmoid function, then maps that probability to a discrete class with a threshold. Coefficients are estimated using Maximum Likelihood Estimation (MLE). Input = independent variables (X) · Output = categorical label Y ∈ {0, 1}.

Types of logistic regression

Binary

Binary Logistic Regression

Output: Two possible outcomes (0 or 1)

e.g. Pass/Fail, Spam/Not Spam, Yes/No

Y ∈ {0, 1}

Most common form; single decision boundary

Multinomial

Multinomial Logistic Regression

Output: 3+ classes, no ordering

e.g. Red / Blue / Green category labels

Y ∈ {C₁, C₂, …, Cₙ}

Uses softmax over all classes in a single model; one-vs-rest (OvR), which trains one binary model per class, is an alternative

Ordinal

Ordinal Logistic Regression

Output: 3+ ordered/ranked classes

e.g. Low / Medium / High severity

Y : Low < Med < High

Order matters; uses cumulative logits
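For the multinomial case, the softmax step can be sketched in a few lines. This is a minimal illustration, not a full model: the per-class scores below are made-up numbers standing in for each class's linear combination of features.

```python
import math

def softmax(scores):
    """Turn one linear score per class into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for the classes Red, Blue, Green (illustrative only)
class_scores = [2.0, 1.0, 0.1]
probs = softmax(class_scores)

# The predicted class is the one with the highest probability
predicted = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted)
```

With two classes, softmax reduces to the sigmoid applied to the difference of the two scores, which is why binary logistic regression is a special case.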

Mathematics — sigmoid & logit

Sigmoid (Logistic) Function

σ(z) = 1 / (1 + e⁻ᶻ)

z = β₀ + β₁X₁ + β₂X₂ + … + βₚXₚ (linear combination)

σ(z): Output ∈ (0, 1), interpreted as probability

β₀: Intercept (bias term)

β₁…βₚ: Feature coefficients

Full Model

P(Y=1) = 1 / (1 + e^(−(β₀ + β₁X₁ + … + βₚXₚ)))

P(Y=1): Probability that the output belongs to class 1

Threshold: ≥ 0.5 → predict class 1; < 0.5 → class 0

Shape: S-curve (not a straight line like Linear Regression)
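The full model and the 0.5 threshold can be sketched as follows. The function names and the coefficient values are hypothetical, chosen only to illustrate the formula; in practice the coefficients would come from fitting.

```python
import math

def sigmoid(z):
    """sigma(z) = 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(x, beta0, betas):
    """P(Y=1) for one sample x, given an intercept and feature coefficients."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return sigmoid(z)

def predict_class(x, beta0, betas, threshold=0.5):
    """Apply the decision threshold: at or above it predicts class 1."""
    return 1 if predict_proba(x, beta0, betas) >= threshold else 0

# Hypothetical coefficients (assumed, not fitted to any data)
beta0, betas = -1.0, [0.5, 0.25]
sample = [2.0, 1.0]
print(predict_proba(sample, beta0, betas), predict_class(sample, beta0, betas))
```

The threshold is a parameter here because 0.5 is only the default; it can be moved when false positives and false negatives have different costs.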

Log-Odds (Logit Function) — log[ P(Y=1) / P(Y=0) ] = β₀ + β₁X₁ + … + βₚXₚ · This transformation maps probabilities (0–1) onto the full real line (−∞, +∞), creating a linear relationship between input features and the log-odds of the outcome. Logistic Regression is linear in log-odds space.
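A short numerical check of the log-odds relationship, as a sketch: the logit undoes the sigmoid, and because the model is linear in log-odds, raising a feature by one unit multiplies the odds by e^β for that feature. The coefficient values are assumed for illustration.

```python
import math

def sigmoid(z):
    """Map a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Map a probability in (0, 1) back to log-odds."""
    return math.log(p / (1.0 - p))

def odds(p):
    """Odds of class 1 versus class 0."""
    return p / (1.0 - p)

# Round trip: logit(sigmoid(z)) recovers z up to floating-point error
for z in (-3.0, 0.0, 1.5):
    print(z, round(logit(sigmoid(z)), 6))

# Hypothetical one-feature model (values assumed, not fitted)
beta0, beta1 = -1.0, 0.7
x = 2.0
odds_ratio = odds(sigmoid(beta0 + beta1 * (x + 1))) / odds(sigmoid(beta0 + beta1 * x))

# The two printed numbers should agree: the odds ratio equals e^beta1
print(odds_ratio, math.exp(beta1))
```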

Evaluation metrics & confusion matrix

Accuracy

% of all correctly predicted instances.

(TP + TN) / (TP+TN+FP+FN)

Misleading on imbalanced data.

Precision

Of all predicted positives, how many were correct?

TP / (TP + FP)

Use when false positives are costly.

Recall

Of all actual positives, how many were caught?

TP / (TP + FN)

Use when false negatives are costly.

F1 Score

Harmonic mean of Precision & Recall.

2 × (P × R) / (P + R)

Useful on imbalanced datasets, where both false positives and false negatives matter.

Confusion Matrix

	Predicted Positive	Predicted Negative
Actual Positive	TP True Positive	FN False Negative (Type II)
Actual Negative	FP False Positive (Type I)	TN True Negative

Correct predictions: TP, TN

Incorrect predictions: FP, FN
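The four metrics can be computed directly from the confusion-matrix counts. The sketch below uses made-up counts for an imbalanced dataset (10 actual positives out of 1000) to show how accuracy can look strong while recall is poor; the function name is hypothetical.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a denominator is empty
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts: 10 actual positives, 990 actual negatives
metrics = classification_metrics(tp=2, fn=8, fp=2, tn=988)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Here the classifier finds only 2 of the 10 positives, yet accuracy stays high because the negatives dominate the total.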

Linear regression vs logistic regression

Aspect	Linear Regression	Logistic Regression	Key Difference
Output	Continuous (e.g., price)	Probability → class label	LR predicts a number; LogR predicts a category
Function	Y = β₀ + β₁X (line)	σ(z) = 1/(1+e⁻ᶻ) (curve)	Straight line vs S-shaped sigmoid curve
Range	−∞ to +∞	0 to 1	LogR output is bounded — interpretable as probability
Loss Function	MSE / Least Squares	Log Loss / Cross-Entropy	Least squares has a closed-form solution; log loss is minimized iteratively (MLE)
Use Case	Regression tasks	Classification tasks	Choose based on whether target is continuous or categorical
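To make the Log Loss row concrete, here is a minimal sketch that fits a one-feature logistic regression by gradient descent on the average log loss, which is equivalent to maximizing the likelihood. The toy data, learning rate, and epoch count are assumptions for illustration; real projects would normally use a library solver.

```python
import math

def sigmoid(z):
    """Numerically stable sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log_loss(y_true, y_prob, eps=1e-12):
    """Average cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

def fit(xs, ys, lr=0.1, epochs=2000):
    """Batch gradient descent on the average log loss for z = b0 + b1 * x."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of log loss w.r.t. z
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Made-up toy data: hours studied -> pass (1) / fail (0), not perfectly separable
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit(hours, passed)
start_loss = log_loss(passed, [0.5] * len(passed))  # loss at b0 = b1 = 0
final_loss = log_loss(passed, [sigmoid(b0 + b1 * x) for x in hours])
print(b0, b1, start_loss, final_loss)
```

Unlike least squares, there is no closed-form solution for these coefficients, so the fit proceeds by repeated small updates.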

with ♥ by sv