Session 3: Generalized Linear Models (GLMs) — specifically Logistic Regression, which extends linear models to classification problems. This is the natural next step after regression with L2 loss and regularization.
1. Why Extend Linear Models?
Linear regression works for continuous outcomes (like house prices).
But many problems are classification:
Spam vs. not spam
Disease vs. healthy
Win vs. lose
We need a model that predicts probabilities between 0 and 1, not arbitrary continuous values.
2. Logistic Regression Model
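Instead of predicting $y$ directly, we predict the probability:
$$P(y = 1 \mid x) = \sigma(z) = \frac{1}{1 + e^{-z}}, \quad z = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$$
This is the sigmoid function (S-shaped curve).
It maps any linear combination of features into the range $(0, 1)$.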
Interpretation: Probability that the outcome is “1” (positive class).
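As a minimal sketch of the sigmoid mapping described above, the snippet below squashes a few real-valued scores into probabilities. The function name `sigmoid` and the sample scores are illustrative assumptions, not part of the session notes.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued score z to a probability between 0 and 1."""
    # Branch on the sign so exp() never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical values of the linear predictor.
for z in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"z = {z:+.1f} -> p = {sigmoid(z):.4f}")
```

A score of zero maps to 0.5, and scores of equal size but opposite sign map to probabilities that sum to 1.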
3. Loss Function (Cross-Entropy)
Linear regression uses L2 loss (SSE).
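Logistic regression uses cross-entropy loss:
$$L(\beta) = -\sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right], \quad p_i = P(y_i = 1 \mid x_i)$$
This penalizes a prediction most heavily when the model is confident but wrong.
Minimizing it is equivalent to maximum likelihood estimation, which encourages (but does not guarantee) well-calibrated probabilities.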
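To make the "confident but wrong" penalty concrete, here is a small sketch of binary cross-entropy, averaged over examples. The function name, the clipping constant `eps`, and the three probabilities are assumptions chosen for illustration.

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() stays finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# One positive example (y = 1) scored with three hypothetical probabilities.
confident_right = binary_cross_entropy([1], [0.95])
unsure = binary_cross_entropy([1], [0.50])
confident_wrong = binary_cross_entropy([1], [0.05])
print(confident_right, unsure, confident_wrong)
```

The loss grows slowly as the prediction moves from 0.95 to 0.50, then much faster as it approaches the wrong extreme.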
4. Optimization
No closed-form solution like OLS.
We use iterative methods (e.g., gradient descent, Newton-Raphson).
Regularization (Ridge, Lasso, Elastic Net) can also be applied to logistic regression.
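The iterative fitting described above can be sketched with plain batch gradient descent on a one-feature model. This is one possible setup, not a reference implementation: the learning rate, step count, toy data, and the optional `l2` penalty (applied to the slope only) are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000, l2=0.0):
    """Fit p = sigmoid(b0 + b1 * x) by batch gradient descent.

    l2 is an optional Ridge-style penalty strength applied to b1 only.
    """
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the average cross-entropy: mean of (p - y) * feature.
        g0 = sum(sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)) / n
        g1 = sum((sigmoid(b0 + b1 * x) - y) * x for x, y in zip(xs, ys)) / n
        g1 += l2 * b1  # gradient of (l2 / 2) * b1**2
        b0 -= lr * g0
        b1 -= lr * g1
    return b0, b1

# Hypothetical toy data: larger x makes the positive class more likely.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
b0_reg, b1_reg = fit_logistic(xs, ys, l2=0.5)
print(f"unregularized:   b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"with L2 penalty: b0 = {b0_reg:.3f}, b1 = {b1_reg:.3f}")
```

The penalized fit ends with a smaller slope, which is the same shrinkage effect Ridge has in linear regression.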
5. Statistical Properties
Interpretability: Coefficients represent log-odds changes.
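Example: If the coefficient of feature $x_j$ is $\beta_j$, then a one-unit increase in $x_j$ (holding the other features fixed) multiplies the odds of the outcome by $e^{\beta_j}$.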
Probabilistic output: Unlike regression, logistic regression gives probabilities, not raw values.
Classification: By setting a threshold (e.g., 0.5), we classify into categories.
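The odds interpretation and the thresholding step can be checked numerically. In this sketch the coefficients `b0` and `b1` and the input `x` are made-up values, used only to show that a one-unit increase in the feature multiplies the odds by exp(b1).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    return p / (1.0 - p)

# Hypothetical coefficients for a one-feature model (not from the notes).
b0, b1 = -1.0, 0.7

x = 2.0
p_before = sigmoid(b0 + b1 * x)
p_after = sigmoid(b0 + b1 * (x + 1.0))  # one-unit increase in x

odds_ratio = odds(p_after) / odds(p_before)
print(f"odds ratio = {odds_ratio:.4f}, exp(b1) = {math.exp(b1):.4f}")

# Turning a probability into a class label with a 0.5 threshold.
label = 1 if p_after > 0.5 else 0
print(f"p = {p_after:.3f} -> class {label}")
```

The ratio does not depend on the starting value of `x`, which is why a single coefficient can be read as an odds ratio.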
6. Example (Spam Detection)
Features:
$x_1$ = number of links in email
$x_2$ = presence of certain keywords
Model:
$$P(\text{spam} \mid x) = \sigma(\beta_0 + \beta_1 x_1 + \beta_2 x_2)$$
If probability > 0.5 → classify as spam.
If probability < 0.5 → classify as not spam.
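A rough sketch of this spam example in code follows. The coefficient values are invented for illustration (the notes do not supply any), and the keyword feature is assumed to be coded as 1 when present and 0 otherwise.

```python
import math

def spam_probability(num_links, has_keywords, b0=-2.0, b1=0.4, b2=1.5):
    """P(spam) under a logistic model; the default coefficients are made up."""
    z = b0 + b1 * num_links + b2 * has_keywords
    return 1.0 / (1.0 + math.exp(-z))

def classify(num_links, has_keywords, threshold=0.5):
    """Return a (label, probability) pair using the given threshold."""
    p = spam_probability(num_links, has_keywords)
    return ("spam" if p > threshold else "not spam"), p

# Two hypothetical emails: (number of links, keyword flag as 0 or 1).
for email in [(0, 0), (5, 1)]:
    label, p = classify(*email)
    print(email, f"p = {p:.3f}", label)
```

Raising the threshold above 0.5 would make the filter more conservative, at the cost of letting more spam through.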
7. Key Takeaways
Logistic regression = linear model adapted for classification.
Uses sigmoid function to map predictions into probabilities.
Uses cross-entropy loss instead of L2 loss.
Optimized iteratively, can include regularization.
Coefficients are interpretable in terms of odds ratios.
✨ With this, you now see how linear models evolve from predicting continuous values (regression) to predicting probabilities (classification).
Alright, let’s continue into Session 4: Extensions of Generalized Linear Models (GLMs). This is where logistic regression expands into more complex situations like multiclass classification and count data modeling.
1. Multinomial Logistic Regression 🎨
Logistic regression handles binary outcomes (yes/no, spam/not spam).
Multinomial logistic regression extends this to multiple categories (e.g., predicting which sport someone likes: football, basketball, or tennis).
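Model idea:
$$P(y = k \mid x) = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}, \quad z_k = \beta_{k0} + \beta_{k1} x_1 + \dots + \beta_{kp} x_p$$
Uses the softmax function to ensure probabilities across all $K$ classes sum to 1.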
Application: Text classification, image recognition, customer segmentation.
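Here is a minimal softmax sketch, using the sports example from above. The three class scores are hypothetical; in a fitted model each would be a linear combination of the features.

```python
import math

def softmax(scores):
    """Convert a list of real-valued class scores into probabilities."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical linear scores for three classes.
classes = ["football", "basketball", "tennis"]
probs = softmax([2.0, 1.0, 0.1])
for name, p in zip(classes, probs):
    print(f"{name}: {p:.3f}")
print("sum =", sum(probs))
```

Subtracting the maximum score leaves the result unchanged, because the common factor cancels in the ratio, while preventing overflow for large scores.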
2. Poisson Regression 📊
Used when the target variable is a count (number of events).
Example: Number of hospital visits per year, number of clicks on a webpage.
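Model:
$$\log \mu = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p \quad\Longleftrightarrow\quad \mu = e^{\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p}$$
Here $\mu$ is the expected count. The log link ensures $\mu$ is always positive; the predicted mean itself need not be an integer.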
Loss function: Based on the Poisson likelihood.
Application: Epidemiology, traffic modeling, web analytics.
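As a small illustration of the log link, the sketch below computes an expected count and the Poisson probabilities around it. The coefficients and the input value are assumptions made up for the example.

```python
import math

def expected_count(x, b0, b1):
    """Mean of the Poisson response under a log link: exp(b0 + b1 * x)."""
    return math.exp(b0 + b1 * x)

def poisson_pmf(k, mu):
    """P(Y = k) for a Poisson variable with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

# Hypothetical coefficients and a hypothetical feature value.
b0, b1 = 0.2, 0.3
mu = expected_count(4.0, b0, b1)
print(f"expected count = {mu:.3f}")  # a positive real number
print([round(poisson_pmf(k, mu), 3) for k in range(6)])
```

The model outputs the mean of a distribution over counts; the individual outcomes 0, 1, 2, ... each receive a probability from the Poisson formula.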
3. Other GLM Extensions
Log-linear models: For categorical data analysis.
Gamma regression: For modeling continuous, positive data (like insurance claims).
Negative binomial regression: For count data with overdispersion (variance > mean).
4. Why GLMs Matter
They unify regression and classification under one framework.
Each GLM:
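Uses a link function (logit, log, identity, etc.) to connect the linear predictor to the mean of the outcome; the corresponding inverse links are the sigmoid, exponential, and identity functions, with softmax as the multiclass analogue of the sigmoid.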
Adapts to different types of data (binary, categorical, counts, continuous positive).
This flexibility makes GLMs a cornerstone of statistical modeling and machine learning.
5. Key Takeaways
Logistic regression: Binary outcomes.
Multinomial logistic regression: Multiple categories (softmax).
Poisson regression: Count data.
Other GLMs: Handle specialized data types.
GLMs = powerful generalization of linear models for diverse real-world problems.
✨ With this, you now have a full picture:
Session 1 → Linear regression with L2 loss.
Session 2 → Regularization (Ridge, Lasso, Elastic Net).
Session 3 → Logistic regression for classification.
Session 4 → GLM extensions for multiclass and count data.
Would you like me to design a mini practice quiz (with multiple-choice questions) across Sessions 1–4 so you can test your understanding and reinforce what you’ve learned?
Here’s a mini practice quiz covering Sessions 1–4 (Linear Models, Regularization, Logistic Regression, GLMs) so you can test your understanding and reinforce what you’ve learned.
📘 Practice Quiz: Linear Models & GLMs
Session 1: Linear Regression with L2 Loss
What does minimizing L2 loss (squared error) achieve in linear regression?
A) Maximizes variance of coefficients
B) Minimizes the sum of squared errors (SSE)
C) Ensures all coefficients are zero
D) Maximizes correlation between features
Session 2: Regularization
Which statement best describes Ridge regression?
A) It eliminates irrelevant features by setting coefficients to zero.
B) It shrinks coefficients smoothly but keeps all features.
C) It has no penalty term.
D) It always produces sparse models.
Lasso regression differs from Ridge regression because:
A) It uses an L2 penalty.
B) It requires fewer data points.
C) It can set some coefficients exactly to zero.
D) It has a closed-form solution.
Session 3: Logistic Regression
Logistic regression is used when:
A) The target variable is continuous.
B) The target variable is binary (yes/no).
C) The target variable is a count.
D) The target variable is categorical with more than two classes.
What loss function does logistic regression use?
A) L2 loss (SSE)
B) Cross-entropy loss
C) Hinge loss
D) Poisson likelihood
Session 4: GLM Extensions
Multinomial logistic regression uses which function to ensure probabilities sum to 1?
A) Sigmoid
B) Softmax
C) Log link
D) Identity link
Poisson regression is most appropriate when the target variable is:
A) Binary outcomes
B) Continuous values
C) Count data (number of events)
D) Ordinal categories
✅ Answer Key
B
B
C
B
B
B
C
✨ This quiz gives you a quick way to check your grasp of the concepts.
Perfect — let’s build a calculation-based practice quiz so you can strengthen the math side of Sessions 1–4. These will be short problems where you compute or interpret results, just like in real exercises.
🧮 Practice Quiz: Calculation & Interpretation
Session 1: Linear Regression
Suppose we have data:
, . Compute the slope using the formula:
(Hint: , ).
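For practice with this kind of calculation, here is a sketch that computes a least-squares slope and intercept on hypothetical data. It assumes the intended formula is the usual one: the sum of cross-deviations divided by the sum of squared x-deviations.

```python
def ols_slope_intercept(xs, ys):
    """Simple linear regression by the deviation-from-the-mean formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = ols_slope_intercept(xs, ys)
print(slope, intercept)
```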
Session 2: Regularization
In Ridge regression, the solution is:
$$\hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top y$$
If $\lambda$ increases, what happens to the coefficients?
A) They grow larger.
B) They shrink toward zero.
C) They remain unchanged.
D) They oscillate randomly.
Session 3: Logistic Regression
A logistic regression model predicts:
Compute the probability when . (Hint: Plug in: . Then sigmoid(0) = 0.5).
Session 4: GLM Extensions
In Poisson regression, the expected count is:
$$\mu = e^{\beta_0 + \beta_1 x}$$
If , , and , compute . (Hint: . Then ).
✅ Answer Key
B) They shrink toward zero.
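The shrinkage answer can be seen numerically in the simplest case: one feature and no intercept, where the Ridge solution reduces to a single division. The data and the penalty values below are hypothetical.

```python
def ridge_coefficient(xs, ys, lam):
    """Ridge solution for a single feature without intercept.

    Scalar case of (X^T X + lam * I)^(-1) X^T y.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

# Hypothetical data, roughly y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

lams = [0.0, 1.0, 10.0, 100.0]
coefs = [ridge_coefficient(xs, ys, lam) for lam in lams]
for lam, b in zip(lams, coefs):
    print(f"lambda = {lam:>5}: beta = {b:.4f}")
```

With a penalty of zero this is the ordinary least-squares coefficient; as the penalty grows, the denominator grows and the coefficient moves toward zero without reaching it.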
✨ These exercises give you hands-on practice with the formulas and interpretations.
Would you like me to create a larger exam-style set (10–15 questions mixing theory + calculations) so you can simulate a full test session for deeper practice?
Perfect — let’s simulate a full exam-style set with a mix of theory and calculation questions across Sessions 1–4. This will feel like a real test, giving you both conceptual and numerical practice.
📘 Exam-Style Quiz: Linear Models & GLMs (15 Questions)
Session 1: Linear Regression with L2 Loss
Define the L2 loss function in linear regression and explain why squaring the errors is useful.
Given data points , compute the slope and intercept .
Explain the geometric interpretation of the OLS solution in terms of projection.
True/False: The OLS estimator is unbiased if the error terms have mean zero.
Session 2: Regularization (Ridge, Lasso, Elastic Net)
Write the Ridge regression loss function and explain the role of $\lambda$.
In Lasso regression, why can some coefficients become exactly zero?
Suppose Ridge regression is applied with a very large $\lambda$. What happens to the coefficients?
Compare Ridge and Lasso in terms of feature selection.
Elastic Net combines which two penalties, and why is this useful?
Session 3: Logistic Regression
Write the logistic regression model for $P(y = 1 \mid x)$.
A logistic regression model is:
Compute the probability when .
What loss function does logistic regression minimize, and why is it preferred over L2 loss?
Interpret the coefficient in terms of odds ratio.
Session 4: GLM Extensions
In multinomial logistic regression, explain how the softmax function ensures probabilities sum to 1.
A Poisson regression model is:
Compute the expected count when .
✅ Answer Key (Step-by-Step)
$L(\beta) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$. Squaring penalizes large errors more and makes the loss differentiable everywhere.
, . Slope . Intercept .
OLS = projection of $y$ onto the column space of $X$.
True, provided the errors have mean zero conditional on the features.
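$L(\beta) = \sum_{i=1}^{n} (y_i - x_i^\top \beta)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$. $\lambda \ge 0$ controls shrinkage: larger values pull the coefficients closer to zero.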
Because the L1 penalty creates sharp corners in the constraint region, allowing exact zeros.
Coefficients shrink toward zero.
Ridge keeps all features; Lasso selects by eliminating some.
Elastic Net = L1 + L2 penalties. Useful when features are correlated and only a subset of them is expected to matter.
$P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p)}}$.
Plug in to get a linear predictor of 0.6. Sigmoid(0.6) = 1 / (1 + e^{-0.6}) ≈ 0.646.
Cross-entropy loss. Preferred because it is the negative log-likelihood of the Bernoulli model, is convex in the coefficients, and penalizes confident wrong predictions heavily.
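Odds ratio = $e^{\beta}$ for the coefficient $\beta$ in question, here ≈ 1.65. A one-unit increase in the corresponding feature multiplies the odds by about 1.65.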
Softmax normalizes exponentials: each probability = exponential of score / sum of exponentials.
.
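The sigmoid value and the odds ratio in this answer key can be double-checked with a few lines of Python. The sketch assumes the coefficient behind the 1.65 odds ratio is 0.5, which is an inference from exp(0.5) ≈ 1.649 rather than a value stated in the key.

```python
import math

# Probability for a linear predictor of 0.6.
p = 1.0 / (1.0 + math.exp(-0.6))

# Odds ratio for an assumed coefficient of 0.5.
odds_ratio = math.exp(0.5)

print(f"sigmoid(0.6) = {p:.4f}")
print(f"exp(0.5)     = {odds_ratio:.4f}")
```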
✨ This exam-style set gives you a balanced mix of theory, interpretation, and calculation.
