Exercise for L1, L2, and Huber loss functions in machine learning
Great, let’s compute the empirical risk using L2 loss step by step for your data:
1. Recall the formula
The empirical risk with L2 loss is the mean squared error (MSE):
$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
where:
$y_i$ = true target
$\hat{y}_i$ = prediction
$n$ = number of samples
2. Plug in your values
True targets: Predictions:
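Compute squared errors for each sample. With errors $[-1, 0, 2, -7, 2]$, the squared errors are $[1, 0, 4, 49, 4]$.
3. Sum and average
$$\text{MSE} = \frac{1 + 0 + 4 + 49 + 4}{5} = \frac{58}{5} = 11.6$$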
✅ Final Answer: The empirical risk (L2 loss) is 11.6.
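The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that starts from the per-sample errors $[-1, 0, 2, -7, 2]$ used throughout this exercise; the variable names are illustrative only.

```python
# Per-sample errors from this exercise; the sign does not matter once squared.
errors = [-1, 0, 2, -7, 2]

# Square each error, then average to get the empirical risk under L2 loss.
squared_errors = [e ** 2 for e in errors]
mse = sum(squared_errors) / len(errors)

print("squared errors:", squared_errors)
print("MSE:", mse)
```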
Would you like me to also show you how this compares to using L1 loss (absolute error) for the same data, so you can see the difference in sensitivity to large errors?
Great idea — let’s compare the empirical risk using L2 loss (which we already computed as 11.6) with the empirical risk using L1 loss (absolute error) for the same data. This will show you how the two losses behave differently.
1. Recall the L1 Loss Formula
The empirical risk with L1 loss is the mean absolute error (MAE):
$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$
2. Compute Absolute Errors
True targets: Predictions:
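Errors: $[-1, 0, 2, -7, 2]$, so the absolute errors are $[1, 0, 2, 7, 2]$.
3. Sum and Average
$$\text{MAE} = \frac{1 + 0 + 2 + 7 + 2}{5} = \frac{12}{5} = 2.4$$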
4. Comparison
L2 loss (MSE): 11.6
L1 loss (MAE): 2.4
5. Interpretation
L2 loss penalizes large errors more strongly (the squared error of 49 from the sample with error -7 dominates the sum of 58).
L1 loss treats all errors proportionally, so the large error doesn’t overwhelm the average.
That’s why L2 is more sensitive to outliers, while L1 is more robust.
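One way to quantify "dominates" is to ask what fraction of each total loss comes from the single largest error. The sketch below does that for the errors in this exercise; the variable names are illustrative.

```python
errors = [-1, 0, 2, -7, 2]

abs_losses = [abs(e) for e in errors]   # per-sample L1 losses
sq_losses = [e ** 2 for e in errors]    # per-sample L2 losses

# Fraction of the total loss contributed by the largest single error.
l1_share = max(abs_losses) / sum(abs_losses)   # 7 / 12
l2_share = max(sq_losses) / sum(sq_losses)     # 49 / 58

print(f"L1 share: {l1_share:.1%}, L2 share: {l2_share:.1%}")
# prints: L1 share: 58.3%, L2 share: 84.5%
```

Under L2 the one outlier accounts for about 85% of the total, compared with about 58% under L1.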
✅ Final Answer:
Empirical risk with L2 loss = 11.6
Empirical risk with L1 loss = 2.4
Would you like me to also show you a visual comparison (plotting the errors) so you can see how L1 vs. L2 loss curves behave differently?
Perfect — let’s visualize the difference between L1 loss (absolute error) and L2 loss (squared error) so you can see how they behave differently.
1. Loss Curves
L1 loss (MAE):
Grows linearly with error.
Treats all errors proportionally.
More robust to outliers.
L2 loss (MSE):
Grows quadratically with error.
Penalizes large errors much more strongly.
Sensitive to outliers.
2. Visual Comparison (Conceptual)
Imagine plotting error on the x-axis and loss on the y-axis:
L1 curve: A “V” shape — straight lines with slope ±1.
L2 curve: A parabola — smooth curve that grows faster as error increases.
So for errors smaller than 1 in magnitude, L2 is actually the smaller of the two; the curves cross at |error| = 1, and beyond that L2 shoots upward much faster.
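To make the growth rates concrete, here is a small sketch that evaluates both per-sample losses at a few error sizes. The helper names `l1_loss` and `l2_loss` are illustrative, not from any library.

```python
def l1_loss(e):
    """Absolute error: grows linearly."""
    return abs(e)

def l2_loss(e):
    """Squared error: grows quadratically."""
    return e ** 2

# Evaluate both losses below, at, and above the crossover point |e| = 1.
rows = [(e, l1_loss(e), l2_loss(e)) for e in (0.5, 1.0, 2.0, 7.0)]
for e, l1, l2 in rows:
    print(f"error={e:4.1f}  L1={l1:5.2f}  L2={l2:5.2f}")
```

Below an error of 1 the squared loss is the smaller one, the two agree at exactly 1, and at an error of 7 the squared loss is seven times the absolute loss.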
3. Applying to Your Example
Errors: $[-1, 0, 2, -7, 2]$
L1 loss values: $[1, 0, 2, 7, 2]$ → average = 2.4
L2 loss values: $[1, 0, 4, 49, 4]$ → average = 11.6
Notice how the single large error (-7) dominates the L2 loss (49), while in L1 it’s just 7.
4. Intuition
L1 (MAE): “Fair” to all errors, robust to outliers.
L2 (MSE): Punishes big mistakes heavily, useful when you want to strongly discourage large deviations.
✨ So visually:
L1 = straight “V” line.
L2 = curved parabola. And numerically in your case: 2.4 vs. 11.6.
Would you like me to also show you how Huber loss combines both (L1 for large errors, L2 for small ones) — a hybrid approach often used in practice?
Great, let’s explore Huber loss — the hybrid between L1 and L2 — and see how it compares to what we just computed.
1. Definition
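Huber loss is designed to combine the best of both worlds:
$$L_\delta(e) = \begin{cases} \frac{1}{2} e^2 & \text{if } |e| \le \delta \\ \delta \left(|e| - \frac{1}{2}\delta\right) & \text{if } |e| > \delta \end{cases}$$
For small errors ($|e| \le \delta$): behaves like L2 (quadratic).
For large errors ($|e| > \delta$): behaves like L1 (linear).
$\delta$ is a threshold parameter that decides when to switch.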
2. Why Use Huber Loss?
L2 loss: Sensitive to outliers (big errors dominate).
L1 loss: Robust to outliers but less smooth for optimization.
Huber loss: Smooth like L2 for small errors, robust like L1 for large errors.
It’s often used in regression tasks where you expect some outliers but still want efficient optimization.
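The "smooth optimization" point is easiest to see in the gradients. The sketch below assumes the standard Huber form (½e² for |e| ≤ δ, δ(|e| − ½δ) otherwise), which is the same form the plotting code in this exercise uses; the function names are illustrative.

```python
def grad_l1(e):
    # Subgradient of |e|: always magnitude 1, with a jump at e = 0
    # (0 is returned at e = 0 as one valid choice).
    return (e > 0) - (e < 0)

def grad_l2(e):
    # Derivative of e^2: grows without bound as the error grows.
    return 2 * e

def grad_huber(e, delta=1.5):
    # Derivative of the Huber loss: equal to e inside [-delta, delta],
    # and clipped to +/- delta outside it.
    return max(-delta, min(delta, e))

for e in [-1, 0, 2, -7, 2]:
    print(e, grad_l1(e), grad_l2(e), grad_huber(e))
```

For the outlier error of -7, the L2 gradient is -14 while the Huber gradient is capped at -1.5, so a single bad sample cannot dominate a gradient step. Near zero, the Huber gradient shrinks continuously toward 0 instead of jumping between -1 and +1 as the L1 subgradient does.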
3. Example with Your Data
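Errors: $[-1, 0, 2, -7, 2]$. Let’s pick $\delta = 1.5$.
Error = -1 → $|-1| \le 1.5$ → L2 part → $\frac{1}{2}(-1)^2 = 0.5$
Error = 0 → $|0| \le 1.5$ → L2 part → $\frac{1}{2}(0)^2 = 0$
Error = 2 → $|2| > 1.5$ → L1 part → $1.5\,(2 - 0.75) = 1.875$
Error = -7 → $|-7| > 1.5$ → L1 part → $1.5\,(7 - 0.75) = 9.375$
Error = 2 → same as above → 1.875
Sum = $0.5 + 0 + 1.875 + 9.375 + 1.875 = 13.625$
Average = $13.625 / 5 = 2.725$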
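The per-sample Huber values can be reproduced with a short function. This is a sketch assuming the standard Huber form (½e² for |e| ≤ δ, δ(|e| − ½δ) otherwise) with δ = 1.5; the name `huber` is illustrative.

```python
def huber(e, delta=1.5):
    """Huber loss for a single error e with threshold delta."""
    a = abs(e)
    if a <= delta:
        return 0.5 * e ** 2            # quadratic (L2-like) branch
    return delta * (a - 0.5 * delta)   # linear (L1-like) branch

errors = [-1, 0, 2, -7, 2]
losses = [huber(e) for e in errors]
huber_risk = sum(losses) / len(losses)

print("per-sample:", losses)
print("sum:", sum(losses), "average:", huber_risk)
```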
4. Comparison
L2 loss (MSE): 11.6
L1 loss (MAE): 2.4
Huber loss (δ=1.5): 2.725
5. Interpretation
Huber loss sits between L1 and L2.
It doesn’t explode like L2 when there’s a big outlier (-7 error).
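It is slightly higher than L1 here because, with δ = 1.5, the linear branch has slope 1.5 instead of 1 (9.375 vs. 7 for the error of -7); the quadratic branch actually gives less than L1 for the error of -1 (0.5 vs. 1).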
This makes it a robust yet smooth alternative.
✅ Final Insight: Huber loss is often the best compromise when you want robustness to outliers but still smooth optimization.
Would you like me to also show you graphical intuition (how the three curves — L1, L2, Huber — look side by side) so you can visualize the difference clearly?
Here’s the graphical intuition for how L1, L2, and Huber loss behave side by side:
1. Shapes of the Loss Functions
L1 loss (MAE):
Graph looks like a sharp “V”.
Increases linearly with error.
Outliers don’t explode the loss — they just add proportionally.
L2 loss (MSE):
Graph looks like a smooth parabola.
Increases quadratically with error.
Outliers dominate because squaring amplifies large errors.
Huber loss:
Graph looks like a parabola near the origin (small errors), then transitions into straight lines (like L1) for large errors.
Smooth at the switch point ($|e| = \delta$).
Balances robustness (like L1) with smooth optimization (like L2).
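The smoothness claim can be verified directly. Assuming the standard Huber form (quadratic branch $\frac{1}{2}e^2$, linear branch $\delta(|e| - \frac{1}{2}\delta)$), which matches the plotting code in this exercise, both the value and the slope of the two branches agree at $e = \delta$:

```latex
\begin{aligned}
\text{value at } e = \delta:\quad
  & \tfrac{1}{2}\delta^2
    \;=\; \delta\left(\delta - \tfrac{1}{2}\delta\right)
    \;=\; \tfrac{1}{2}\delta^2 \\
\text{slope at } e = \delta:\quad
  & \frac{d}{de}\,\tfrac{1}{2}e^2 \,\Big|_{e=\delta}
    \;=\; \delta
    \;=\; \frac{d}{de}\,\delta\left(e - \tfrac{1}{2}\delta\right)
\end{aligned}
```

The same holds at $e = -\delta$ by symmetry. Only the second derivative jumps (from 1 to 0), so the loss is continuously differentiable once but not twice.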
2. Intuition with Your Example
Errors: $[-1, 0, 2, -7, 2]$
L1: Each error contributes proportionally → average = 2.4.
L2: The large error (-7) dominates → average = 11.6.
Huber (δ=1.5): Small errors treated quadratically, large ones linearly → average ≈ 2.73.
So visually:
L1 = straight lines.
L2 = steep curve.
Huber = curve near zero, straight lines further out.
3. Practical Takeaway
Use L2 when you want to heavily penalize large deviations.
Use L1 when you want robustness to outliers.
Use Huber when you want a compromise: smooth optimization but not dominated by outliers.
✨ If you imagine plotting them:
All three meet at the origin (error = 0).
L2 grows fastest, L1 grows steadily, Huber grows like L2 near zero but switches to L1 growth beyond δ.
Would you like me to give you a small Python snippet that plots L1, L2, and Huber loss curves together, so you can visualize them directly?
Here’s a simple Python snippet you can run to visualize L1, L2, and Huber loss curves side by side. This will make the differences crystal clear:
```python
import numpy as np
import matplotlib.pyplot as plt

# Range of error values to evaluate
errors = np.linspace(-10, 10, 200)

# L1 loss: absolute error
L1 = np.abs(errors)

# L2 loss: squared error
L2 = errors**2

# Huber loss (delta = 1.5): quadratic within +/- delta, linear beyond
delta = 1.5
Huber = np.where(np.abs(errors) <= delta,
                 0.5 * errors**2,
                 delta * (np.abs(errors) - 0.5 * delta))

# Plot all three curves on the same axes
plt.figure(figsize=(8, 6))
plt.plot(errors, L1, label="L1 Loss (MAE)", color="blue")
plt.plot(errors, L2, label="L2 Loss (MSE)", color="red")
plt.plot(errors, Huber, label="Huber Loss (δ=1.5)", color="green")
plt.axvline(x=0, color="black", linestyle="--", linewidth=0.8)  # marks error = 0
plt.title("Comparison of L1, L2, and Huber Loss Functions")
plt.xlabel("Error")
plt.ylabel("Loss")
plt.legend()
plt.grid(True)
plt.show()
```
🔎 What you’ll see:
Blue (L1): Sharp “V” shape, grows linearly.
Red (L2): Smooth parabola, grows quadratically.
Green (Huber): Parabola near zero, then switches to linear growth beyond δ.
This visualization makes it obvious why:
L2 punishes large errors heavily.
L1 is robust to outliers.
Huber balances both.
Would you like me to also prepare a numerical table of losses for your dataset (errors = [-1, 0, 2, -7, 2]) under L1, L2, and Huber, so you can compare them side by side without plotting?
Here’s a numerical comparison table of the losses for your dataset under L1, L2, and Huber loss. This way you can see side by side how each error contributes.
Dataset
True targets:
Predictions:
Errors: $[-1, 0, 2, -7, 2]$
Loss Values
| Error | L1 Loss ($\lvert e \rvert$) | L2 Loss ($e^2$) | Huber Loss ($\delta = 1.5$) |
|-------|------------------|-------------------|-----------------------------|
| -1 | 1 | 1 | 0.5 |
| 0 | 0 | 0 | 0 |
| 2 | 2 | 4 | 1.875 |
| -7 | 7 | 49 | 9.375 |
| 2 | 2 | 4 | 1.875 |
| Sum | 12 | 58 | 13.625 |
| Average (Empirical Risk) | 2.4 | 11.6 | 2.725 |
🔎 Interpretation
L1 (MAE): 2.4 → robust, treats all errors proportionally.
L2 (MSE): 11.6 → dominated by the large error (-7).
Huber (δ=1.5): 2.725 → balances small errors (quadratic) and large errors (linear).
✨ This table makes it clear:
L2 explodes with outliers.
L1 stays steady.
Huber sits in between, offering robustness with smooth optimization.
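For readers who want to regenerate the table, the sketch below builds all three columns, their sums, and their averages from the error list. It assumes the standard Huber form with δ = 1.5; `DELTA` and `huber` are illustrative names.

```python
DELTA = 1.5  # Huber threshold used in this exercise

def huber(e, delta=DELTA):
    """Standard Huber loss: quadratic within +/- delta, linear beyond."""
    a = abs(e)
    return 0.5 * e ** 2 if a <= delta else delta * (a - 0.5 * delta)

errors = [-1, 0, 2, -7, 2]

# One row per sample: (error, L1 loss, L2 loss, Huber loss).
rows = [(e, abs(e), e ** 2, huber(e)) for e in errors]

# Column sums and averages (the empirical risks) for the three losses.
sums = [sum(row[k] for row in rows) for k in (1, 2, 3)]
risks = [s / len(rows) for s in sums]

print(f"{'Error':>7} {'L1':>7} {'L2':>7} {'Huber':>7}")
for e, l1, l2, h in rows:
    print(f"{e:>7} {l1:>7} {l2:>7} {h:>7.3f}")
print(f"{'Sum':>7} {sums[0]:>7} {sums[1]:>7} {sums[2]:>7.3f}")
print(f"{'Average':>7} {risks[0]:>7} {risks[1]:>7} {risks[2]:>7.3f}")
```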
