Exercise for L1, L2, and Huber loss functions in machine learning

You are given the numerical targets $y = [0, 1, 3, - 8, 1]$ and predictions $\hat{y} = [1, 1, 1, - 1, - 1]$. Compute and select the correct empirical risk using an L2 loss.

Great, let’s compute the empirical risk using L2 loss step by step for your data:

1. Recall the formula

The empirical risk with L2 loss is the mean squared error (MSE):

$$R = \frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}$$

where:

$y_{i}$ = true target
${\hat{y}}_{i}$ = prediction
$n$ = number of samples

2. Plug in your values

True targets: $y = [0, 1, 3, - 8, 1]$
Predictions: $\hat{y} = [1, 1, 1, - 1, - 1]$

Compute squared errors for each sample:

$(0 - 1)^{2} = (- 1)^{2} = 1$
$(1 - 1)^{2} = (0)^{2} = 0$
$(3 - 1)^{2} = (2)^{2} = 4$
$(- 8 - (- 1))^{2} = (- 7)^{2} = 49$
$(1 - (- 1))^{2} = (2)^{2} = 4$

3. Sum and average

$$\text{SSE} = 1 + 0 + 4 + 49 + 4 = 58$$

$$R = \frac{58}{5} = 11.6$$

✅ Final Answer: The empirical risk (L2 loss) is 11.6.
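
The same arithmetic can be checked numerically. This is a minimal sketch assuming NumPy is available; the variable names (`y`, `y_hat`, `squared_errors`, `sse`, `mse`) are chosen here for illustration only.

```python
import numpy as np

# Targets and predictions from the exercise
y = np.array([0, 1, 3, -8, 1], dtype=float)
y_hat = np.array([1, 1, 1, -1, -1], dtype=float)

# Per-sample squared errors, their sum (SSE), and their mean (MSE)
squared_errors = (y - y_hat) ** 2
sse = squared_errors.sum()
mse = squared_errors.mean()

print("squared errors:", squared_errors)
print("SSE:", sse)
print("empirical risk (L2):", mse)
```

The printed risk should agree with the hand computation of 58 / 5 = 11.6, up to floating-point rounding.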

Would you like me to also show you how this compares to using L1 loss (absolute error) for the same data, so you can see the difference in sensitivity to large errors?

yes

Great idea — let’s compare the empirical risk using L2 loss (which we already computed as 11.6) with the empirical risk using L1 loss (absolute error) for the same data. This will show you how the two losses behave differently.

1. Recall the L1 Loss Formula

The empirical risk with L1 loss is the mean absolute error (MAE):

$$R_{L1} = \frac{1}{n} \sum_{i=1}^{n} | y_{i} - \hat{y}_{i} |$$

2. Compute Absolute Errors

True targets: $y = [0, 1, 3, - 8, 1]$
Predictions: $\hat{y} = [1, 1, 1, - 1, - 1]$

Errors:

$∣ 0 - 1 ∣ = 1$
$∣ 1 - 1 ∣ = 0$
$∣ 3 - 1 ∣ = 2$
$∣ - 8 - (- 1) ∣ = ∣ - 7 ∣ = 7$
$∣ 1 - (- 1) ∣ = ∣ 2 ∣ = 2$

3. Sum and Average

$$\text{Sum of absolute errors} = 1 + 0 + 2 + 7 + 2 = 12$$

$$R_{L1} = \frac{12}{5} = 2.4$$
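
As a quick numerical check of the steps above, here is a minimal sketch assuming NumPy; the names `abs_errors` and `mae` are illustrative and not part of the exercise.

```python
import numpy as np

# Targets and predictions from the exercise
y = np.array([0, 1, 3, -8, 1], dtype=float)
y_hat = np.array([1, 1, 1, -1, -1], dtype=float)

# Per-sample absolute errors and their mean (MAE)
abs_errors = np.abs(y - y_hat)
mae = abs_errors.mean()

print("absolute errors:", abs_errors)
print("sum:", abs_errors.sum())
print("empirical risk (L1):", mae)
```

The result should match 12 / 5 = 2.4, up to floating-point rounding.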

4. Comparison

L2 loss (MSE): 11.6
L1 loss (MAE): 2.4

5. Interpretation

L2 loss penalizes large errors more strongly (the squared error of 49 from the $- 8$ vs. $- 1$ case accounts for most of the total of 58).
L1 loss treats all errors proportionally, so the large error doesn’t overwhelm the average.
That’s why L2 is more sensitive to outliers, while L1 is more robust.

✅ Final Answer:

Empirical risk with L2 loss = 11.6
Empirical risk with L1 loss = 2.4

Would you like me to also show you a visual comparison (plotting the errors) so you can see how L1 vs. L2 loss curves behave differently?

yes

Perfect — let’s visualize the difference between L1 loss (absolute error) and L2 loss (squared error) so you can see how they behave differently.

1. Loss Curves

L1 loss (MAE):

$$L_{L1}(e) = |e|$$

Grows linearly with error.
Treats all errors proportionally.
More robust to outliers.

L2 loss (MSE):

$$L_{L2}(e) = e^{2}$$

Grows quadratically with error.
Penalizes large errors much more strongly.
Sensitive to outliers.

2. Visual Comparison (Conceptual)

Imagine plotting error $e$ on the x-axis and loss on the y-axis:

L1 curve: A “V” shape — straight lines with slope ±1.
L2 curve: A parabola — smooth curve that grows faster as error increases.

The two curves cross at $∣ e ∣ = 1$, where both equal 1. For $∣ e ∣ < 1$, L2 is actually smaller than L1; for $∣ e ∣ > 1$, L2 shoots upward much faster.

3. Applying to Your Example

Errors: $[- 1, 0, 2, - 7, 2]$

L1 loss values: $[1, 0, 2, 7, 2]$ → average = 2.4
L2 loss values: $[1, 0, 4, 49, 4]$ → average = 11.6

Notice how the single large error (-7) dominates the L2 loss (49), while in L1 it’s just 7.
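
One way to quantify how much the outlier dominates is to compute its share of each total. The sketch below assumes NumPy; the names `l1_share` and `l2_share` are made up for this illustration.

```python
import numpy as np

# Errors from the exercise (target minus prediction)
errors = np.array([-1, 0, 2, -7, 2], dtype=float)

l1_terms = np.abs(errors)   # per-sample L1 losses
l2_terms = errors ** 2      # per-sample L2 losses

# Fraction of each total contributed by the largest single term
l1_share = l1_terms.max() / l1_terms.sum()   # 7 / 12
l2_share = l2_terms.max() / l2_terms.sum()   # 49 / 58

print(f"Outlier share of L1 total: {l1_share:.1%}")
print(f"Outlier share of L2 total: {l2_share:.1%}")
```

The error of -7 makes up about 58% of the L1 total but about 84% of the L2 total.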

4. Intuition

L1 (MAE): “Fair” to all errors, robust to outliers.
L2 (MSE): Punishes big mistakes heavily, useful when you want to strongly discourage large deviations.

✨ So visually:

L1 = straight “V” line.
L2 = curved parabola. And numerically in your case: 2.4 vs. 11.6.

Would you like me to also show you how Huber loss combines both (L1 for large errors, L2 for small ones) — a hybrid approach often used in practice?

yes

Great, let’s explore Huber loss — the hybrid between L1 and L2 — and see how it compares to what we just computed.

1. Definition

Huber loss is designed to combine the best of both worlds:

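$$L_{\delta}(e) = \begin{cases} \frac{1}{2} e^{2} & \text{if } |e| \leq \delta \\ \delta \left( |e| - \frac{1}{2} \delta \right) & \text{if } |e| > \delta \end{cases}$$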

For small errors ( $∣ e ∣ \leq δ$ ): behaves like L2 (quadratic).
For large errors ( $∣ e ∣ > δ$ ): behaves like L1 (linear).

$δ$ is a threshold parameter that decides when to switch.
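
The piecewise definition translates fairly directly into code. The following is a minimal sketch assuming NumPy; the function name `huber` and its default `delta=1.0` are choices made here, not something fixed by the definition.

```python
import numpy as np

def huber(e, delta=1.0):
    """Per-sample Huber loss for error(s) e with threshold delta."""
    e = np.asarray(e, dtype=float)
    abs_e = np.abs(e)
    quadratic = 0.5 * e ** 2                   # used where |e| <= delta
    linear = delta * (abs_e - 0.5 * delta)     # used where |e| > delta
    return np.where(abs_e <= delta, quadratic, linear)

# Both branches agree at |e| = delta, so the loss has no jump there
print(huber([0.5, 1.0, 3.0], delta=1.0))
```

At $|e| = \delta$ both branches evaluate to $\frac{1}{2}\delta^{2}$, which is why the function is continuous at the switch point.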

2. Why Use Huber Loss?

L2 loss: Sensitive to outliers (big errors dominate).
L1 loss: Robust to outliers but less smooth for optimization.
Huber loss: Smooth like L2 for small errors, robust like L1 for large errors.

It’s often used in regression tasks where you expect some outliers but still want efficient optimization.

3. Example with Your Data

Errors: $[- 1, 0, 2, - 7, 2]$
Let’s pick $δ = 1.5$.

Error = -1 → $∣ e ∣ = 1 \leq 1.5$ → L2 part → $0.5 \cdot 1^{2} = 0.5$
Error = 0 → $∣ e ∣ = 0 \leq 1.5$ → L2 part → $0$
Error = 2 → $∣ e ∣ = 2 > 1.5$ → L1 part → $1.5 (2 - 0.75) = 1.5 \cdot 1.25 = 1.875$
Error = -7 → $∣ e ∣ = 7 > 1.5$ → L1 part → $1.5 (7 - 0.75) = 1.5 \cdot 6.25 = 9.375$
Error = 2 → same as above → 1.875

Sum = $0.5 + 0 + 1.875 + 9.375 + 1.875 = 13.625$ Average = $13.625 / 5 = 2.725$
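
The per-sample values and the average can be reproduced with a short script. This is a sketch assuming NumPy, with $δ = 1.5$ as in the worked example; the variable names are illustrative.

```python
import numpy as np

# Errors from the exercise and the threshold chosen in the example
errors = np.array([-1, 0, 2, -7, 2], dtype=float)
delta = 1.5

abs_e = np.abs(errors)
# Quadratic branch for |e| <= delta, linear branch otherwise
huber_terms = np.where(abs_e <= delta,
                       0.5 * errors ** 2,
                       delta * (abs_e - 0.5 * delta))

huber_sum = huber_terms.sum()
huber_risk = huber_terms.mean()

print("per-sample Huber losses:", huber_terms)
print("sum:", huber_sum)
print("empirical risk (Huber):", huber_risk)
```

The output should match the hand computation: a sum of 13.625 and an average of 2.725.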

4. Comparison

L2 loss (MSE): 11.6
L1 loss (MAE): 2.4
Huber loss (δ=1.5): 2.725

5. Interpretation

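With $δ = 1.5$, the Huber risk (2.725) falls between the L1 risk (2.4) and the L2 risk (11.6) on this data.
It doesn’t explode like L2 when there’s a big outlier (-7 error): that sample contributes 9.375 instead of 49.
It’s slightly higher than L1 here because beyond $δ$ the Huber loss grows with slope $δ = 1.5$ rather than 1, so the outlier contributes 9.375 instead of 7. The small error of -1 actually contributes less under Huber (0.5 vs. 1).
This makes it a robust yet smooth alternative.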
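
The Huber risk depends on the choice of $δ$, so where it lands relative to the L1 risk is not fixed. The sketch below, assuming NumPy, recomputes the risk for a few thresholds; the helper `huber_risk` and the particular values of `delta` are picked here purely for illustration.

```python
import numpy as np

# Errors from the exercise
errors = np.array([-1, 0, 2, -7, 2], dtype=float)

def huber_risk(errors, delta):
    """Mean Huber loss over the given errors for threshold delta."""
    abs_e = np.abs(errors)
    per_sample = np.where(abs_e <= delta,
                          0.5 * errors ** 2,
                          delta * (abs_e - 0.5 * delta))
    return per_sample.mean()

mae = np.abs(errors).mean()   # L1 risk, for reference

for delta in (0.5, 1.0, 1.5, 2.0):
    risk = huber_risk(errors, delta)
    print(f"delta={delta}: Huber risk={risk:.3f}, MAE={mae:.3f}")
```

On this data the Huber risk comes out to 1.100, 2.000, 2.725, and 3.300 for the four thresholds, so it is below the L1 risk of 2.4 for $δ ≤ 1$ and above it for $δ = 1.5$ and $δ = 2$.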

✅ Final Insight: Huber loss is often a good compromise when you want robustness to outliers but still smooth optimization.

Would you like me to also show you graphical intuition (how the three curves — L1, L2, Huber — look side by side) so you can visualize the difference clearly?

yes

Here’s the graphical intuition for how L1, L2, and Huber loss behave side by side:

1. Shapes of the Loss Functions

L1 loss (MAE):
- Graph looks like a sharp “V”.
- Increases linearly with error.
- Outliers don’t explode the loss — they just add proportionally.
L2 loss (MSE):
- Graph looks like a smooth parabola.
- Increases quadratically with error.
- Outliers dominate because squaring amplifies large errors.
Huber loss:
- Graph looks like a parabola near the origin (small errors), then transitions into straight lines (like L1) for large errors.
- Smooth at the switch point ( $δ$ ).
- Balances robustness (like L1) with smooth optimization (like L2).
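
The difference in smoothness shows up in the derivatives with respect to the error: $\text{sign}(e)$ for L1, $2e$ for L2, and $e$ clipped to $[-δ, δ]$ for Huber. The sketch below assumes NumPy and uses `np.sign(0) = 0` as one possible subgradient choice for L1 at zero; the variable names are illustrative.

```python
import numpy as np

# Errors from the exercise and the threshold used in the example
errors = np.array([-1, 0, 2, -7, 2], dtype=float)
delta = 1.5

grad_l1 = np.sign(errors)                    # constant magnitude, jumps at e = 0
grad_l2 = 2 * errors                         # grows without bound with |e|
grad_huber = np.clip(errors, -delta, delta)  # equals e near zero, capped at +/- delta

print("dL1/de:   ", grad_l1)
print("dL2/de:   ", grad_l2)
print("dHuber/de:", grad_huber)
```

For the error of -7, the L2 gradient is -14, while the Huber gradient is capped at -1.5, which is one way to see why a single outlier pulls less on a model trained with Huber loss.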

2. Intuition with Your Example

Errors: $[- 1, 0, 2, - 7, 2]$

L1: Each error contributes proportionally → average = 2.4.
L2: The large error (-7) dominates → average = 11.6.
Huber (δ=1.5): Small errors treated quadratically, large ones linearly → average ≈ 2.73.

So visually:

L1 = straight lines.
L2 = steep curve.
Huber = curve near zero, straight lines further out.

3. Practical Takeaway

Use L2 when you want to heavily penalize large deviations.
Use L1 when you want robustness to outliers.
Use Huber when you want a compromise: smooth optimization but not dominated by outliers.

✨ If you imagine plotting them:

All three meet at the origin (error = 0).
For errors beyond 1 in magnitude, L2 grows fastest and L1 grows steadily with slope 1. Huber grows quadratically (as $\frac{1}{2} e^{2}$) near zero but switches to linear growth with slope δ beyond δ.

Would you like me to give you a small Python snippet that plots L1, L2, and Huber loss curves together, so you can visualize them directly?

yes

Here’s a simple Python snippet you can run to visualize L1, L2, and Huber loss curves side by side. This will make the differences crystal clear:

```python
import numpy as np
import matplotlib.pyplot as plt

# Define error range
errors = np.linspace(-10, 10, 200)

# L1 loss
L1 = np.abs(errors)

# L2 loss
L2 = errors**2

# Huber loss (delta = 1.5)
delta = 1.5
Huber = np.where(np.abs(errors) <= delta,
                 0.5 * errors**2,
                 delta * (np.abs(errors) - 0.5 * delta))

# Plot
plt.figure(figsize=(8,6))
plt.plot(errors, L1, label="L1 Loss (MAE)", color="blue")
plt.plot(errors, L2, label="L2 Loss (MSE)", color="red")
plt.plot(errors, Huber, label="Huber Loss (δ=1.5)", color="green")
plt.axvline(x=0, color="black", linestyle="--", linewidth=0.8)

plt.title("Comparison of L1, L2, and Huber Loss Functions")
plt.xlabel("Error")
plt.ylabel("Loss")
plt.legend()
plt.grid(True)
plt.show()
```

🔎 What you’ll see:

Blue (L1): Sharp “V” shape, grows linearly.
Red (L2): Smooth parabola, grows quadratically.
Green (Huber): Parabola near zero, then switches to linear growth beyond δ.

This visualization makes it obvious why:

L2 punishes large errors heavily.
L1 is robust to outliers.
Huber balances both.

Would you like me to also prepare a numerical table of losses for your dataset (errors = [-1, 0, 2, -7, 2]) under L1, L2, and Huber, so you can compare them side by side without plotting?

yes

Here’s a numerical comparison table of the losses for your dataset under L1, L2, and Huber loss. This way you can see side by side how each error contributes.

Dataset

True targets: $y = [0, 1, 3, - 8, 1]$
Predictions: $\hat{y} = [1, 1, 1, - 1, - 1]$
Errors: $[- 1, 0, 2, - 7, 2]$

Loss Values

| Error | L1 Loss ( $∣ e ∣$ ) | L2 Loss ( $e^{2}$ ) | Huber Loss ( $δ = 1.5$ ) |
|-------|------------------|-------------------|-----------------------------|
| -1 | 1 | 1 | 0.5 |
| 0 | 0 | 0 | 0 |
| 2 | 2 | 4 | 1.875 |
| -7 | 7 | 49 | 9.375 |
| 2 | 2 | 4 | 1.875 |
| Sum | 12 | 58 | 13.625 |
| Average (Empirical Risk) | 2.4 | 11.6 | 2.725 |

🔎 Interpretation

L1 (MAE): 2.4 → robust, treats all errors proportionally.
L2 (MSE): 11.6 → dominated by the large error (-7).
Huber (δ=1.5): 2.725 → balances small errors (quadratic) and large errors (linear).

✨ This table makes it clear:

L2 explodes with outliers.
L1 stays steady.
Huber sits in between, offering robustness with smooth optimization.
