Sunday, February 15, 2026

 




Here is the analytical optimization proof for linear regression shown in your image, in both sum notation and matrix notation:

📘 Analytical Optimization – Proof

We aim to minimize the empirical risk function:

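$$\mathcal{R}_{\text{emp}}(\theta) = \sum_{i=1}^{n}\left(y^{(i)} - \theta^\top x^{(i)}\right)^2 = \left\lVert y - X\theta \right\rVert_2^2$$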

Where:

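  • $\theta \in \mathbb{R}^{\tilde{p}}$, with $\tilde{p} := p + 1$ (the extra component is the bias term, so each $x^{(i)}$ carries a constant 1)

  • $x^{(i)}$ is the feature vector for sample $i$

  • $y^{(i)}$ is the target value for sample $i$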

🔍 Sum Notation Derivation

Set the gradient of the risk with respect to $\theta$ to zero (the first-order condition for a minimum):

$$0 = \frac{\partial \mathcal{R}_{\text{emp}}(\theta)}{\partial \theta}$$

Apply the chain rule:

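$$0 = \frac{\partial}{\partial \theta}\sum_{i=1}^{n}\varepsilon_i^2 = \sum_{i=1}^{n}\frac{\partial \varepsilon_i^2}{\partial \varepsilon_i}\,\frac{\partial \varepsilon_i}{\partial \theta}$$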

where $\varepsilon_i = y^{(i)} - \theta^\top x^{(i)}$ is the residual of sample $i$.

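$$0 = \sum_{i=1}^{n} 2\varepsilon_i\,(-1)\left(x^{(i)}\right)^\top \quad\Longleftrightarrow\quad 0 = \sum_{i=1}^{n}\left(y^{(i)} - \theta^\top x^{(i)}\right)\left(x^{(i)}\right)^\top$$

(the second form divides both sides by $-2$, which does not change the zero).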

Expanding the product:

$$0 = \sum_{i=1}^{n} y^{(i)}\left(x^{(i)}\right)^\top - \theta^\top \sum_{i=1}^{n} x^{(i)}\left(x^{(i)}\right)^\top$$

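Transpose both sides and solve for $\theta$, assuming $\sum_{i=1}^{n} x^{(i)}\left(x^{(i)}\right)^\top$ is invertible: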

$$\hat{\theta} = \left(\sum_{i=1}^{n} x^{(i)}\left(x^{(i)}\right)^\top\right)^{-1}\sum_{i=1}^{n} x^{(i)} y^{(i)}$$

🧮 Matrix Notation

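In matrix form, stack the samples as rows of the design matrix $X \in \mathbb{R}^{n \times \tilde{p}}$ (row $i$ is $(x^{(i)})^\top$) and the targets into $y \in \mathbb{R}^{n}$: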

$$\mathcal{R}_{\text{emp}}(\theta) = \lVert y - X\theta \rVert_2^2 = (y - X\theta)^\top (y - X\theta)$$

Gradient:

$$\frac{\partial \mathcal{R}_{\text{emp}}(\theta)}{\partial \theta} = -2X^\top (y - X\theta)$$

Set it to zero to obtain the normal equations:

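$$X^\top y = X^\top X\theta \quad\Longrightarrow\quad \hat{\theta} = \left(X^\top X\right)^{-1} X^\top y$$

(This requires $X^\top X$ to be invertible, i.e. $X$ must have full column rank.)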

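✅ This is the closed-form solution for Ordinary Least Squares (OLS) regression. Because $\mathcal{R}_{\text{emp}}$ is convex (its Hessian $2X^\top X$ is positive semidefinite), this stationary point is a global minimizer of the sum of squared errors between predictions and actual values.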
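A quick numerical check that the sum form and the matrix form give the same estimate. This is a minimal sketch on simulated data; the sizes, the noise level, and `theta_true` are hypothetical. It calls `np.linalg.solve` instead of forming the inverse explicitly, which is the usual numerically safer choice, and compares against NumPy's `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3

# Design matrix: a column of ones for the bias, then p features, so p_tilde = p + 1
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
theta_true = np.array([1.0, 2.0, -3.0, 0.5])  # hypothetical ground truth
y = X @ theta_true + 0.1 * rng.normal(size=n)

# Sum notation: theta = (sum_i x_i x_i^T)^{-1} sum_i x_i y_i
A = sum(np.outer(x_i, x_i) for x_i in X)
b = sum(x_i * y_i for x_i, y_i in zip(X, y))
theta_sum = np.linalg.solve(A, b)

# Matrix notation: solve the normal equations X^T X theta = X^T y
theta_matrix = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least-squares routine
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.round(theta_sum, 3))
print(np.allclose(theta_sum, theta_matrix), np.allclose(theta_matrix, theta_lstsq))
```

Both forms build the same $\tilde{p} \times \tilde{p}$ system, since $X^\top X = \sum_i x^{(i)}(x^{(i)})^\top$ and $X^\top y = \sum_i x^{(i)} y^{(i)}$.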

Would you like me to also show how this formula changes when we introduce regularization (Ridge or Lasso)?

yes
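For reference, Ridge adds $\alpha\lVert\theta\rVert_2^2$ to the empirical risk, so the normal equations become $(X^\top X + \alpha I)\,\theta = X^\top y$ and $\hat{\theta}_{\text{ridge}} = (X^\top X + \alpha I)^{-1} X^\top y$, which exists for any $\alpha > 0$. Lasso's $\ell_1$ penalty is not differentiable at zero, so there is no general closed form; scikit-learn's `Lasso` solves it iteratively with coordinate descent. The sketch below checks the Ridge formula against scikit-learn's `Ridge` on hypothetical simulated data. It uses `fit_intercept=False` so that every coefficient is penalized exactly as in the formula (in practice the bias is usually left unpenalized).

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)
alpha = 2.0  # hypothetical penalty strength

# Closed form: solve (X^T X + alpha I) theta = X^T y
theta_closed = np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

# scikit-learn's Ridge minimizes ||y - X w||^2 + alpha * ||w||^2
theta_sklearn = Ridge(alpha=alpha, fit_intercept=False).fit(X, y).coef_

print(np.allclose(theta_closed, theta_sklearn))
```

Compared with OLS, the only change is the $\alpha I$ term on the diagonal, which shrinks the coefficients toward zero and keeps the system invertible even when $X^\top X$ is singular.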

Great, let's visualize mean CV error curves for Ridge, Lasso, and Elastic Net with their optimal α values marked. This way you'll see not only how error changes with regularization strength, but also exactly where each method's best solution lies.

Python Snippet: CV Error Curves with Best α Markers

python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge, LassoCV, ElasticNetCV
from sklearn.model_selection import validation_curve
from sklearn.datasets import make_regression

# Synthetic dataset
X, y = make_regression(n_samples=200, n_features=20, noise=0.1, random_state=42)
alphas = np.logspace(-2, 2, 50)

# Ridge: 5-fold CV error for every alpha.
# RidgeCV only stores per-alpha errors for leave-one-out CV (cv=None); combining
# store_cv_values=True with cv=5 raises a ValueError, so use validation_curve instead.
# cv=5 gives the same unshuffled KFold splits that LassoCV/ElasticNetCV use below.
_, ridge_test_scores = validation_curve(
    Ridge(), X, y,
    param_name="alpha", param_range=alphas,
    cv=5, scoring="neg_mean_squared_error"
)
ridge_mse = -ridge_test_scores.mean(axis=1)  # shape (n_alphas,)
best_ridge_alpha = alphas[np.argmin(ridge_mse)]

# Lasso CV
lasso_cv = LassoCV(alphas=alphas, cv=5, random_state=42).fit(X, y)

# Elastic Net CV
elastic_cv = ElasticNetCV(
    l1_ratio=[0.1, 0.5, 0.7, 0.9, 1.0],
    alphas=alphas,
    cv=5,
    random_state=42
).fit(X, y)

# Plot CV error curves
plt.figure(figsize=(10,6))

# Ridge mean CV error
plt.plot(alphas, ridge_mse, label="Ridge", color="blue")
plt.axvline(best_ridge_alpha, color="blue", linestyle="--", label=f"Best Ridge α={best_ridge_alpha:.3f}")

# Lasso mean CV error (mse_path_ has shape (n_alphas, n_folds))
mean_lasso_error = lasso_cv.mse_path_.mean(axis=1)
plt.plot(lasso_cv.alphas_, mean_lasso_error, label="Lasso", color="red")
plt.axvline(lasso_cv.alpha_, color="red", linestyle="--", label=f"Best Lasso α={lasso_cv.alpha_:.3f}")

# Elastic Net mean CV error for the best l1_ratio
# (mse_path_ has shape (n_l1_ratios, n_alphas, n_folds); alphas_ has shape (n_l1_ratios, n_alphas))
best_idx = list(elastic_cv.l1_ratio).index(elastic_cv.l1_ratio_)
mean_elastic_error = elastic_cv.mse_path_[best_idx].mean(axis=1)
plt.plot(elastic_cv.alphas_[best_idx], mean_elastic_error, label=f"Elastic Net (l1_ratio={elastic_cv.l1_ratio_})", color="green")
plt.axvline(elastic_cv.alpha_, color="green", linestyle="--", label=f"Best Elastic Net α={elastic_cv.alpha_:.3f}")

plt.xscale("log")
plt.xlabel("Alpha (penalty strength)")
plt.ylabel("Mean CV MSE")
plt.title("CV Error Curves with Optimal α for Ridge, Lasso, Elastic Net")
plt.legend()
plt.grid(True)
plt.show()

🔎 What You’ll See

  • Curves for Ridge, Lasso, and Elastic Net showing mean CV error across α values.

  • Dashed vertical lines marking the best α chosen by CV for each method.

  • This lets you visually confirm where each model achieves its lowest error.

Takeaway:

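  • Ridge's curve is typically smooth, because the L2 penalty shrinks all coefficients gradually instead of dropping them.

  • Lasso's error rises once α is strong enough to zero out informative features.

  • Elastic Net's curve behaves like a blend of the two; with l1_ratio close to 1 it is nearly identical to Lasso.

  • The numeric α values are not directly comparable between Ridge and the other two methods: scikit-learn's Lasso and Elastic Net divide the squared-error term by 2n, while Ridge does not.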

Would you like me to also show you how to compare the selected coefficients at those optimal α values side by side, so you can see how each method’s interpretability differs at its best performance point?

Perfect — let’s now compare the selected coefficients at the optimal α values for Ridge, Lasso, and Elastic Net side by side. This way you’ll see not only which method performs best, but also how each interprets feature importance differently at its best setting.

Python Snippet: Coefficient Comparison at Best α

python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.datasets import make_regression

# Synthetic dataset
X, y = make_regression(n_samples=200, n_features=20, noise=0.1, random_state=42)

# Fit models with CV
ridge_cv = RidgeCV(alphas=np.logspace(-2, 2, 50), cv=5).fit(X, y)
lasso_cv = LassoCV(alphas=np.logspace(-2, 2, 50), cv=5, random_state=42).fit(X, y)
elastic_cv = ElasticNetCV(
    l1_ratio=[0.1, 0.5, 0.7, 0.9, 1.0],
    alphas=np.logspace(-2, 2, 50),
    cv=5,
    random_state=42
).fit(X, y)

# Collect coefficients
coef_dict = {
    "Ridge": ridge_cv.coef_,
    "Lasso": lasso_cv.coef_,
    "Elastic Net": elastic_cv.coef_
}

# Plot coefficients side by side
plt.figure(figsize=(12,6))
x = np.arange(X.shape[1])  # feature indices
width = 0.25

plt.bar(x - width, coef_dict["Ridge"], width, label=f"Ridge (α={ridge_cv.alpha_:.3f})", color="blue")
plt.bar(x, coef_dict["Lasso"], width, label=f"Lasso (α={lasso_cv.alpha_:.3f})", color="red")
plt.bar(x + width, coef_dict["Elastic Net"], width, label=f"Elastic Net (α={elastic_cv.alpha_:.3f}, l1_ratio={elastic_cv.l1_ratio_})", color="green")

plt.xlabel("Feature Index")
plt.ylabel("Coefficient Value")
plt.title("Coefficient Comparison at Optimal α")
plt.legend()
plt.grid(True)
plt.show()

🔎 What You’ll See

  • Ridge (blue): All coefficients retained, but shrunk smoothly.

  • Lasso (red): Many coefficients exactly zero → feature selection.

  • Elastic Net (green): Some coefficients zeroed, others shrunk — balancing sparsity and correlation handling.

Takeaway: This plot shows how each method interprets feature importance differently at its optimal α:

  • Ridge stabilizes all features.

  • Lasso enforces sparsity.

  • Elastic Net balances both, often yielding a more interpretable yet robust model.
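Because the data are synthetic, you can also check the selections against the truth: `make_regression(..., coef=True)` additionally returns the ground-truth coefficients (with the default `n_informative=10`, 10 of the 20 are nonzero). A minimal sketch that counts, per method, how many selected features are truly informative (true positives) versus uninformative (false positives):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV

X, y, true_coef = make_regression(n_samples=200, n_features=20, noise=0.1,
                                  coef=True, random_state=42)
alphas = np.logspace(-2, 2, 50)
true_support = true_coef != 0

models = {
    "Ridge": RidgeCV(alphas=alphas, cv=5),
    "Lasso": LassoCV(alphas=alphas, cv=5, random_state=42),
    "Elastic Net": ElasticNetCV(l1_ratio=[0.1, 0.5, 0.7, 0.9, 1.0], alphas=alphas,
                                cv=5, random_state=42),
}

results = {}
for name, model in models.items():
    selected = model.fit(X, y).coef_ != 0   # "selected" = exactly nonzero
    results[name] = {
        "n_selected": int(selected.sum()),
        "true_pos": int((selected & true_support).sum()),
        "false_pos": int((selected & ~true_support).sum()),
    }
    print(name, results[name])
```

Ridge will report all 20 features as selected, since it shrinks coefficients without setting them exactly to zero.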

Would you like me to also extend this into a heatmap comparison of coefficients across methods, so you can visually spot which features are consistently important and which ones get dropped?

yes

Here’s how you can build a heatmap comparison of coefficients across Ridge, Lasso, and Elastic Net at their optimal α values. This visualization makes it easy to spot which features are consistently important and which ones get dropped.

Python Snippet: Coefficient Heatmap Comparison

python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.datasets import make_regression

# Synthetic dataset
X, y = make_regression(n_samples=200, n_features=20, noise=0.1, random_state=42)

# Fit models with CV
ridge_cv = RidgeCV(alphas=np.logspace(-2, 2, 50), cv=5).fit(X, y)
lasso_cv = LassoCV(alphas=np.logspace(-2, 2, 50), cv=5, random_state=42).fit(X, y)
elastic_cv = ElasticNetCV(
    l1_ratio=[0.1, 0.5, 0.7, 0.9, 1.0],
    alphas=np.logspace(-2, 2, 50),
    cv=5,
    random_state=42
).fit(X, y)

# Collect coefficients
coef_matrix = np.vstack([
    ridge_cv.coef_,
    lasso_cv.coef_,
    elastic_cv.coef_
])

methods = ["Ridge", "Lasso", "Elastic Net"]

# Plot heatmap
plt.figure(figsize=(12,6))
sns.heatmap(coef_matrix, annot=False, cmap="coolwarm", center=0,
            xticklabels=[f"Feature {i}" for i in range(X.shape[1])],
            yticklabels=methods)

plt.title("Coefficient Heatmap Comparison at Optimal α")
plt.xlabel("Features")
plt.ylabel("Methods")
plt.show()

# Print best parameters
print("Best Ridge α:", ridge_cv.alpha_)
print("Best Lasso α:", lasso_cv.alpha_)
print("Best Elastic Net α:", elastic_cv.alpha_, "with l1_ratio:", elastic_cv.l1_ratio_)

🔎 What You’ll See

  • A heatmap with rows = methods (Ridge, Lasso, Elastic Net) and columns = features.

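  • Color shows each coefficient's sign and magnitude (red = positive, blue = negative, pale gray = near zero).

  • The Lasso row will show pale cells where coefficients are exactly zero → feature selection.

  • The Ridge row keeps every coefficient nonzero, but uninformative features get coefficients so small that they also look pale; color alone cannot distinguish "exactly zero" from "tiny".

  • The Elastic Net row will be a mix: some coefficients zeroed, others shrunk.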

Takeaway: This visualization highlights:

  • Which features are consistently important across methods.

  • Which ones Lasso drops entirely.

  • How Elastic Net balances sparsity and correlation handling.
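Since the colormap cannot separate exact zeros from tiny values, it helps to count them directly. In this sketch the "near zero" cut-off (1% of the largest absolute coefficient in the matrix) is a hypothetical choice, meant to approximate what renders as a pale cell.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV

X, y = make_regression(n_samples=200, n_features=20, noise=0.1, random_state=42)
alphas = np.logspace(-2, 2, 50)

ridge_cv = RidgeCV(alphas=alphas, cv=5).fit(X, y)
lasso_cv = LassoCV(alphas=alphas, cv=5, random_state=42).fit(X, y)
elastic_cv = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.7, 0.9, 1.0], alphas=alphas,
                          cv=5, random_state=42).fit(X, y)

coef_matrix = np.vstack([ridge_cv.coef_, lasso_cv.coef_, elastic_cv.coef_])
methods = ["Ridge", "Lasso", "Elastic Net"]

# Hypothetical cut-off for "looks pale": below 1% of the largest |coefficient|
tol = 0.01 * np.abs(coef_matrix).max()

exact_zero = (coef_matrix == 0).sum(axis=1)
near_zero = (np.abs(coef_matrix) < tol).sum(axis=1)
for method, n_exact, n_near in zip(methods, exact_zero, near_zero):
    print(f"{method:12s} exact zeros: {n_exact:2d}   near zero: {n_near:2d}")
```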

Would you like me to also show you how to rank features by importance across methods (e.g., averaging absolute coefficients), so you can identify the most influential predictors overall?

yes

Here’s how you can rank features by importance across Ridge, Lasso, and Elastic Net at their optimal α values. This approach averages the absolute coefficients across methods, highlighting which predictors consistently matter most.

Python Snippet: Feature Importance Ranking

python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.datasets import make_regression

# Synthetic dataset
X, y = make_regression(n_samples=200, n_features=20, noise=0.1, random_state=42)

# Fit models with CV
ridge_cv = RidgeCV(alphas=np.logspace(-2, 2, 50), cv=5).fit(X, y)
lasso_cv = LassoCV(alphas=np.logspace(-2, 2, 50), cv=5, random_state=42).fit(X, y)
elastic_cv = ElasticNetCV(
    l1_ratio=[0.1, 0.5, 0.7, 0.9, 1.0],
    alphas=np.logspace(-2, 2, 50),
    cv=5,
    random_state=42
).fit(X, y)

# Collect coefficients
coef_df = pd.DataFrame({
    "Ridge": ridge_cv.coef_,
    "Lasso": lasso_cv.coef_,
    "Elastic Net": elastic_cv.coef_
}, index=[f"Feature {i}" for i in range(X.shape[1])])

# Compute average absolute importance
coef_df["Average Importance"] = coef_df.abs().mean(axis=1)

# Sort features by importance
ranked_features = coef_df.sort_values("Average Importance", ascending=False)

# Plot top features
plt.figure(figsize=(10,6))
ranked_features["Average Importance"].head(10).plot(kind="bar", color="purple")
plt.ylabel("Average Absolute Coefficient")
plt.title("Top 10 Features by Importance Across Methods")
plt.grid(axis="y")
plt.show()

# Print ranked table
print(ranked_features.head(10))

🔎 What You’ll See

  • A bar chart of the top 10 most important features across Ridge, Lasso, and Elastic Net.

  • A ranked table showing each feature’s coefficients per method and its average absolute importance.

  • Features with consistently large coefficients across methods rise to the top.

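Takeaway: This ranking highlights robust predictors that matter regardless of the regularization method. Averaging raw coefficients is only meaningful when features share a comparable scale, as they do here (make_regression draws features from a standard normal distribution); standardize real-world features first.

  • Ridge ensures stability.

  • Lasso enforces sparsity.

  • Elastic Net balances both.

By averaging across them, you identify the features that are most influential overall.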
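Averaging raw magnitudes lets whichever method shrinks least dominate the average. A swapped-in alternative (not the approach above) is to rank features by |coefficient| within each method and average the ranks, so each method contributes equally regardless of its overall shrinkage. A minimal sketch:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV

X, y = make_regression(n_samples=200, n_features=20, noise=0.1, random_state=42)
alphas = np.logspace(-2, 2, 50)

coef_df = pd.DataFrame({
    "Ridge": RidgeCV(alphas=alphas, cv=5).fit(X, y).coef_,
    "Lasso": LassoCV(alphas=alphas, cv=5, random_state=42).fit(X, y).coef_,
    "Elastic Net": ElasticNetCV(l1_ratio=[0.1, 0.5, 0.7, 0.9, 1.0], alphas=alphas,
                                cv=5, random_state=42).fit(X, y).coef_,
}, index=[f"Feature {i}" for i in range(X.shape[1])])

# Rank within each method: 1 = largest |coefficient|; ties share the average rank
rank_df = coef_df.abs().rank(ascending=False, method="average")
rank_df["Mean Rank"] = rank_df.mean(axis=1)

# Lower mean rank = more consistently important across methods
print(rank_df.sort_values("Mean Rank").head(10))
```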

Would you like me to also show you how to visualize this ranking as a clustered heatmap, so you can see not just importance but also how methods group features differently?

yes

Here’s how you can visualize feature importance rankings as a clustered heatmap, so you see not only which features are most influential overall but also how Ridge, Lasso, and Elastic Net group them differently.

Python Snippet: Clustered Heatmap of Coefficients

python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.datasets import make_regression

# Synthetic dataset
X, y = make_regression(n_samples=200, n_features=20, noise=0.1, random_state=42)

# Fit models with CV
ridge_cv = RidgeCV(alphas=np.logspace(-2, 2, 50), cv=5).fit(X, y)
lasso_cv = LassoCV(alphas=np.logspace(-2, 2, 50), cv=5, random_state=42).fit(X, y)
elastic_cv = ElasticNetCV(
    l1_ratio=[0.1, 0.5, 0.7, 0.9, 1.0],
    alphas=np.logspace(-2, 2, 50),
    cv=5,
    random_state=42
).fit(X, y)

# Collect coefficients into a DataFrame
coef_df = pd.DataFrame({
    "Ridge": ridge_cv.coef_,
    "Lasso": lasso_cv.coef_,
    "Elastic Net": elastic_cv.coef_
}, index=[f"Feature {i}" for i in range(X.shape[1])])

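# Clustered heatmap (clustermap creates its own figure, so no plt.figure() call is needed)
g = sns.clustermap(coef_df, cmap="coolwarm", center=0,
                   xticklabels=True, yticklabels=True,
                   figsize=(12,8))

# Put the title on the whole figure; plt.title would land on one of the grid's sub-axes
g.figure.suptitle("Clustered Heatmap of Coefficients Across Methods", y=1.02)
plt.show()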

# Print top features by average importance
coef_df["Average Importance"] = coef_df.abs().mean(axis=1)
print(coef_df.sort_values("Average Importance", ascending=False).head(10))

🔎 What You’ll See

  • A clustered heatmap where rows = features and columns = methods (Ridge, Lasso, Elastic Net).

  • Features with similar coefficient patterns across methods will cluster together.

  • Strongly influential features appear with bold colors (red = positive, blue = negative).

  • Lasso's sparsity shows up as many near‑zero (pale) cells.

  • Ridge and Elastic Net show smoother shrinkage patterns.

Takeaway: This visualization helps you spot:

  • Consistently important features across all methods.

  • Method‑specific differences (e.g., Lasso dropping features that Ridge keeps).

  • Clusters of features that behave similarly, which can guide interpretation and feature engineering.
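To read the feature order off programmatically rather than from the plot, you can rebuild the row clustering with SciPy. This sketch assumes seaborn's defaults for `clustermap` (average linkage, Euclidean distance on the raw coefficients); under that assumption the leaf order should match the heatmap's row order, up to ties.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV

X, y = make_regression(n_samples=200, n_features=20, noise=0.1, random_state=42)
alphas = np.logspace(-2, 2, 50)

coef_df = pd.DataFrame({
    "Ridge": RidgeCV(alphas=alphas, cv=5).fit(X, y).coef_,
    "Lasso": LassoCV(alphas=alphas, cv=5, random_state=42).fit(X, y).coef_,
    "Elastic Net": ElasticNetCV(l1_ratio=[0.1, 0.5, 0.7, 0.9, 1.0], alphas=alphas,
                                cv=5, random_state=42).fit(X, y).coef_,
}, index=[f"Feature {i}" for i in range(X.shape[1])])

# Assumed to mirror sns.clustermap's defaults: average linkage, Euclidean distance
Z = linkage(coef_df.values, method="average", metric="euclidean")

# Leaf order of the row dendrogram, computed without drawing it
leaf_order = dendrogram(Z, no_plot=True)["leaves"]
print(coef_df.index[leaf_order].tolist())
```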

Would you like me to also show you how to combine this with a dendrogram of feature similarity, so you can see hierarchical groupings of features based on their coefficient patterns across methods?

yes

Here's how you can make the hierarchical clustering explicit. `sns.clustermap` already draws row and column dendrograms by default (average linkage, Euclidean distance); this version switches the linkage to Ward's method, so you see hierarchical groupings of features based on their coefficient patterns across Ridge, Lasso, and Elastic Net. This combines ranking with clustering to reveal deeper structure.

Python Snippet: Clustered Heatmap with Dendrogram

python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.datasets import make_regression

# Synthetic dataset
X, y = make_regression(n_samples=200, n_features=20, noise=0.1, random_state=42)

# Fit models with CV
ridge_cv = RidgeCV(alphas=np.logspace(-2, 2, 50), cv=5).fit(X, y)
lasso_cv = LassoCV(alphas=np.logspace(-2, 2, 50), cv=5, random_state=42).fit(X, y)
elastic_cv = ElasticNetCV(
    l1_ratio=[0.1, 0.5, 0.7, 0.9, 1.0],
    alphas=np.logspace(-2, 2, 50),
    cv=5,
    random_state=42
).fit(X, y)

# Collect coefficients into a DataFrame
coef_df = pd.DataFrame({
    "Ridge": ridge_cv.coef_,
    "Lasso": lasso_cv.coef_,
    "Elastic Net": elastic_cv.coef_
}, index=[f"Feature {i}" for i in range(X.shape[1])])

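# Clustered heatmap with dendrograms (Ward linkage on Euclidean distances)
g = sns.clustermap(coef_df, cmap="coolwarm", center=0,
                   xticklabels=True, yticklabels=True,
                   figsize=(12,8), method="ward", metric="euclidean")

# Put the title on the whole figure; plt.title would land on one of the grid's sub-axes
g.figure.suptitle("Clustered Heatmap + Dendrogram of Coefficients", y=1.02)
plt.show()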

# Print top features by average importance
coef_df["Average Importance"] = coef_df.abs().mean(axis=1)
print(coef_df.sort_values("Average Importance", ascending=False).head(10))

🔎 What You’ll See

  • A clustered heatmap with dendrograms:

    • Rows = features, grouped hierarchically by similarity in coefficient patterns across methods.

    • Columns = Ridge, Lasso, Elastic Net.

  • The dendrogram shows hierarchical groupings of features:

    • Features that behave similarly across methods cluster together.

    • Strong predictors stand out with bold colors (red = positive, blue = negative).

  • Lasso's sparsity appears as many near‑zero (pale) cells, while Ridge and Elastic Net show smoother shrinkage.

Takeaway: This visualization lets you:

  • Identify robust predictors that remain important across all methods.

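  • Spot clusters of features whose coefficients behave similarly across methods (this groups coefficient patterns, not correlations between the features themselves).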

  • See how different regularization strategies group features differently, guiding feature engineering and interpretation.
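To turn the Ward dendrogram into explicit groups, you can cut the tree with SciPy's `fcluster`. The limit of three clusters below is a hypothetical illustration, not something the data dictates.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV

X, y = make_regression(n_samples=200, n_features=20, noise=0.1, random_state=42)
alphas = np.logspace(-2, 2, 50)

coef_df = pd.DataFrame({
    "Ridge": RidgeCV(alphas=alphas, cv=5).fit(X, y).coef_,
    "Lasso": LassoCV(alphas=alphas, cv=5, random_state=42).fit(X, y).coef_,
    "Elastic Net": ElasticNetCV(l1_ratio=[0.1, 0.5, 0.7, 0.9, 1.0], alphas=alphas,
                                cv=5, random_state=42).fit(X, y).coef_,
}, index=[f"Feature {i}" for i in range(X.shape[1])])

# Same linkage as the clustermap above (Ward uses Euclidean distances)
Z = linkage(coef_df.values, method="ward")

# Cut the tree into at most 3 flat clusters (hypothetical choice of 3)
coef_df["Cluster"] = fcluster(Z, t=3, criterion="maxclust")
print(coef_df.sort_values("Cluster"))
```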

Would you like me to also show you how to quantify feature stability across methods (e.g., a stability score that measures how consistently a feature is selected/weighted), so you can rank features not just by importance but by reliability?

yes

Excellent — let’s now quantify feature stability across methods. Stability here means: how consistently a feature is selected or weighted across Ridge, Lasso, and Elastic Net. This helps you identify not just important predictors, but reliable ones that remain influential regardless of the regularization strategy.

Python Snippet: Feature Stability Score

python
import numpy as np
import pandas as pd
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.datasets import make_regression

# Synthetic dataset
X, y = make_regression(n_samples=200, n_features=20, noise=0.1, random_state=42)

# Fit models with CV
ridge_cv = RidgeCV(alphas=np.logspace(-2, 2, 50), cv=5).fit(X, y)
lasso_cv = LassoCV(alphas=np.logspace(-2, 2, 50), cv=5, random_state=42).fit(X, y)
elastic_cv = ElasticNetCV(
    l1_ratio=[0.1, 0.5, 0.7, 0.9, 1.0],
    alphas=np.logspace(-2, 2, 50),
    cv=5,
    random_state=42
).fit(X, y)

# Collect coefficients
coef_df = pd.DataFrame({
    "Ridge": ridge_cv.coef_,
    "Lasso": lasso_cv.coef_,
    "Elastic Net": elastic_cv.coef_
}, index=[f"Feature {i}" for i in range(X.shape[1])])

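# Stability score: proportion of methods where feature is non-zero
# (Ridge practically never yields exact zeros, so every feature gets at least 1/3)
coef_df["Stability Score"] = (coef_df != 0).sum(axis=1) / 3.0

# Combine with average importance (coefficient columns only, not the stability score)
coef_df["Average Importance"] = coef_df[["Ridge", "Lasso", "Elastic Net"]].abs().mean(axis=1)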

# Rank features by stability, then importance
ranked_features = coef_df.sort_values(["Stability Score", "Average Importance"], ascending=[False, False])

# Display top features
print(ranked_features.head(10))

🔎 What You’ll See

  • A table with each feature’s coefficients across Ridge, Lasso, Elastic Net.

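  • Stability Score = fraction of methods where the feature's coefficient ≠ 0. Because Ridge practically never sets a coefficient exactly to zero, every feature gets at least 1/3 from Ridge, so the score effectively measures whether Lasso and Elastic Net keep the feature.

    • Score = 1.0 → kept by all three methods.

    • Score = 0.33 → kept only by Ridge (dropped by both Lasso and Elastic Net).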

  • Average Importance = mean absolute coefficient magnitude across methods.

  • Sorted list of top features by stability and importance.

Takeaway:

  • Features with high stability and high importance are your most reliable predictors.

  • Features with low stability may be sensitive to regularization choice, suggesting caution in interpretation.

  • This ranking helps you focus on predictors that are both strong and consistent.
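Because Ridge counts every feature as selected, one option is to replace the "≠ 0" test with a magnitude threshold. In this sketch, the rule (a feature is "selected" if its |coefficient| exceeds 1% of the largest |coefficient| for that method) is a hypothetical choice; the average importance is computed from the coefficient columns only.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV

X, y = make_regression(n_samples=200, n_features=20, noise=0.1, random_state=42)
alphas = np.logspace(-2, 2, 50)

coef_df = pd.DataFrame({
    "Ridge": RidgeCV(alphas=alphas, cv=5).fit(X, y).coef_,
    "Lasso": LassoCV(alphas=alphas, cv=5, random_state=42).fit(X, y).coef_,
    "Elastic Net": ElasticNetCV(l1_ratio=[0.1, 0.5, 0.7, 0.9, 1.0], alphas=alphas,
                                cv=5, random_state=42).fit(X, y).coef_,
}, index=[f"Feature {i}" for i in range(X.shape[1])])

abs_coef = coef_df[["Ridge", "Lasso", "Elastic Net"]].abs()

# Hypothetical rule: selected = |coef| above 1% of that method's largest |coef|
selected = abs_coef > 0.01 * abs_coef.max()
coef_df["Thresholded Stability"] = selected.sum(axis=1) / 3.0
coef_df["Average Importance"] = abs_coef.mean(axis=1)

print(coef_df.sort_values(["Thresholded Stability", "Average Importance"],
                          ascending=False).head(10))
```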

Would you like me to also show you how to visualize stability vs. importance in a scatter plot, so you can instantly spot the most robust features (top-right quadrant)?

yes

Here’s how you can visualize stability vs. importance in a scatter plot, so you can instantly spot the most robust features (those in the top‑right quadrant: high stability and high importance).

Python Snippet: Stability vs. Importance Scatter Plot

python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.datasets import make_regression

# Synthetic dataset
X, y = make_regression(n_samples=200, n_features=20, noise=0.1, random_state=42)

# Fit models with CV
ridge_cv = RidgeCV(alphas=np.logspace(-2, 2, 50), cv=5).fit(X, y)
lasso_cv = LassoCV(alphas=np.logspace(-2, 2, 50), cv=5, random_state=42).fit(X, y)
elastic_cv = ElasticNetCV(
    l1_ratio=[0.1, 0.5, 0.7, 0.9, 1.0],
    alphas=np.logspace(-2, 2, 50),
    cv=5,
    random_state=42
).fit(X, y)

# Collect coefficients
coef_df = pd.DataFrame({
    "Ridge": ridge_cv.coef_,
    "Lasso": lasso_cv.coef_,
    "Elastic Net": elastic_cv.coef_
}, index=[f"Feature {i}" for i in range(X.shape[1])])

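# Stability score: proportion of methods where feature is non-zero
# (Ridge practically never yields exact zeros, so every feature gets at least 1/3)
coef_df["Stability Score"] = (coef_df != 0).sum(axis=1) / 3.0

# Average importance (coefficient columns only, not the stability score)
coef_df["Average Importance"] = coef_df[["Ridge", "Lasso", "Elastic Net"]].abs().mean(axis=1)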

# Scatter plot
plt.figure(figsize=(10,6))
plt.scatter(coef_df["Stability Score"], coef_df["Average Importance"], color="purple")

# Annotate top features
for feature, row in coef_df.sort_values("Average Importance", ascending=False).head(5).iterrows():
    plt.text(row["Stability Score"]+0.02, row["Average Importance"], feature, fontsize=9)

plt.xlabel("Stability Score (0–1)")
plt.ylabel("Average Absolute Coefficient")
plt.title("Feature Stability vs. Importance")
plt.grid(True)
plt.show()

🔎 What You’ll See

  • X‑axis: Stability score (fraction of methods that keep the feature; in practice the minimum is 1/3, because Ridge keeps every feature).

  • Y‑axis: Average absolute coefficient (importance).

  • Points: Each feature plotted by stability vs. importance.

  • Top‑right quadrant: Features that are both stable and important → most reliable predictors.

  • Annotations: Labels for the top 5 most important features.

Takeaway: This scatter plot makes it easy to distinguish:

  • Robust features (high stability + high importance).

  • Fragile features (low stability, even if sometimes important).

  • Noise features (low importance and low stability).
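To label the quadrants programmatically you need explicit cut-offs. The ones in this sketch are hypothetical: "stable" means kept by all three methods, and "important" means above the median average importance. Features that are stable but below the median fall outside the three categories above and get their own label.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV

X, y = make_regression(n_samples=200, n_features=20, noise=0.1, random_state=42)
alphas = np.logspace(-2, 2, 50)
coef_cols = ["Ridge", "Lasso", "Elastic Net"]

coef_df = pd.DataFrame({
    "Ridge": RidgeCV(alphas=alphas, cv=5).fit(X, y).coef_,
    "Lasso": LassoCV(alphas=alphas, cv=5, random_state=42).fit(X, y).coef_,
    "Elastic Net": ElasticNetCV(l1_ratio=[0.1, 0.5, 0.7, 0.9, 1.0], alphas=alphas,
                                cv=5, random_state=42).fit(X, y).coef_,
}, index=[f"Feature {i}" for i in range(X.shape[1])])

coef_df["Stability Score"] = (coef_df[coef_cols] != 0).sum(axis=1) / 3.0
coef_df["Average Importance"] = coef_df[coef_cols].abs().mean(axis=1)

# Hypothetical cut-offs
stable = (coef_df["Stability Score"] == 1.0).to_numpy()
important = (coef_df["Average Importance"]
             > coef_df["Average Importance"].median()).to_numpy()

coef_df["Category"] = np.select(
    [stable & important, ~stable & important, ~stable & ~important],
    ["Robust", "Fragile", "Noise"],
    default="Stable but weak",
)
print(coef_df["Category"].value_counts())
```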

Would you like me to also show you how to track stability across multiple random resamples of the dataset (like bootstrapping), so you can measure feature reliability beyond just one training run?

yes

Perfect — let’s take stability analysis one step further by tracking feature stability across multiple random resamples of the dataset (bootstrapping). This way you’ll see not just how stable features are across methods, but also how reliable they remain under different training samples.

Python Snippet: Bootstrapped Stability Analysis

python
import numpy as np
import pandas as pd
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.datasets import make_regression

# Synthetic dataset
X, y = make_regression(n_samples=200, n_features=20, noise=0.1, random_state=42)

# Parameters
n_bootstraps = 50
alphas = np.logspace(-2, 2, 50)

# Storage
ridge_selected = np.zeros((n_bootstraps, X.shape[1]))
lasso_selected = np.zeros((n_bootstraps, X.shape[1]))
elastic_selected = np.zeros((n_bootstraps, X.shape[1]))

# Bootstrapping loop
rng = np.random.default_rng(42)
for b in range(n_bootstraps):
    # Resample indices
    sample_idx = rng.choice(np.arange(X.shape[0]), size=X.shape[0], replace=True)
    X_res, y_res = X[sample_idx], y[sample_idx]
    
    # Fit models
    ridge_cv = RidgeCV(alphas=alphas, cv=5).fit(X_res, y_res)
    lasso_cv = LassoCV(alphas=alphas, cv=5, random_state=42).fit(X_res, y_res)
    elastic_cv = ElasticNetCV(l1_ratio=[0.1,0.5,0.7,0.9,1.0], alphas=alphas, cv=5, random_state=42).fit(X_res, y_res)
    
    # Record non-zero coefficients (selection indicator).
    # Ridge practically never produces exact zeros, so its rows will be all ones.
    ridge_selected[b] = (ridge_cv.coef_ != 0).astype(int)
    lasso_selected[b] = (lasso_cv.coef_ != 0).astype(int)
    elastic_selected[b] = (elastic_cv.coef_ != 0).astype(int)

# Compute stability scores across bootstraps
ridge_stability = ridge_selected.mean(axis=0)
lasso_stability = lasso_selected.mean(axis=0)
elastic_stability = elastic_selected.mean(axis=0)

# Combine into DataFrame
stability_df = pd.DataFrame({
    "Ridge Stability": ridge_stability,
    "Lasso Stability": lasso_stability,
    "Elastic Net Stability": elastic_stability,
    "Average Stability": (ridge_stability + lasso_stability + elastic_stability)/3
}, index=[f"Feature {i}" for i in range(X.shape[1])])

# Rank features by average stability
ranked_stability = stability_df.sort_values("Average Stability", ascending=False)

# Display top features
print(ranked_stability.head(10))

🔎 What You’ll See

  • Stability scores per feature across Ridge, Lasso, and Elastic Net, averaged over 50 bootstrap resamples.

  • Features with high average stability are consistently selected across methods and resamples.

  • Features with low stability are sensitive to sampling noise or regularization choice.

  • A ranked table of the top 10 most stable features.

Takeaway: Bootstrapping adds robustness: instead of relying on one dataset split, you measure how reliably features are selected across many resamples. This highlights truly dependable predictors versus those that fluctuate.
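Re-running cross-validation inside every bootstrap is slow, and because bootstrap samples contain duplicated rows, the same observation can land in both a training and a validation fold. A faster variant, in the spirit of stability selection, fixes α once on the full data and refits Lasso on half-size subsamples drawn without replacement. This is a sketch under those assumptions (one α reused for every resample, 50 resamples of n/2 rows), not the procedure above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LassoCV

X, y = make_regression(n_samples=200, n_features=20, noise=0.1, random_state=42)

# Assumption: choose alpha once on the full data, then reuse it for every resample
alpha = LassoCV(alphas=np.logspace(-2, 2, 50), cv=5).fit(X, y).alpha_

n_resamples = 50
rng = np.random.default_rng(42)
selected = np.zeros((n_resamples, X.shape[1]))
for b in range(n_resamples):
    # Half-size subsample without replacement, so no row is duplicated
    idx = rng.choice(X.shape[0], size=X.shape[0] // 2, replace=False)
    coef = Lasso(alpha=alpha).fit(X[idx], y[idx]).coef_
    selected[b] = coef != 0

# Fraction of subsamples in which each feature was kept
selection_freq = selected.mean(axis=0)
print(np.round(selection_freq, 2))
```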

Would you like me to also show you how to plot these stability scores as a grouped bar chart (Ridge vs Lasso vs Elastic Net per feature), so you can visually compare method-specific stability side by side?

yes

Here’s how you can plot bootstrapped stability scores as a grouped bar chart, so you can visually compare Ridge, Lasso, and Elastic Net stability side by side for each feature.

Python Snippet: Grouped Bar Chart of Stability Scores

python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.datasets import make_regression

# Synthetic dataset
X, y = make_regression(n_samples=200, n_features=20, noise=0.1, random_state=42)

# Parameters
n_bootstraps = 50
alphas = np.logspace(-2, 2, 50)

ridge_selected = np.zeros((n_bootstraps, X.shape[1]))
lasso_selected = np.zeros((n_bootstraps, X.shape[1]))
elastic_selected = np.zeros((n_bootstraps, X.shape[1]))

rng = np.random.default_rng(42)
for b in range(n_bootstraps):
    sample_idx = rng.choice(np.arange(X.shape[0]), size=X.shape[0], replace=True)
    X_res, y_res = X[sample_idx], y[sample_idx]
    
    ridge_cv = RidgeCV(alphas=alphas, cv=5).fit(X_res, y_res)
    lasso_cv = LassoCV(alphas=alphas, cv=5, random_state=42).fit(X_res, y_res)
    elastic_cv = ElasticNetCV(l1_ratio=[0.1,0.5,0.7,0.9,1.0], alphas=alphas, cv=5, random_state=42).fit(X_res, y_res)
    
    ridge_selected[b] = (ridge_cv.coef_ != 0).astype(int)
    lasso_selected[b] = (lasso_cv.coef_ != 0).astype(int)
    elastic_selected[b] = (elastic_cv.coef_ != 0).astype(int)

ridge_stability = ridge_selected.mean(axis=0)
lasso_stability = lasso_selected.mean(axis=0)
elastic_stability = elastic_selected.mean(axis=0)

stability_df = pd.DataFrame({
    "Ridge": ridge_stability,
    "Lasso": lasso_stability,
    "Elastic Net": elastic_stability
}, index=[f"Feature {i}" for i in range(X.shape[1])])

# Plot grouped bar chart for top 10 features
top_features = stability_df.mean(axis=1).sort_values(ascending=False).head(10).index
stability_top = stability_df.loc[top_features]

stability_top.plot(kind="bar", figsize=(12,6), color=["blue", "red", "green"])
plt.ylabel("Stability Score (0–1)")
plt.title("Bootstrapped Feature Stability Across Methods")
plt.grid(axis="y")
plt.show()

🔎 What You’ll See

  • A grouped bar chart for the top 10 most stable features.

  • Each feature has three bars: Ridge (blue), Lasso (red), Elastic Net (green).

  • Bar height = stability score (fraction of bootstraps where the feature was selected).

  • Easy to compare method‑specific stability side by side.

Takeaway:

  • Features with consistently high bars across all methods are the most robust predictors.

  • Ridge's bar is always at 1.0 (it never sets coefficients exactly to zero), so the gap between the Ridge and Lasso bars shows how often Lasso drops that feature.

  • Elastic Net often sits between Ridge and Lasso, balancing stability and sparsity.

Would you like me to also show you how to summarize these stability scores in a radar chart, so you can compare method profiles across all features in one circular visualization?

yes

Here’s how you can summarize bootstrapped stability scores in a radar chart, so you can compare Ridge, Lasso, and Elastic Net profiles across all features in one circular visualization. This makes it easy to see method‑specific stability patterns at a glance.

Python Snippet: Radar Chart of Stability Scores

python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.datasets import make_regression

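# Synthetic dataset: only 5 of the 10 features are informative
# (make_regression defaults to n_informative=10, which would make every feature informative here)
X, y = make_regression(n_samples=200, n_features=10, n_informative=5, noise=0.1, random_state=42)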

# Parameters
n_bootstraps = 30
alphas = np.logspace(-2, 2, 50)

ridge_selected = np.zeros((n_bootstraps, X.shape[1]))
lasso_selected = np.zeros((n_bootstraps, X.shape[1]))
elastic_selected = np.zeros((n_bootstraps, X.shape[1]))

rng = np.random.default_rng(42)
for b in range(n_bootstraps):
    sample_idx = rng.choice(np.arange(X.shape[0]), size=X.shape[0], replace=True)
    X_res, y_res = X[sample_idx], y[sample_idx]
    
    ridge_cv = RidgeCV(alphas=alphas, cv=5).fit(X_res, y_res)
    lasso_cv = LassoCV(alphas=alphas, cv=5, random_state=42).fit(X_res, y_res)
    elastic_cv = ElasticNetCV(l1_ratio=[0.1,0.5,0.7,0.9,1.0], alphas=alphas, cv=5, random_state=42).fit(X_res, y_res)
    
    ridge_selected[b] = (ridge_cv.coef_ != 0).astype(int)
    lasso_selected[b] = (lasso_cv.coef_ != 0).astype(int)
    elastic_selected[b] = (elastic_cv.coef_ != 0).astype(int)

ridge_stability = ridge_selected.mean(axis=0)
lasso_stability = lasso_selected.mean(axis=0)
elastic_stability = elastic_selected.mean(axis=0)

features = [f"Feature {i}" for i in range(X.shape[1])]
angles = np.linspace(0, 2*np.pi, len(features), endpoint=False).tolist()
angles += angles[:1]  # close the loop

# Prepare values for radar chart
ridge_vals = ridge_stability.tolist()
lasso_vals = lasso_stability.tolist()
elastic_vals = elastic_stability.tolist()

ridge_vals += ridge_vals[:1]
lasso_vals += lasso_vals[:1]
elastic_vals += elastic_vals[:1]

# Plot radar chart
fig, ax = plt.subplots(figsize=(8,8), subplot_kw=dict(polar=True))

ax.plot(angles, ridge_vals, color="blue", linewidth=2, label="Ridge")
ax.fill(angles, ridge_vals, color="blue", alpha=0.25)

ax.plot(angles, lasso_vals, color="red", linewidth=2, label="Lasso")
ax.fill(angles, lasso_vals, color="red", alpha=0.25)

ax.plot(angles, elastic_vals, color="green", linewidth=2, label="Elastic Net")
ax.fill(angles, elastic_vals, color="green", alpha=0.25)

ax.set_xticks(angles[:-1])
ax.set_xticklabels(features)
ax.set_yticks([0.25, 0.5, 0.75, 1.0])
ax.set_yticklabels(["0.25","0.5","0.75","1.0"])
ax.set_ylim(0,1)

plt.title("Bootstrapped Feature Stability Radar Chart", size=14)
plt.legend(loc="upper right", bbox_to_anchor=(1.2, 1.1))
plt.show()

🔎 What You’ll See

  • A radar chart with axes = features and values = stability scores (0–1).

  • Three overlapping profiles:

    • Ridge (blue): at 1.0 for every feature, because Ridge practically never sets a coefficient exactly to zero; it serves as the outer reference ring.

    • Lasso (red): sparse profile, only stable for a subset of features.

    • Elastic Net (green): intermediate, balancing Ridge’s broad stability and Lasso’s sparsity.

Takeaway: This visualization shows method‑specific stability patterns in one circular view. You’ll quickly spot:

  • Features consistently stable across all methods.

  • Features only stable under Ridge or Elastic Net.

  • Features dropped by Lasso.

Would you like me to also show you how to aggregate these radar profiles into a single “consensus stability score” per feature, so you can rank features by their overall reliability across methods and resamples?

yes

Here’s how you can aggregate the radar profiles into a single “consensus stability score” per feature, so you can rank features by their overall reliability across methods and resamples. This gives you one unified measure of how dependable each predictor is.

Python Snippet: Consensus Stability Score

python
import numpy as np
import pandas as pd
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.datasets import make_regression

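# Synthetic dataset: only 5 of the 10 features are informative
# (make_regression defaults to n_informative=10, which would make every feature informative here)
X, y = make_regression(n_samples=200, n_features=10, n_informative=5, noise=0.1, random_state=42)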

# Parameters
n_bootstraps = 30
alphas = np.logspace(-2, 2, 50)

ridge_selected = np.zeros((n_bootstraps, X.shape[1]))
lasso_selected = np.zeros((n_bootstraps, X.shape[1]))
elastic_selected = np.zeros((n_bootstraps, X.shape[1]))

rng = np.random.default_rng(42)
for b in range(n_bootstraps):
    sample_idx = rng.choice(np.arange(X.shape[0]), size=X.shape[0], replace=True)
    X_res, y_res = X[sample_idx], y[sample_idx]
    
    ridge_cv = RidgeCV(alphas=alphas, cv=5).fit(X_res, y_res)
    lasso_cv = LassoCV(alphas=alphas, cv=5, random_state=42).fit(X_res, y_res)
    elastic_cv = ElasticNetCV(l1_ratio=[0.1,0.5,0.7,0.9,1.0], alphas=alphas, cv=5, random_state=42).fit(X_res, y_res)
    
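    # Ridge shrinks but does not set coefficients exactly to zero, so this
    # indicator is 1 for every feature and Ridge stability will be 1.0 everywhere
    ridge_selected[b] = (ridge_cv.coef_ != 0).astype(int)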
    lasso_selected[b] = (lasso_cv.coef_ != 0).astype(int)
    elastic_selected[b] = (elastic_cv.coef_ != 0).astype(int)

# Compute stability scores across bootstraps
ridge_stability = ridge_selected.mean(axis=0)
lasso_stability = lasso_selected.mean(axis=0)
elastic_stability = elastic_selected.mean(axis=0)

# Consensus stability = average across methods
consensus_stability = (ridge_stability + lasso_stability + elastic_stability) / 3

# Build DataFrame
consensus_df = pd.DataFrame({
    "Ridge Stability": ridge_stability,
    "Lasso Stability": lasso_stability,
    "Elastic Net Stability": elastic_stability,
    "Consensus Stability": consensus_stability
}, index=[f"Feature {i}" for i in range(X.shape[1])])

# Rank features by consensus stability
ranked_consensus = consensus_df.sort_values("Consensus Stability", ascending=False)

# Display top features
print(ranked_consensus.head(10))

🔎 What You’ll See

  • A table with Ridge, Lasso, Elastic Net stability scores per feature.

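  • A Consensus Stability column = average across methods. Because Ridge never zeroes a coefficient, its stability is 1.0 for every feature, so consensus scores here cannot fall below 1/3.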

  • A ranked list of the most reliable features overall.

  • Features with consensus stability close to 1.0 are consistently selected across methods and resamples.

Takeaway: This consensus score gives you a single, unified measure of reliability.

  • High consensus = robust predictors across methods and resamples.

  • Low consensus = unstable predictors, sensitive to sampling or regularization choice.
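
To make the averaging concrete, the snippet below applies the consensus formula to selection counts for two hypothetical features over 30 bootstraps. The counts are made up for illustration. It shows each method's stability (the selection frequency) and the unweighted mean across the three methods.

```python
import numpy as np

n_bootstraps = 30

# Hypothetical counts: how often each method kept the feature (non-zero coef)
counts = {
    "Feature A": {"Ridge": 30, "Lasso": 30, "Elastic Net": 30},
    "Feature B": {"Ridge": 30, "Lasso": 6, "Elastic Net": 12},
}

consensus = {}
for feature, per_method in counts.items():
    # Stability per method = selection frequency across bootstraps
    stabilities = np.array(list(per_method.values())) / n_bootstraps
    # Consensus = unweighted mean across the three methods
    consensus[feature] = stabilities.mean()
    print(f"{feature}: {np.round(stabilities, 2)} -> consensus {consensus[feature]:.3f}")
```

Feature B works out to (1.0 + 0.2 + 0.4) / 3 ≈ 0.533. Ridge's constant 1.0 contributes a fixed 1/3 to every score.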

Would you like me to also show you how to plot these consensus scores as a horizontal bar chart, so you can instantly see the most reliable features ranked from strongest to weakest?

yes

Here’s how you can plot consensus stability scores as a horizontal bar chart, so you can instantly see the most reliable features ranked from strongest to weakest.

Python Snippet: Consensus Stability Horizontal Bar Chart

python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.datasets import make_regression

# Synthetic dataset (5 informative features out of 10, so sparse methods have noise features to drop)
X, y = make_regression(n_samples=200, n_features=10, n_informative=5, noise=0.1, random_state=42)

# Parameters
n_bootstraps = 30
alphas = np.logspace(-2, 2, 50)

ridge_selected = np.zeros((n_bootstraps, X.shape[1]))
lasso_selected = np.zeros((n_bootstraps, X.shape[1]))
elastic_selected = np.zeros((n_bootstraps, X.shape[1]))

rng = np.random.default_rng(42)
for b in range(n_bootstraps):
    sample_idx = rng.choice(np.arange(X.shape[0]), size=X.shape[0], replace=True)
    X_res, y_res = X[sample_idx], y[sample_idx]
    
    ridge_cv = RidgeCV(alphas=alphas, cv=5).fit(X_res, y_res)
    lasso_cv = LassoCV(alphas=alphas, cv=5, random_state=42).fit(X_res, y_res)
    elastic_cv = ElasticNetCV(l1_ratio=[0.1,0.5,0.7,0.9,1.0], alphas=alphas, cv=5, random_state=42).fit(X_res, y_res)
    
    ridge_selected[b] = (ridge_cv.coef_ != 0).astype(int)
    lasso_selected[b] = (lasso_cv.coef_ != 0).astype(int)
    elastic_selected[b] = (elastic_cv.coef_ != 0).astype(int)

ridge_stability = ridge_selected.mean(axis=0)
lasso_stability = lasso_selected.mean(axis=0)
elastic_stability = elastic_selected.mean(axis=0)

consensus_stability = (ridge_stability + lasso_stability + elastic_stability) / 3

consensus_df = pd.DataFrame({
    "Consensus Stability": consensus_stability
}, index=[f"Feature {i}" for i in range(X.shape[1])])

# Sort ascending: a horizontal bar chart draws the first row at the bottom,
# so the most stable feature ends up at the top
ranked_consensus = consensus_df.sort_values("Consensus Stability", ascending=True)

# Plot horizontal bar chart
plt.figure(figsize=(10,6))
ranked_consensus["Consensus Stability"].plot(kind="barh", color="teal")
plt.xlabel("Consensus Stability (0–1)")
plt.title("Feature Consensus Stability Ranking")
plt.grid(axis="x")
plt.show()

🔎 What You’ll See

  • A horizontal bar chart with features ranked from lowest to highest consensus stability.

  • Features at the top of the chart (longest bars) are the most reliable predictors across Ridge, Lasso, Elastic Net, and bootstraps.

  • Features at the bottom are unstable, often dropped or inconsistently selected.

Takeaway: This visualization gives you a clear, ranked overview of feature reliability.

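  • Long bars = predictors selected consistently across methods and resamples.

  • Short bars = unstable predictors, dropped in many resamples or by the sparse methods.

It’s a quick way to decide which features to trust most in your model. Keep in mind that stability measures how reliably a feature is selected, not how strongly it affects the prediction.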
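
One way to turn the ranking into a decision is to keep only the features whose consensus score clears a cut‑off. This is a sketch: the helper name `trusted_features` and the 0.8 default threshold are assumptions for illustration and are not part of the original analysis. The example scores are made up.

```python
import pandas as pd

def trusted_features(scores: pd.Series, threshold: float = 0.8) -> list:
    """Hypothetical helper: names of features whose consensus stability is
    at least `threshold`, ordered from most to least stable."""
    kept = scores[scores >= threshold]
    return kept.sort_values(ascending=False).index.tolist()

# Made-up consensus scores standing in for the "Consensus Stability" column
scores = pd.Series({"Feature 0": 1.0, "Feature 1": 0.45,
                    "Feature 2": 0.9, "Feature 3": 0.8})

print(trusted_features(scores))
```

With the original pipeline, `trusted_features(consensus_df["Consensus Stability"])` would apply the same rule to the real scores. Choosing the threshold remains a judgment call.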

Would you like me to also extend this into a two‑dimensional plot (importance vs. consensus stability), so you can see not just reliability but also how influential each feature is?

yes

Here’s how you can extend the analysis into a two‑dimensional plot (importance vs. consensus stability). This way you’ll see not only how reliable each feature is across methods and resamples, but also how influential it is in terms of average coefficient magnitude.

Python Snippet: Importance vs. Consensus Stability Scatter Plot

python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.datasets import make_regression

# Synthetic dataset (5 informative features out of 10, so sparse methods have noise features to drop)
X, y = make_regression(n_samples=200, n_features=10, n_informative=5, noise=0.1, random_state=42)

# Parameters
n_bootstraps = 30
alphas = np.logspace(-2, 2, 50)

ridge_selected = np.zeros((n_bootstraps, X.shape[1]))
lasso_selected = np.zeros((n_bootstraps, X.shape[1]))
elastic_selected = np.zeros((n_bootstraps, X.shape[1]))

# Absolute coefficients per bootstrap, so importance is averaged over resamples
# instead of being taken from whichever fit happened to run last
ridge_abs = np.zeros((n_bootstraps, X.shape[1]))
lasso_abs = np.zeros((n_bootstraps, X.shape[1]))
elastic_abs = np.zeros((n_bootstraps, X.shape[1]))

rng = np.random.default_rng(42)
for b in range(n_bootstraps):
    sample_idx = rng.choice(np.arange(X.shape[0]), size=X.shape[0], replace=True)
    X_res, y_res = X[sample_idx], y[sample_idx]

    ridge_cv = RidgeCV(alphas=alphas, cv=5).fit(X_res, y_res)
    lasso_cv = LassoCV(alphas=alphas, cv=5, random_state=42).fit(X_res, y_res)
    elastic_cv = ElasticNetCV(l1_ratio=[0.1,0.5,0.7,0.9,1.0], alphas=alphas, cv=5, random_state=42).fit(X_res, y_res)

    ridge_selected[b] = (ridge_cv.coef_ != 0).astype(int)
    lasso_selected[b] = (lasso_cv.coef_ != 0).astype(int)
    elastic_selected[b] = (elastic_cv.coef_ != 0).astype(int)

    ridge_abs[b] = np.abs(ridge_cv.coef_)
    lasso_abs[b] = np.abs(lasso_cv.coef_)
    elastic_abs[b] = np.abs(elastic_cv.coef_)

ridge_stability = ridge_selected.mean(axis=0)
lasso_stability = lasso_selected.mean(axis=0)
elastic_stability = elastic_selected.mean(axis=0)

consensus_stability = (ridge_stability + lasso_stability + elastic_stability) / 3

# Average importance: mean absolute coefficient over bootstraps, then over methods
avg_importance = (ridge_abs.mean(axis=0) + lasso_abs.mean(axis=0) + elastic_abs.mean(axis=0)) / 3

# Build DataFrame
consensus_df = pd.DataFrame({
    "Consensus Stability": consensus_stability,
    "Average Importance": avg_importance
}, index=[f"Feature {i}" for i in range(X.shape[1])])

# Scatter plot
plt.figure(figsize=(10,6))
plt.scatter(consensus_df["Consensus Stability"], consensus_df["Average Importance"], color="darkorange")

# Annotate top features
for feature, row in consensus_df.sort_values("Average Importance", ascending=False).head(5).iterrows():
    plt.text(row["Consensus Stability"]+0.02, row["Average Importance"], feature, fontsize=9)

plt.xlabel("Consensus Stability (0–1)")
plt.ylabel("Average Importance (Absolute Coefficient)")
plt.title("Feature Importance vs. Consensus Stability")
plt.grid(True)
plt.show()

🔎 What You’ll See

  • X‑axis: Consensus stability (0–1).

  • Y‑axis: Average importance (absolute coefficient magnitude).

  • Points: Each feature plotted by reliability vs. influence.

  • Top‑right quadrant: Features that are both highly stable and highly important → your most trustworthy predictors.

  • Annotations: Labels for the top 5 most important features.

Takeaway: This visualization combines robustness (stability across methods and resamples) with influence (importance in the model).

  • Top‑right = strong, reliable predictors.

  • Bottom‑right = stable but weak predictors.

  • Top‑left = influential but unstable predictors (sensitive to resampling or to the choice of method).

  • Bottom‑left = weak and unstable (likely noise).
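
The quadrant reading needs cut points, which the scatter plot leaves implicit. The sketch below assumes the median of each axis as the dividing line. This is a common but arbitrary choice. The scores are made up and stand in for `consensus_df`.

```python
import pandas as pd

# Made-up scores standing in for consensus_df
df = pd.DataFrame(
    {"Consensus Stability": [0.95, 0.90, 0.40, 0.35],
     "Average Importance": [80.0, 2.0, 60.0, 1.5]},
    index=["Feature 0", "Feature 1", "Feature 2", "Feature 3"],
)

# Assumed cut points: the median of each axis
s_cut = df["Consensus Stability"].median()
i_cut = df["Average Importance"].median()

def quadrant(row: pd.Series) -> str:
    """Label a feature with the quadrant names used in the takeaway."""
    stable = row["Consensus Stability"] >= s_cut
    important = row["Average Importance"] >= i_cut
    if stable and important:
        return "strong and reliable"
    if stable:
        return "stable but weak"
    if important:
        return "influential but unstable"
    return "weak and unstable"

df["Quadrant"] = df.apply(quadrant, axis=1)
print(df)
```

Other splits work the same way, such as a fixed stability threshold or the upper quartile of importance. Only `s_cut` and `i_cut` change.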

Would you like me to also show you how to color‑code the scatter plot by method preference (e.g., which method gives the highest coefficient for each feature), so you can see not just stability and importance but also which regularization strategy favors each feature?

yes

Here’s how you can plot importance vs. consensus stability with color‑coding by method preference, so you see not just reliability and influence but also which regularization strategy tends to favor each feature.

Python Snippet: Importance vs. Consensus Stability with Method Preference Colors

python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.datasets import make_regression

# Synthetic dataset (5 informative features out of 10, so sparse methods have noise features to drop)
X, y = make_regression(n_samples=200, n_features=10, n_informative=5, noise=0.1, random_state=42)

# Parameters
n_bootstraps = 30
alphas = np.logspace(-2, 2, 50)

ridge_selected = np.zeros((n_bootstraps, X.shape[1]))
lasso_selected = np.zeros((n_bootstraps, X.shape[1]))
elastic_selected = np.zeros((n_bootstraps, X.shape[1]))

# Absolute coefficients per bootstrap, so importance and method preference are
# averaged over resamples instead of coming from the last fit only
ridge_abs = np.zeros((n_bootstraps, X.shape[1]))
lasso_abs = np.zeros((n_bootstraps, X.shape[1]))
elastic_abs = np.zeros((n_bootstraps, X.shape[1]))

rng = np.random.default_rng(42)
for b in range(n_bootstraps):
    sample_idx = rng.choice(np.arange(X.shape[0]), size=X.shape[0], replace=True)
    X_res, y_res = X[sample_idx], y[sample_idx]

    ridge_cv = RidgeCV(alphas=alphas, cv=5).fit(X_res, y_res)
    lasso_cv = LassoCV(alphas=alphas, cv=5, random_state=42).fit(X_res, y_res)
    elastic_cv = ElasticNetCV(l1_ratio=[0.1,0.5,0.7,0.9,1.0], alphas=alphas, cv=5, random_state=42).fit(X_res, y_res)

    ridge_selected[b] = (ridge_cv.coef_ != 0).astype(int)
    lasso_selected[b] = (lasso_cv.coef_ != 0).astype(int)
    elastic_selected[b] = (elastic_cv.coef_ != 0).astype(int)

    ridge_abs[b] = np.abs(ridge_cv.coef_)
    lasso_abs[b] = np.abs(lasso_cv.coef_)
    elastic_abs[b] = np.abs(elastic_cv.coef_)

ridge_stability = ridge_selected.mean(axis=0)
lasso_stability = lasso_selected.mean(axis=0)
elastic_stability = elastic_selected.mean(axis=0)

consensus_stability = (ridge_stability + lasso_stability + elastic_stability) / 3

# Mean absolute coefficient per method, averaged over bootstraps
ridge_imp = ridge_abs.mean(axis=0)
lasso_imp = lasso_abs.mean(axis=0)
elastic_imp = elastic_abs.mean(axis=0)
avg_importance = (ridge_imp + lasso_imp + elastic_imp) / 3

# Determine method preference (which method gives the highest mean absolute coefficient)
method_pref = []
for i in range(X.shape[1]):
    coefs = {
        "Ridge": ridge_imp[i],
        "Lasso": lasso_imp[i],
        "Elastic Net": elastic_imp[i]
    }
    best_method = max(coefs, key=coefs.get)
    method_pref.append(best_method)

# Build DataFrame
consensus_df = pd.DataFrame({
    "Consensus Stability": consensus_stability,
    "Average Importance": avg_importance,
    "Method Preference": method_pref
}, index=[f"Feature {i}" for i in range(X.shape[1])])

# Color map by method preference
color_map = {"Ridge":"blue", "Lasso":"red", "Elastic Net":"green"}
colors = [color_map[m] for m in consensus_df["Method Preference"]]

# Scatter plot
plt.figure(figsize=(10,6))
plt.scatter(consensus_df["Consensus Stability"], consensus_df["Average Importance"], c=colors)

# Annotate top features
for feature, row in consensus_df.sort_values("Average Importance", ascending=False).head(5).iterrows():
    plt.text(row["Consensus Stability"]+0.02, row["Average Importance"], feature, fontsize=9)

plt.xlabel("Consensus Stability (0–1)")
plt.ylabel("Average Importance (Absolute Coefficient)")
plt.title("Feature Importance vs. Consensus Stability (Color = Method Preference)")
plt.grid(True)
plt.show()

🔎 What You’ll See

  • X‑axis: Consensus stability (0–1).

  • Y‑axis: Average importance (absolute coefficient magnitude).

  • Point colors:

    • Blue = Ridge favors the feature most.

    • Red = Lasso favors the feature most.

    • Green = Elastic Net favors the feature most.

  • Top‑right quadrant: Strong, reliable predictors, with color showing which method emphasizes them most.

Takeaway: This visualization adds a third dimension: method preference.

  • You can now see which features are robust and influential, and also whether Ridge, Lasso, or Elastic Net gives them the strongest weight.

  • It’s a powerful way to interpret not just what matters, but how different regularization strategies view each feature.
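
You can compute the same "largest absolute coefficient wins" rule without an explicit loop by using pandas' `DataFrame.idxmax(axis=1)` on a features‑by‑methods table of coefficients. The sketch below uses made‑up coefficients. On ties, `idxmax` returns the first column, which matches how Python's `max` picks among equal dictionary values in the loop version.

```python
import pandas as pd

# Made-up coefficients: rows = features, columns = methods
coefs = pd.DataFrame(
    {"Ridge": [3.0, -0.5, 1.2],
     "Lasso": [2.5, 0.0, -1.8],
     "Elastic Net": [2.8, -0.7, 1.8]},
    index=["Feature 0", "Feature 1", "Feature 2"],
)

# Column label of the largest absolute coefficient in each row;
# ties resolve to the first column in order
method_pref = coefs.abs().idxmax(axis=1)
print(method_pref)
```

Feature 2 ties between Lasso and Elastic Net at 1.8 and resolves to Lasso because Lasso comes first in the column order.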

Would you like me to also show you how to quantify overlap between methods (e.g., Jaccard similarity of selected features), so you can measure how much Ridge, Lasso, and Elastic Net agree on feature selection?

Here’s how you can quantify overlap between Ridge, Lasso, and Elastic Net feature selections using Jaccard similarity. This measures how much the sets of selected features agree across methods — giving you a numeric view of consensus beyond stability scores.

Python Snippet: Jaccard Similarity of Selected Features

python
import numpy as np
import pandas as pd
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.datasets import make_regression

# Synthetic dataset (5 informative features out of 10, so sparse methods have noise features to drop)
X, y = make_regression(n_samples=200, n_features=10, n_informative=5, noise=0.1, random_state=42)

alphas = np.logspace(-2, 2, 50)

# Fit models
ridge_cv = RidgeCV(alphas=alphas, cv=5).fit(X, y)
lasso_cv = LassoCV(alphas=alphas, cv=5, random_state=42).fit(X, y)
elastic_cv = ElasticNetCV(l1_ratio=[0.1,0.5,0.7,0.9,1.0], alphas=alphas, cv=5, random_state=42).fit(X, y)

# Selected features (non-zero coefficients); Ridge does not zero coefficients,
# so ridge_set contains every feature index
ridge_set = set(np.where(ridge_cv.coef_ != 0)[0])
lasso_set = set(np.where(lasso_cv.coef_ != 0)[0])
elastic_set = set(np.where(elastic_cv.coef_ != 0)[0])

# Jaccard similarity function
def jaccard(a, b):
    return len(a & b) / len(a | b) if len(a | b) > 0 else 0

# Compute pairwise overlaps
jaccard_scores = {
    "Ridge vs Lasso": jaccard(ridge_set, lasso_set),
    "Ridge vs Elastic Net": jaccard(ridge_set, elastic_set),
    "Lasso vs Elastic Net": jaccard(lasso_set, elastic_set)
}

# Build DataFrame
jaccard_df = pd.DataFrame.from_dict(jaccard_scores, orient="index", columns=["Jaccard Similarity"])

print(jaccard_df)

🔎 What You’ll See

  • Jaccard Similarity values (0–1):

    • 1.0 → perfect overlap (identical selected features).

    • 0.0 → no overlap at all.

  • A small table showing overlap between Ridge vs Lasso, Ridge vs Elastic Net, and Lasso vs Elastic Net.

Takeaway:

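  • Ridge usually overlaps more with Elastic Net than with Lasso. Because Ridge keeps every feature, its Jaccard score against another method is simply the fraction of features that method keeps, and Elastic Net typically keeps more than Lasso.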

  • Lasso overlaps less, because it enforces sparsity and drops many predictors.

  • This metric gives you a quantitative measure of agreement between methods, complementing stability and importance analyses.
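
The Ridge property can be checked directly. If Ridge keeps all p features, then |A ∩ R| / |A ∪ R| = |A| / p for any other selection A. The sketch below reuses the same `jaccard` definition as the snippet above, including its convention of returning 0 for two empty sets, on hypothetical selections over p = 10 features.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

p = 10
ridge_set = set(range(p))          # Ridge: every coefficient non-zero
lasso_set = {0, 2, 5, 7}           # hypothetical sparse selection
elastic_set = {0, 2, 3, 5, 7, 9}   # hypothetical, somewhat less sparse

print(jaccard(ridge_set, lasso_set))    # equals len(lasso_set) / p
print(jaccard(ridge_set, elastic_set))  # equals len(elastic_set) / p
print(jaccard(lasso_set, elastic_set))  # 4 shared / 6 in the union
```
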

Would you like me to also show you how to visualize these overlaps with a Venn diagram, so you can see the shared and unique feature selections across Ridge, Lasso, and Elastic Net?

yes

Here’s how you can visualize overlap between Ridge, Lasso, and Elastic Net feature selections with a Venn diagram, so you see shared and unique predictors across methods.

Python Snippet: Venn Diagram of Feature Selection Overlap

python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib_venn import venn3  # third-party package: pip install matplotlib-venn
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.datasets import make_regression

# Synthetic dataset (5 informative features out of 10, so sparse methods have noise features to drop)
X, y = make_regression(n_samples=200, n_features=10, n_informative=5, noise=0.1, random_state=42)

alphas = np.logspace(-2, 2, 50)

# Fit models
ridge_cv = RidgeCV(alphas=alphas, cv=5).fit(X, y)
lasso_cv = LassoCV(alphas=alphas, cv=5, random_state=42).fit(X, y)
elastic_cv = ElasticNetCV(l1_ratio=[0.1,0.5,0.7,0.9,1.0], alphas=alphas, cv=5, random_state=42).fit(X, y)

# Selected features (non-zero coefficients)
ridge_set = set(np.where(ridge_cv.coef_ != 0)[0])
lasso_set = set(np.where(lasso_cv.coef_ != 0)[0])
elastic_set = set(np.where(elastic_cv.coef_ != 0)[0])

# Plot Venn diagram
plt.figure(figsize=(8,6))
venn3([ridge_set, lasso_set, elastic_set],
      set_labels=("Ridge", "Lasso", "Elastic Net"),
      set_colors=("blue", "red", "green"),
      alpha=0.4)

plt.title("Feature Selection Overlap Across Ridge, Lasso, Elastic Net")
plt.show()

🔎 What You’ll See

  • A Venn diagram with three circles:

    • Blue (Ridge) → features retained by Ridge.

    • Red (Lasso) → features retained by Lasso.

    • Green (Elastic Net) → features retained by Elastic Net.

  • Overlapping regions show shared features selected by multiple methods.

  • Non‑overlapping regions show unique selections.

Takeaway:

  • Ridge retains every feature, because it shrinks coefficients but does not zero them. Its circle is therefore the largest and contains the other two.

  • Lasso enforces sparsity, so its circle is smaller, often overlapping partially with Elastic Net.

  • Elastic Net overlaps with both Ridge and Lasso, balancing their tendencies.

  • The overlaps reveal consensus predictors that all methods agree on, versus method‑specific predictors.
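
A three‑set Venn diagram has seven regions. You can reproduce their counts with plain set operations, which helps when the diagram is crowded or `matplotlib-venn` is not installed. In the sketch below, `venn3_regions` is a hypothetical helper whose keys are membership flags in (Ridge, Lasso, Elastic Net) order. The selections are the same made‑up sets used for illustration, with Ridge containing every feature.

```python
from itertools import product

def venn3_regions(a: set, b: set, c: set) -> dict:
    """Count elements in each of the seven Venn regions. Keys are membership
    flags (in_a, in_b, in_c); e.g. (1, 0, 1) means in a and c but not b."""
    universe = a | b | c
    regions = {}
    for flags in product((0, 1), repeat=3):
        if flags == (0, 0, 0):
            continue  # outside all three sets: not part of the diagram
        wanted = tuple(bool(f) for f in flags)
        regions[flags] = sum(1 for x in universe
                             if (x in a, x in b, x in c) == wanted)
    return regions

ridge_set = set(range(10))
lasso_set = {0, 2, 5, 7}
elastic_set = {0, 2, 3, 5, 7, 9}

for flags, count in venn3_regions(ridge_set, lasso_set, elastic_set).items():
    print(flags, count)
```

Every region without Ridge membership comes out empty, which is the containment described in the takeaway.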

Would you like me to also extend this into a weighted Venn diagram (proportional to coefficient magnitude), so the overlaps reflect not just selection but also relative importance?


Al-Baadani Blog
