RSS + \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=1}^{p} \beta_j^2
where j indexes the coefficients from 1 to p, and \lambda_1, \lambda_2 \geq 0 control the strength of each penalty.
\sum_{j=1}^{p} |\beta_j| is the L1 regularization (Lasso) term, which encourages sparsity by driving some coefficients exactly to zero.
\sum_{j=1}^{p} \beta_j^2 is the L2 regularization (Ridge) term, which penalizes large coefficients, shrinking them toward zero without eliminating them.
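The objective above can be evaluated directly. A minimal NumPy sketch (the names X, y, beta, lam1, lam2 and the tiny worked example are illustrative assumptions, not from the original):

```python
import numpy as np

def elastic_net_objective(X, y, beta, lam1, lam2):
    """Elastic net objective: RSS + lam1 * ||beta||_1 + lam2 * ||beta||_2^2."""
    residuals = y - X @ beta
    rss = np.sum(residuals ** 2)          # residual sum of squares
    l1 = lam1 * np.sum(np.abs(beta))      # sparsity-inducing L1 penalty
    l2 = lam2 * np.sum(beta ** 2)         # shrinkage-inducing L2 penalty
    return rss + l1 + l2

# Tiny worked example: RSS = 0^2 + 1^2 = 1, L1 penalty = 0.5 * 2 = 1,
# L2 penalty = 0.5 * 2 = 1, so the objective totals 3.0.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0])
beta = np.array([1.0, 1.0])
print(elastic_net_objective(X, y, beta, 0.5, 0.5))  # → 3.0
```

This is the quantity the solver minimizes over \beta; the two \lambda values weight how strongly each penalty pulls against the fit.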
Combines L1 and L2 Penalties: Merges the advantages of Ridge and Lasso, handling multicollinearity while performing feature selection.
Optimizes Feature Selection: The L1 part zeroes out insignificant coefficients; the L2 part shrinks the remaining coefficients to manage multicollinearity.
Requires Parameter Tuning: Optimal values of \lambda_1 and \lambda_2 balance feature elimination against coefficient shrinkage; they are typically chosen via cross-validation.
Mitigates Overfitting: Adjusts bias-variance trade-off, reducing overfitting risk.
Iterative Optimization: No closed-form solution exists because the L1 penalty is non-differentiable at zero; fitting relies on iterative methods such as coordinate descent.
Effective in High Dimensions: Suitable for datasets where the number of features exceeds the number of observations (p > n).
Balances Sparsity and Stability: Ensures model relevance and stability through L1 and L2 penalties.
Enhances Interpretability: Simplifies the model by retaining only the relevant predictors, making the fitted coefficients easier to explain.
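The properties above can be seen in practice with scikit-learn's ElasticNet estimator. A sketch under illustrative assumptions: the synthetic data and the parameter values are made up for demonstration, and note that scikit-learn parameterizes the two penalties through alpha (overall strength) and l1_ratio (L1/L2 mix) rather than separate \lambda_1 and \lambda_2:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Synthetic data (illustrative): only feature 0 actually drives the response
rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] + 0.1 * rng.standard_normal(n)

# alpha scales the total penalty; l1_ratio splits it between L1 and L2
# (both values here are arbitrary choices for the demo, not tuned)
model = ElasticNet(alpha=0.5, l1_ratio=0.7)
model.fit(X, y)

# The L1 part drives the irrelevant coefficients exactly to zero,
# while the L2 part shrinks the surviving coefficient toward zero.
print(model.coef_)
```

In a real application, alpha and l1_ratio would be selected by cross-validation (e.g. with ElasticNetCV) rather than fixed by hand.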