https://piazza.com/class_profile/get_resource/ln18bjs43q41tr/ln3yba9upossz
https://piazza.com/class_profile/get_resource/ln18bjs43q41tr/lne2g4wvqtldw
https://piazza.com/class_profile/get_resource/ln18bjs43q41tr/lno5od2ut6c73
Admissible Distance Functions
- Must satisfy the triangle inequality: d(x, z) <= d(x, y) + d(y, z). When verifying a candidate distance, try to simplify the expression first (a numeric spot-check is sketched below).
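A minimal numeric spot-check of the triangle inequality, as a sanity test before attempting an algebraic proof. `candidate_dist` is a hypothetical placeholder (Euclidean here), not from the notes; a passing check is evidence, not proof:

```python
# Spot-check the triangle inequality on random triples of points.
import numpy as np

def candidate_dist(x, y):
    # Hypothetical distance under test; Euclidean satisfies all metric axioms.
    return np.linalg.norm(x - y)

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y, z = rng.normal(size=(3, 4))  # three random points in R^4
    # d(x, z) <= d(x, y) + d(y, z), with a small tolerance for float error
    assert candidate_dist(x, z) <= candidate_dist(x, y) + candidate_dist(y, z) + 1e-12
print("no triangle-inequality violations in 1000 random triples")
```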

Multivariate Gaussian

- Spherical: Sigma = sigma^2 * I_d, i.e., diagonal with every entry equal to sigma^2 (one shared variance for all dimensions)
- Diagonal: Sigma is diagonal, but the per-dimension variances sigma_i^2 can differ (both cases constructed in the sketch below)
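A minimal sketch contrasting the two covariance structures; the dimension and variance values are illustrative assumptions, not from the notes:

```python
# Spherical vs. diagonal covariance for a multivariate Gaussian.
import numpy as np
from scipy.stats import multivariate_normal

d = 3
mu = np.zeros(d)

sigma2 = 2.0
Sigma_spherical = sigma2 * np.eye(d)        # sigma^2 * I_d: one shared variance
Sigma_diagonal = np.diag([0.5, 2.0, 4.0])   # diagonal, per-dimension variances differ

x = np.array([1.0, -0.5, 0.25])
print("spherical pdf:", multivariate_normal(mean=mu, cov=Sigma_spherical).pdf(x))
print("diagonal  pdf:", multivariate_normal(mean=mu, cov=Sigma_diagonal).pdf(x))
```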
Why Lasso Promotes Sparsity
- L(w, b) = MSE loss + lambda * ||w||_1
- Absolute Penalty: The L1 penalty is the sum of the absolute values of the coefficients. As lambda increases, some coefficients are estimated as exactly zero, unlike ridge regression (L2 penalty), where coefficients are shrunk toward zero but not exactly to zero.
- Corner Solution: Geometrically, the L1 penalty constrains the coefficient estimates to lie in a diamond (in 2D), an octahedron (in 3D), or a cross-polytope in higher dimensions. The corners of these shapes correspond to sparse solutions (where some coefficients are exactly zero), and the loss contours often first touch the constraint region at a corner, especially when lambda is sufficiently large.
- Continuous Shrinkage: As lambda increases, Lasso continuously pushes less important coefficients toward zero, and at a certain value of lambda it sets them exactly to zero (demonstrated in the sketch below).
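A minimal sketch of L1-induced sparsity using scikit-learn's Lasso on synthetic data; the dataset sizes and alpha grid are illustrative assumptions (scikit-learn's `alpha` plays the role of lambda):

```python
# Count exact-zero Lasso coefficients as alpha (lambda) grows.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 100 samples, 20 features, only 5 truly informative
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=1.0, random_state=0)

for alpha in [0.01, 0.1, 1.0, 10.0]:
    model = Lasso(alpha=alpha, max_iter=10_000).fit(X, y)
    n_zero = int(np.sum(model.coef_ == 0.0))
    print(f"alpha={alpha:>5}: {n_zero} of 20 coefficients exactly zero")
```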
Why Ridge Does Not Necessarily Promote Sparsity
- The L2 constraint region is a sphere (a circle in 2D) with no corners, so the loss contours generically touch it at a point where no coordinate is exactly zero; the gradient of w^2 also vanishes as w approaches 0, so the shrinkage force fades near zero. Coefficients get small but stay nonzero.
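For contrast, the same hypothetical synthetic setup as the Lasso sketch above, but with Ridge: coefficients shrink as alpha grows, yet generically none lands exactly on zero.

```python
# Ridge shrinks coefficients toward zero but does not zero them out.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=1.0, random_state=0)

for alpha in [0.01, 1.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    n_zero = int(np.sum(model.coef_ == 0.0))
    print(f"alpha={alpha:>6}: {n_zero} exact zeros, "
          f"min |coef| = {np.min(np.abs(model.coef_)):.4f}")
```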