https://piazza.com/class_profile/get_resource/ln18bjs43q41tr/ln3yba9upossz
https://piazza.com/class_profile/get_resource/ln18bjs43q41tr/lne2g4wvqtldw
https://piazza.com/class_profile/get_resource/ln18bjs43q41tr/lno5od2ut6c73
Admissible Distance Functions
- Must satisfy the triangle inequality: d(x, z) <= d(x, y) + d(y, z). When verifying a candidate distance, try to simplify the expression first (a numeric spot-check is sketched below).
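A minimal numeric spot-check of the triangle inequality, as a sanity test before attempting an algebraic proof. `candidate_dist` is a hypothetical placeholder (Euclidean here), not from the notes; a passing check is evidence, not proof:

```python
# Spot-check the triangle inequality on random triples of points.
import numpy as np

def candidate_dist(x, y):
    # Hypothetical distance under test; Euclidean satisfies all metric axioms.
    return np.linalg.norm(x - y)

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y, z = rng.normal(size=(3, 4))  # three random points in R^4
    # d(x, z) <= d(x, y) + d(y, z), with a small tolerance for float error
    assert candidate_dist(x, z) <= candidate_dist(x, y) + candidate_dist(y, z) + 1e-12
print("no triangle-inequality violations in 1000 random triples")
```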

Multivariate Gaussian

- Spherical: Sigma = sigma^2 * I_d, i.e., diagonal with every entry equal to sigma^2 (one shared variance for all dimensions)
- Diagonal: Sigma is diagonal, but the per-dimension variances sigma_i^2 can differ (both cases constructed in the sketch below)
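A minimal sketch contrasting the two covariance structures; the dimension and variance values are illustrative assumptions, not from the notes:

```python
# Spherical vs. diagonal covariance for a multivariate Gaussian.
import numpy as np
from scipy.stats import multivariate_normal

d = 3
mu = np.zeros(d)

sigma2 = 2.0
Sigma_spherical = sigma2 * np.eye(d)        # sigma^2 * I_d: one shared variance
Sigma_diagonal = np.diag([0.5, 2.0, 4.0])   # diagonal, per-dimension variances differ

x = np.array([1.0, -0.5, 0.25])
print("spherical pdf:", multivariate_normal(mean=mu, cov=Sigma_spherical).pdf(x))
print("diagonal  pdf:", multivariate_normal(mean=mu, cov=Sigma_diagonal).pdf(x))
```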
Why Lasso Promotes Sparsity
- L(w, b) = MSE loss + lambda * ||w||_1
- Absolute Penalty: The L1 penalty is the sum of the absolute values of the coefficients. As lambda increases, some coefficients are estimated as exactly zero, unlike ridge regression (L2 penalty), where coefficients are shrunk toward zero but not exactly to zero.
- Corner Solution: Geometrically, the L1 penalty constrains the coefficient estimates to lie in a diamond (in 2D), an octahedron (in 3D), or a cross-polytope in higher dimensions. The corners of these shapes correspond to sparse solutions (where some coefficients are exactly zero), and the loss contours often first touch the constraint region at a corner, especially when lambda is sufficiently large.
- Continuous Shrinkage: As lambda increases, Lasso continuously pushes less important coefficients toward zero, and at a certain value of lambda it sets them exactly to zero (demonstrated in the sketch below).
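A minimal sketch of L1-induced sparsity using scikit-learn's Lasso on synthetic data; the dataset sizes and alpha grid are illustrative assumptions (scikit-learn's `alpha` plays the role of lambda):

```python
# Count exact-zero Lasso coefficients as alpha (lambda) grows.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 100 samples, 20 features, only 5 truly informative
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=1.0, random_state=0)

for alpha in [0.01, 0.1, 1.0, 10.0]:
    model = Lasso(alpha=alpha, max_iter=10_000).fit(X, y)
    n_zero = int(np.sum(model.coef_ == 0.0))
    print(f"alpha={alpha:>5}: {n_zero} of 20 coefficients exactly zero")
```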
Why Ridge Does Not Necessarily Promote Sparsity
- The L2 constraint region is a sphere (a circle in 2D) with no corners, so the loss contours generically touch it at a point where no coordinate is exactly zero; the gradient of w^2 also vanishes as w approaches 0, so the shrinkage force fades near zero. Coefficients get small but stay nonzero.
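For contrast, the same hypothetical synthetic setup as the Lasso sketch above, but with Ridge: coefficients shrink as alpha grows, yet generically none lands exactly on zero.

```python
# Ridge shrinks coefficients toward zero but does not zero them out.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=1.0, random_state=0)

for alpha in [0.01, 1.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    n_zero = int(np.sum(model.coef_ == 0.0))
    print(f"alpha={alpha:>6}: {n_zero} exact zeros, "
          f"min |coef| = {np.min(np.abs(model.coef_)):.4f}")
```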