Derivation

$$ \min_{\mathbf{w}} \lVert X\mathbf{w} - \mathbf{y} \rVert_2^2 $$
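Setting the gradient $2X^\top(X\mathbf{w} - \mathbf{y})$ to zero gives the normal equations $X^\top X\mathbf{w} = X^\top \mathbf{y}$. The following is a minimal NumPy sketch that solves them directly and checks the result against `np.linalg.lstsq`. It assumes $X$ has full column rank and uses synthetic data; the sizes and the `true_w` vector are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative only): n = 50 samples, p = 3 features.
n, p = 50, 3
X = rng.normal(size=(n, p))
true_w = np.array([2.0, -1.0, 0.5])  # hypothetical ground-truth weights
y = X @ true_w + 0.1 * rng.normal(size=n)

# Normal equations: X^T X w = X^T y (assumes X has full column rank).
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least-squares solver.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_normal, w_lstsq, np.allclose(w_normal, w_lstsq))
```

In practice the SVD-based `lstsq` is usually preferred. Forming $X^\top X$ squares the condition number of the problem.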

Assumptions

Ridge Regression

Adds a squared L2-norm penalty on the weights to the least-squares loss function.

It shrinks the coefficients toward zero but does not set any of them exactly to zero, so all features remain in the model.
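To see the shrinkage, the sketch below fits ridge regression for several values of the penalty strength $\lambda$. It uses the standard closed-form solution $(X^\top X + \lambda I)^{-1} X^\top \mathbf{y}$. The synthetic data and the $\lambda$ grid are illustrative assumptions, and `ridge` is a hypothetical helper, not a library function. As $\lambda$ grows, the norm of the coefficient vector decreases, yet for generic data no coefficient becomes exactly zero.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (illustrative only).
n, p = 100, 4
X = rng.normal(size=(n, p))
y = X @ np.array([3.0, -2.0, 1.0, 0.5]) + 0.5 * rng.normal(size=n)

def ridge(X, y, lam):
    """Hypothetical helper: closed-form ridge solution (X^T X + lam*I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

lambdas = [0.0, 1.0, 10.0, 100.0, 1000.0]
coefs = [ridge(X, y, lam) for lam in lambdas]

for lam, w in zip(lambdas, coefs):
    # ||w|| shrinks as lam grows, but the individual entries stay nonzero.
    print(f"lambda={lam:7.1f}  ||w||={np.linalg.norm(w):.4f}  w={np.round(w, 4)}")
```

Contrast this with an L1 (lasso) penalty, which can drive some coefficients exactly to zero.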

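$$ L_2(\mathbf{w}) = \sum_{i=1}^{n} (y_i - \mathbf{w}^\top \mathbf{x}_i)^2 + \lambda \sum_{j=1}^{p} w_j^2 $$

where $\lambda \ge 0$ controls the strength of the penalty. Setting $\lambda = 0$ recovers ordinary least squares.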
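Writing the ridge loss in matrix form and setting its gradient to zero gives the closed-form solution. This is the standard derivation, and it assumes every weight is penalized, with no separate unpenalized intercept:

$$
\begin{aligned}
L_2(\mathbf{w}) &= \lVert \mathbf{y} - X\mathbf{w} \rVert_2^2 + \lambda \lVert \mathbf{w} \rVert_2^2 \\
\nabla_{\mathbf{w}} L_2 &= -2X^\top(\mathbf{y} - X\mathbf{w}) + 2\lambda \mathbf{w} = \mathbf{0} \\
(X^\top X + \lambda I)\,\mathbf{w} &= X^\top \mathbf{y} \\
\hat{\mathbf{w}}_{\text{ridge}} &= (X^\top X + \lambda I)^{-1} X^\top \mathbf{y}
\end{aligned}
$$

For $\lambda > 0$, $X^\top X + \lambda I$ is positive definite and therefore invertible. This holds even when $X^\top X$ is singular (for example when $p > n$), which is not the case for the unpenalized normal equations.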