Derivation
- Least squares fits the model by minimizing the sum of the squared residuals, i.e. the squared vertical distances from the data points to the fitted hyperplane. Squaring is not the only option; we could instead minimize the sum of absolute residuals, but the squared loss is differentiable everywhere, which makes the minimization below straightforward.
- Let w be the (d+1)-dimensional weight vector (including a bias term)
- Let X be the n × (d+1) design matrix of training points
- Let y be the vector of ground-truth values
$$
\min_w ||Xw - y||^2
$$
Minimize by expanding the squared norm and setting the gradient with respect to w to zero:
$$
||Xw - y||^2 = (Xw - y)^T(Xw - y) = w^TX^TXw - 2y^TXw + y^Ty\\
\nabla_w ||Xw - y||^2 = 2X^TXw - 2X^Ty\\
0 = 2X^TXw - 2X^Ty
$$
We arrive at the normal equations; when X^TX is invertible, solving them gives the least squares solution
$$
X^TXw = X^Ty \quad\Rightarrow\quad w = (X^TX)^{-1}X^Ty
$$
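As a quick sanity check, here is a minimal NumPy sketch that builds a small synthetic design matrix (the data, true weights, and noise level are made up for illustration), solves the normal equations directly, and compares against `np.linalg.lstsq`, which avoids forming X^TX explicitly and is numerically safer.

```python
import numpy as np

# Minimal sketch: solving the normal equations X^T X w = X^T y on synthetic data.
rng = np.random.default_rng(0)
n, d = 100, 3

X_raw = rng.normal(size=(n, d))
X = np.hstack([X_raw, np.ones((n, 1))])     # append a bias column -> (n, d+1) design matrix
true_w = np.array([2.0, -1.0, 0.5, 3.0])    # hypothetical ground-truth weights
y = X @ true_w + 0.1 * rng.normal(size=n)   # noisy targets

# Solve X^T X w = X^T y directly (fine when X^T X is well conditioned)
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Numerically safer alternative that never forms X^T X
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_normal)
print(w_lstsq)   # both should recover roughly the true weights
```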
Assumptions
- linearity: there is a linear relationship between the dependent and independent variables
- independence: the observations (and therefore the residuals) are independent of one another
- no multicollinearity: no independent variable is an exact (or near-exact) linear combination of the others; strongly correlated predictors make the coefficient estimates unstable and the predictions less reliable
- normality: the residuals should be approximately normally distributed (bell-shaped and symmetric around zero); a rough diagnostic sketch follows this list
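Below is a rough diagnostic sketch for the multicollinearity and normality assumptions, using synthetic data and SciPy's `scipy.stats.normaltest`; the dataset and the way the outputs are read are illustrative assumptions, not a substitute for proper residual analysis.

```python
import numpy as np
from scipy import stats

# Synthetic data for illustration only.
rng = np.random.default_rng(1)
n, d = 200, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=n)

# Fit least squares and compute residuals.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ w

# Multicollinearity: strongly correlated columns / a large condition number are warning signs.
print("feature correlation matrix:\n", np.corrcoef(X, rowvar=False))
print("condition number of X^T X:", np.linalg.cond(X.T @ X))

# Normality: a very small p-value suggests the residuals are not approximately normal.
print("normality test p-value:", stats.normaltest(residuals).pvalue)
```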
Ridge Regression
Adds an L2-norm penalty to the least squares loss function.
It shrinks the coefficients toward zero but does not force any of them to be exactly zero, so all features remain in the model.
$$
L_2(\mathbf{w}) = \sum_i(y_i-\mathbf{w}^T\mathbf{x}_i)^2 + \lambda \sum_{j=1}^{p} w_j^2
$$
- Fully differentiable, easier to optimize.
- Shrinks the weights but keeps every feature, since none are driven exactly to zero
- Prevents individual weights from becoming too large; see the closed-form sketch below
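A minimal sketch of the closed-form ridge solution, using the same design-matrix convention as above; `ridge_fit` and the lambda values are hypothetical choices for illustration. Replacing X^TXw = X^Ty with (X^TX + λI)w = X^Ty is what shrinks the weights.

```python
import numpy as np

def ridge_fit(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Closed-form ridge solution: solve (X^T X + lam * I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Synthetic data for illustration only.
rng = np.random.default_rng(2)
n, d = 100, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

for lam in (0.0, 1.0, 100.0):
    w = ridge_fit(X, y, lam)
    print(f"lambda={lam:>6}: ||w|| = {np.linalg.norm(w):.4f}")  # norm shrinks as lambda grows
```

As the bullets above suggest, increasing lambda shrinks the overall weight norm, but no coefficient is driven exactly to zero.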