Likelihood

Main Idea: measures how well a statistical model explains the observed data.

$$ \text{for } D = \{x_1, \dots, x_n\} \text{ given parameter } \theta\\ \text{likelihood: } L(\theta) = P(D | \theta) $$

i.e., the probability of observing the data $D$ given the parameter $\theta$, viewed as a function of $\theta$
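A minimal sketch of evaluating a likelihood, assuming a Bernoulli coin-flip model; the data `D` and function name are illustrative, not from the notes:

```python
import numpy as np

# Illustrative data: coin flips, 1 = heads, 0 = tails (assumed example).
D = np.array([1, 0, 1, 1, 0, 1])

def likelihood(theta, data):
    # L(theta) = P(D | theta) = prod_i theta^x_i * (1 - theta)^(1 - x_i)
    return np.prod(theta ** data * (1 - theta) ** (1 - data))

print(likelihood(0.5, D))  # 0.5^6 ~= 0.0156
print(likelihood(0.7, D))  # ~0.0216: theta = 0.7 explains 4/6 heads better
```

A higher likelihood means that parameter value explains the observed flips better.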

MLE & Log Likelihood

Main Idea: finds the parameter that maximizes the probability of observing the data

$$ L(\theta) = \prod_i P(x_i|\theta) $$

$$ \log L(\theta) = \sum_i \log P(x_i|\theta) $$

(the log is monotonic, so maximizing $\log L(\theta)$ gives the same $\hat{\theta}$, and the product becomes a numerically stable sum)

$$ \hat{\theta}_{\text{MLE}} = \arg\max_\theta \log L(\theta) = \arg\max_\theta \sum_i \log P(x_i|\theta) $$
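A minimal sketch of the arg max, continuing the assumed coin-flip example; the grid search is a simple stand-in for a real optimizer:

```python
import numpy as np

D = np.array([1, 0, 1, 1, 0, 1])  # same illustrative coin flips

def log_likelihood(theta, data):
    # sum_i log P(x_i | theta) for a Bernoulli model
    return np.sum(data * np.log(theta) + (1 - data) * np.log(1 - theta))

# Grid search over candidate thetas as a stand-in for arg max.
thetas = np.linspace(0.01, 0.99, 99)
theta_mle = thetas[np.argmax([log_likelihood(t, D) for t in thetas])]

print(theta_mle)  # ~0.67
print(D.mean())   # closed-form Bernoulli MLE: the sample mean, 4/6
```

For the Bernoulli model the grid search and the closed-form answer (the sample mean) agree.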

Maximum A Posteriori (MAP)

Main Idea: incorporate prior belief into the estimation; recall Bayes' Theorem:

$$ P(\theta | D) = \frac{P(D | \theta) P(\theta)}{P(D)} $$

Maximizing the posterior:

$$ \hat{\theta}_{\text{MAP}} = \arg\max_\theta P(\theta | D) $$

Since $P(D)$ is constant when maximizing with respect to $\theta$, it drops out of the arg max:

$$ \hat{\theta}_{\text{MAP}} = \arg\max_\theta P(D | \theta) P(\theta) $$
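A minimal MAP sketch for the same assumed coin, adding a Beta(a, b) prior on $\theta$ (the prior choice is an illustrative assumption); for $a, b > 1$ the maximizer of $P(D|\theta)P(\theta)$ has a known closed form:

```python
import numpy as np

D = np.array([1, 0, 1, 1, 0, 1])  # same illustrative coin flips
a, b = 2.0, 2.0  # assumed Beta prior: mild belief that theta is near 0.5

heads, n = D.sum(), len(D)
# Closed-form mode of the Beta(heads + a, n - heads + b) posterior:
#   theta_MAP = (heads + a - 1) / (n + a + b - 2), valid for a, b > 1.
theta_map = (heads + a - 1) / (n + a + b - 2)

print(heads / n)  # MLE: ~0.667, data alone
print(theta_map)  # MAP: 0.625, pulled toward the prior mean of 0.5
```

The prior acts like pseudo-observations: the MAP estimate sits between the MLE and the prior mean, and with more data the likelihood dominates and the two estimates converge.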