Consistency vs. Simplicity
Fundamental tradeoff: bias vs. variance
By default, learning algorithms prefer consistency: they fit the training data as closely as they can (why?)
Several ways to operationalize “simplicity”
- Reduce the hypothesis space
- Assume more: e.g. independence assumptions, as in naïve Bayes (see the sketch after this list)
- Have fewer, better features / attributes: feature selection
- Other structural limitations (decision lists vs trees)
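A minimal sketch of the "assume more" idea, assuming binary features, two classes, and NumPy arrays; the function names are illustrative, not from the lecture. The independence assumption means we only estimate one number per feature per class instead of the full joint over all feature combinations, which is a much smaller (simpler) hypothesis space:

```python
import numpy as np

def fit_naive_bayes(X, y):
    """X: (m, n) binary feature matrix; y: (m,) labels in {0, 1}."""
    params = {}
    for c in (0, 1):
        Xc = X[y == c]
        params[c] = {
            "prior": len(Xc) / len(X),
            # One number per feature: P(f_j = 1 | y = c) -- the "assume more" part.
            "p_feat": Xc.mean(axis=0),
        }
    return params

def class_scores(params, x):
    """Score a new example x by summing per-feature log-likelihoods
    (only valid under the independence assumption)."""
    scores = {}
    for c, p in params.items():
        log_lik = np.sum(np.log(np.where(x == 1, p["p_feat"], 1.0 - p["p_feat"])))
        scores[c] = np.log(p["prior"]) + log_lik
    return scores
```

Note that a feature never observed "on" in some class gives log(0) here; that is exactly the small-count problem the smoothing bullet below addresses.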
Regularization
- Smoothing: be cautious with small counts (e.g., Laplace smoothing; sketched after this list)
- Many other generalization parameters (e.g., pruning cutoffs for decision trees)
- Hypothesis space stays big, but harder to get to the outskirts
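A minimal sketch of add-k (Laplace) smoothing for the per-feature estimates in the naïve Bayes sketch above; the function name is illustrative and k = 1 is an illustrative default, not a value from the lecture. The hypothesis space stays the same, but estimates near 0 or 1 (the "outskirts") now require a lot of evidence:

```python
import numpy as np

def smoothed_feature_probs(Xc, k=1.0):
    """Xc: (m_c, n) binary features for one class; returns estimates of P(f_j = 1 | y = c)."""
    counts_on = Xc.sum(axis=0)                # how often feature j was "on" in this class
    total = len(Xc)                           # number of examples in this class
    return (counts_on + k) / (total + 2 * k)  # pretend each of the 2 outcomes was seen k extra times
```

Here k acts as a generalization parameter: larger k pulls every estimate toward 1/2, smaller k trusts the raw counts more.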
Other Optimizers
- Second-order methods: use curvature (second-derivative) information, as in Newton's method
- Momentum: keep a decaying running average of past gradients (see the sketch below)
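A minimal sketch of the classic momentum update (not a particular library's API); the learning rate and beta values are illustrative. The velocity accumulates past gradients, which damps oscillations and speeds up progress along directions where gradients consistently agree:

```python
import numpy as np

def momentum_step(w, v, grad, lr=0.01, beta=0.9):
    """One SGD-with-momentum update; lr and beta are illustrative hyperparameters."""
    v = beta * v - lr * grad   # velocity: decayed sum of past (scaled) gradients
    return w + v, v            # move the parameters along the velocity

# usage: start with v = np.zeros_like(w), then once per minibatch gradient:
# w, v = momentum_step(w, v, grad)
```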

Adaptive Learning Rates
Key idea: a separate learning rate for each parameter
- Make larger or smaller updates depending on how frequently a feature occurs
- Small updates for frequent features; large updates for rare features (as in AdaGrad; see the sketch below)
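A minimal AdaGrad-style sketch of the per-parameter learning-rate idea; the function name and default values are illustrative. A parameter tied to a frequent feature accumulates a large squared-gradient history and therefore takes small steps, while a rare feature's parameter keeps a large effective learning rate:

```python
import numpy as np

def adagrad_step(w, grad, hist, lr=0.1, eps=1e-8):
    """One update with per-parameter step sizes; lr and eps are illustrative values."""
    hist = hist + grad ** 2                    # per-parameter sum of squared gradients
    w = w - lr * grad / (np.sqrt(hist) + eps)  # large history => small step, and vice versa
    return w, hist

# usage: start with hist = np.zeros_like(w), then once per minibatch gradient:
# w, hist = adagrad_step(w, grad, hist)
```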