Consistency vs Simplicity

Fundamental tradeoff: bias vs. variance

Usually algorithms prefer consistency by default (why?)

Several ways to operationalize “simplicity”

Regularization

Other Optimizers

Second Order

Momentum

Screen Shot 2023-08-01 at 3.15.58 PM.png

Adaptive Learning Rates

Key idea: different learning rates for each parameter