L2: shrink each weight in proportion to its size, so big weights shrink more

L1: shrink each weight at a constant rate, the same amount for every weight
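
A minimal sketch of the two shrinkage rules, showing only the penalty part of one gradient step (the data-loss gradient is left out). The function names, `lr`, and `lam` are illustrative choices, not from the notes.

```python
import numpy as np

def l2_decay_step(w, lr, lam):
    # Gradient of (lam / 2) * sum(w**2) is lam * w:
    # each weight shrinks in proportion to its own size.
    return w - lr * lam * w

def l1_decay_step(w, lr, lam):
    # Subgradient of lam * sum(|w|) is lam * sign(w):
    # every nonzero weight moves toward zero by the same amount.
    return w - lr * lam * np.sign(w)

w = np.array([4.0, -1.0, 0.5])
w_l2 = l2_decay_step(w, lr=0.1, lam=1.0)  # shrinks by 0.4, 0.1, 0.05
w_l1 = l1_decay_step(w, lr=0.1, lam=1.0)  # shrinks by 0.1, 0.1, 0.1
```

With L2 the largest weight loses the most; with L1 all three lose the same 0.1, so the small weights reach zero first.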

Rumelhart’s idea: eliminate small weights

Improve Generalization

SGD

Mini Batch
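
A sketch of mini-batch SGD, assuming a linear model with squared error on synthetic noise-free data. The model, the data, and all names and hyperparameters here are assumptions for illustration. Setting `batch_size=1` gives plain one-example-at-a-time SGD.

```python
import numpy as np

def minibatch_sgd(X, y, lr=0.1, batch_size=8, epochs=200, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)              # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            err = X[idx] @ w - y[idx]           # residuals on this batch
            grad = X[idx].T @ err / len(idx)    # mean gradient over the batch
            w -= lr * grad                      # one update per mini-batch
    return w

# Synthetic data (assumed): y is an exact linear function of X.
rng = np.random.default_rng(1)
X = rng.normal(size=(64, 2))
true_w = np.array([2.0, -3.0])
y = X @ true_w
w_hat = minibatch_sgd(X, y)
```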

What's wrong with all-positive inputs?
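
One common answer, sketched for a single linear neuron with squared error (an assumed setup, not stated in the notes): the gradient for weight i is delta * x_i, so if every x_i is positive, all weight gradients share the sign of delta and the weights can only move up together or down together in one step. Centering the inputs removes that constraint.

```python
import numpy as np

def weight_gradient(x, w, target):
    # Linear neuron with loss 0.5 * (w . x - target)**2.
    delta = x @ w - target      # scalar error signal
    return delta * x            # gradient for weight i is delta * x[i]

x = np.array([0.2, 1.5, 3.0])   # all-positive inputs
w = np.array([0.1, -0.4, 0.3])

grad = weight_gradient(x, w, target=1.0)
same_sign = bool(np.all(grad > 0) or np.all(grad < 0))

# Same inputs shifted to zero mean: the gradient signs are now mixed.
x_centered = x - x.mean()
grad_centered = weight_gradient(x_centered, w, target=1.0)
mixed_sign = bool(np.any(grad_centered > 0) and np.any(grad_centered < 0))
```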