
- sigmoid: output range [0, 1]
- tanh (hyperbolic tangent): output range [-1, 1]
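A quick sketch of the two activation ranges above, using NumPy (the sample grid of inputs is just illustrative):

```python
import numpy as np

def sigmoid(z):
    # logistic sigmoid: squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-10, 10, 1001)
print(sigmoid(z).min(), sigmoid(z).max())   # stays inside (0, 1)
print(np.tanh(z).min(), np.tanh(z).max())   # stays inside (-1, 1)
```

Note the bounds are approached but never reached for finite inputs, which is why the ranges are often written as open intervals.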

Output Layer: like multiclass logistic regression

- a neural net with no hidden layers (just the output layer) is exactly logistic regression
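The point above can be made concrete: with no hidden layer, the forward pass is a single affine map followed by a softmax, which is precisely multiclass logistic regression. A minimal sketch (the sizes, weights, and input here are illustrative assumptions):

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability before exponentiating
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # 4 input features, 3 classes (illustrative)
b = np.zeros(3)
x = rng.normal(size=4)

# a "network" with no hidden layer: one affine map + softmax
# -- this is exactly multiclass logistic regression
p = softmax(x @ W + b)
print(p, p.sum())  # a probability distribution over the 3 classes
```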
Complexity

- # parameters ≥ # edges (one weight per edge, plus one bias per node)
- e.g., a 500-node layer fully connected to a 1000-node layer: 500 × 1000 = 500,000 weights, plus 1000 biases → 501,000 parameters
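The parameter count for a fully connected layer can be sketched as a one-line helper (counting one weight per edge plus an optional bias per output unit):

```python
def dense_params(n_in, n_out, bias=True):
    # weights: one per edge (n_in * n_out), plus one bias per output unit
    return n_in * n_out + (n_out if bias else 0)

# 500-node layer fully connected to a 1000-node layer
print(dense_params(500, 1000, bias=False))  # 500000 weights (edges only)
print(dense_params(500, 1000))              # 501000 including biases
```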
Universal Approximation Theorem

- a network with a single hidden layer (given enough hidden units) can approximate any continuous function arbitrarily well
- benefit of depth: that one layer may need to be extremely wide, while many layers of moderate size can often represent the same function more compactly
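One way to see the wide-vs-deep trade-off above is to count parameters for two architectures (the layer sizes here are illustrative assumptions, not from the notes):

```python
def dense_params(n_in, n_out):
    # weights (one per edge) plus one bias per output unit
    return n_in * n_out + n_out

def total_params(layer_sizes):
    # layer_sizes = [input, hidden..., output]
    return sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))

# one very wide hidden layer vs. several moderate ones (illustrative sizes)
wide = total_params([100, 10000, 10])          # single huge hidden layer
deep = total_params([100, 300, 300, 300, 10])  # several moderate layers
print(wide, deep)  # 1110010 vs. 213910
```

The deep network here uses a small fraction of the wide network's parameters, which is the practical motivation for depth.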
Neural Net Loss Function

- differs from logistic regression: the loss is not guaranteed to be convex, so gradient descent may converge to a local minimum rather than the global one
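Non-convexity can be demonstrated directly via the hidden-unit permutation symmetry: swapping two hidden units leaves the loss unchanged, but the midpoint between the original and swapped weights has strictly higher loss, which a convex function cannot do. A small sketch (the tiny tanh network and data below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
# targets generated by a 2-hidden-unit tanh network, so the loss can reach 0
W1 = np.array([[1.0, -2.0], [0.5, 1.5]])   # (2 inputs, 2 hidden units)
w2 = np.array([1.0, -1.0])
y = np.tanh(X @ W1) @ w2

def mse(W1, w2):
    pred = np.tanh(X @ W1) @ w2
    return ((pred - y) ** 2).mean()

# swapping the two hidden units leaves the function (and loss) unchanged
W1p, w2p = W1[:, ::-1], w2[::-1]
mid_W1, mid_w2 = (W1 + W1p) / 2, (w2 + w2p) / 2

print(mse(W1, w2), mse(W1p, w2p))   # identical losses (both zero here)
print(mse(mid_W1, mid_w2))          # strictly larger -> loss is not convex
```

If the loss were convex, the loss at the midpoint would be at most the average of the two (equal) endpoint losses; the demo shows it is larger.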