Activation Functions

Chain Rule

Main Idea: Use the chain rule to compute gradients when the output depends on many features through a composition of functions
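As a minimal sketch of this idea, assume a single sigmoid unit with a squared-error loss on one example (the model, the loss, and every name below are illustrative assumptions, not taken from the lecture). The gradient with respect to each weight is the product of three local derivatives, and a finite-difference estimate is a quick way to check it.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, y):
    """Squared error of one sigmoid unit on one example (assumed setup)."""
    return (sigmoid(w @ x + b) - y) ** 2

def loss_grad_w(w, b, x, y):
    """Chain rule: dL/dw_i = dL/da * da/dz * dz/dw_i."""
    z = w @ x + b              # pre-activation, a scalar
    a = sigmoid(z)             # activation, a scalar
    dL_da = 2.0 * (a - y)      # derivative of (a - y)^2 with respect to a
    da_dz = a * (1.0 - a)      # derivative of the sigmoid
    dz_dw = x                  # dz/dw_i = x_i, one entry per feature
    return dL_da * da_dz * dz_dw

def numerical_grad_w(w, b, x, y, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (loss(w + step, b, x, y) - loss(w - step, b, x, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x = rng.normal(size=5)         # five features
w = rng.normal(size=5)
b, y = 0.1, 1.0

analytic = loss_grad_w(w, b, x, y)
numeric = numerical_grad_w(w, b, x, y)
print(np.max(np.abs(analytic - numeric)))  # should be very small
```

The first two factors are shared by every feature; only the last factor, x_i, changes from weight to weight.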


Backpropagation
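A rough sketch of backpropagation, assuming a two-layer network with a sigmoid hidden layer, a linear output, and the loss 0.5 * (y_hat - y)^2 (the architecture, loss, shapes, and function names are assumptions chosen for illustration). The forward pass caches intermediate values, and the backward pass applies the chain rule from the output back to the input, reusing each upstream gradient.

```python
import numpy as np

def forward(x, params):
    """Forward pass; returns the prediction and the values backward() needs."""
    W1, b1, W2, b2 = params
    z1 = W1 @ x + b1
    a1 = 1.0 / (1.0 + np.exp(-z1))   # sigmoid hidden activations
    y_hat = W2 @ a1 + b2             # linear output layer
    return y_hat, (x, a1)

def backward(y_hat, y, params, cache):
    """Backward pass for the loss 0.5 * sum((y_hat - y)^2)."""
    W1, b1, W2, b2 = params
    x, a1 = cache
    dz2 = y_hat - y                  # dL/dy_hat
    dW2 = np.outer(dz2, a1)          # dL/dW2
    db2 = dz2                        # dL/db2
    da1 = W2.T @ dz2                 # gradient flowing into the hidden layer
    dz1 = da1 * a1 * (1.0 - a1)      # through the sigmoid
    dW1 = np.outer(dz1, x)           # dL/dW1
    db1 = dz1                        # dL/db1
    return dW1, db1, dW2, db2

rng = np.random.default_rng(0)
x = rng.normal(size=3)
y = np.array([1.0])
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
params = (W1, b1, W2, b2)

y_hat, cache = forward(x, params)
grads = backward(y_hat, y, params, cache)
print([g.shape for g in grads])
```

Each gradient has the same shape as the parameter it belongs to, which is a useful sanity check when writing a backward pass by hand.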


In Practice

We need to decide how wide each layer is, how many layers there are, and which activation functions to use.

Example: Linear Layer
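One possible sketch of a linear layer's forward and backward pass, assuming the batch convention Y = XW + b with one example per row (this convention and the function names are assumptions; the original notes may use a different layout). Given the upstream gradient dL/dY, the layer returns the gradients with respect to its input, weights, and bias.

```python
import numpy as np

def linear_forward(X, W, b):
    """Y = X @ W + b, with X of shape (batch, in) and W of shape (in, out)."""
    return X @ W + b

def linear_backward(dY, X, W):
    """Gradients of the loss given the upstream gradient dY = dL/dY."""
    dX = dY @ W.T          # (batch, in): passed on to the previous layer
    dW = X.T @ dY          # (in, out): summed over the batch
    db = dY.sum(axis=0)    # (out,): the bias is shared by every example
    return dX, dW, db

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
b = np.zeros(2)
dY = rng.normal(size=(4, 2))   # stand-in for the gradient from later layers

Y = linear_forward(X, W, b)
dX, dW, db = linear_backward(dY, X, W)
print(Y.shape, dX.shape, dW.shape, db.shape)
```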


Example: Sigmoid
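A short sketch of the sigmoid as a layer, using the identity sigma'(z) = sigma(z) * (1 - sigma(z)). The function names are illustrative, and this simple form does not guard against overflow for very large negative inputs.

```python
import numpy as np

def sigmoid_forward(z):
    """Elementwise sigmoid: a = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_backward(da, a):
    """dL/dz = dL/da * a * (1 - a), where a is the cached forward output."""
    return da * a * (1.0 - a)

z = np.array([-2.0, 0.0, 2.0])
a = sigmoid_forward(z)
dz = sigmoid_backward(np.ones_like(a), a)   # upstream gradient of ones
print(a, dz)
```

The derivative peaks at z = 0, where it equals 0.25, and shrinks toward zero as |z| grows, so gradients passing through a saturated sigmoid become small.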


1/16 - Tricks of the Trade

1/18 - Optimization and Training