Early Stopping
Dropout
L2 regularization
Random Weight initialization
If all weights in a network are initialized to the same value (e.g., 0), every neuron in a layer computes the same output and receives the same gradient update, so the neurons learn identical features and the layer effectively behaves like a single neuron. Random initialization breaks this symmetry so that each neuron can learn distinct features.
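To make the symmetry problem concrete, here is a minimal NumPy sketch (the layer sizes, constant value 0.5, and squared-error loss are illustrative assumptions, not from the text): with every weight set to the same constant, both hidden neurons produce the same activation and receive the same gradient, so they remain clones of each other.

```python
import numpy as np

# Tiny 3 -> 2 -> 1 network with all weights set to the same constant.
rng = np.random.default_rng(0)
x = rng.normal(size=3)            # one input example with 3 features
y = 1.0                           # scalar regression target

W1 = np.full((2, 3), 0.5)         # hidden layer: identical weights everywhere
W2 = np.full((1, 2), 0.5)         # output layer: identical weights everywhere

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass
h = sigmoid(W1 @ x)               # both entries of h are identical
y_hat = (W2 @ h)[0]

# Backward pass for squared-error loss L = 0.5 * (y_hat - y)**2
d_yhat = y_hat - y
d_h = d_yhat * W2[0]                                  # gradient reaching each hidden neuron
dW1 = (d_h * h * (1.0 - h))[:, None] * x[None, :]     # gradient w.r.t. W1

print(dW1)  # both rows are identical, so both neurons update identically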
Uniform or Normal Distribution: Weights are drawn randomly from a uniform or normal distribution (e.g., $\mathcal{U}(-0.1, 0.1)$ or $\mathcal{N}(0, 0.01)$); a sketch of both draws appears after the Xavier formula below.
Xavier Initialization (Glorot): Scales the random draw by the layer's fan-in and fan-out so that the variance of activations (and gradients) stays roughly constant across layers.
$$ W \sim \mathcal{U}\left(-\frac{\sqrt{6}}{\sqrt{n_{in} + n_{out}}}, \frac{\sqrt{6}}{\sqrt{n_{in} + n_{out}}}\right) $$

where $n_{in}$ and $n_{out}$ are the number of inputs and outputs (fan-in and fan-out) of the layer.
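As a concrete illustration of both schemes above, here is a short NumPy sketch. The function names, default standard deviation, and the 784/256 layer sizes are illustrative assumptions; the Xavier limit follows the formula above.

```python
import numpy as np

def xavier_uniform(n_in, n_out, rng=None):
    """Draw weights from U(-limit, limit) with limit = sqrt(6) / sqrt(n_in + n_out)."""
    if rng is None:
        rng = np.random.default_rng()
    limit = np.sqrt(6.0) / np.sqrt(n_in + n_out)
    return rng.uniform(-limit, limit, size=(n_out, n_in))

def scaled_normal(n_in, n_out, std=0.01, rng=None):
    """Simple alternative: draw weights from a zero-mean normal distribution."""
    if rng is None:
        rng = np.random.default_rng()
    return rng.normal(0.0, std, size=(n_out, n_in))

# Example: a layer mapping 784 inputs to 256 outputs (sizes chosen for illustration).
W = xavier_uniform(784, 256)
print(W.shape, W.min(), W.max())  # all entries lie within +/- sqrt(6 / 1040)
```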
Other Optimizers