Main Idea: train a network that encodes an image into a hidden state, then
decodes the image from that hidden state as accurately as possible; the hidden state is what we use for downstream tasks (minimal sketch after the list below)
- Representation learning: VAEs, contrastive learning
- Generation: GANs, VAEs, autoregressive models
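A minimal sketch of the main idea, assuming PyTorch and flattened MNIST-sized inputs; all layer sizes and the batch here are illustrative assumptions, not from the lecture:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=32):
        super().__init__()
        # Encoder: image -> small hidden state (the bottleneck)
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, hidden_dim),
        )
        # Decoder: hidden state -> reconstruction of the image
        self.decoder = nn.Sequential(
            nn.Linear(hidden_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim),
        )

    def forward(self, x):
        z = self.encoder(x)          # hidden state used for downstream tasks
        return self.decoder(z), z

model = Autoencoder()
x = torch.randn(8, 784)                  # fake batch of flattened 28x28 images
x_hat, z = model(x)
loss = nn.functional.mse_loss(x_hat, x)  # reconstruction objective
```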
Types
- Dimensionality: make the hidden state smaller than the input/output (a bottleneck), so that the network must compress the input
- Denoising: corrupt the input with noise, forcing the autoencoder to learn to distinguish noise from signal
- Sparsity: force the hidden state to be sparse (most entries are zero), so that the network must compress the input
- Probabilistic modeling: force the hidden state to agree with a prior distribution (this will be covered next time)
Sparsity and Denoising
This leads to the idea of sparse autoencoders: can we represent inputs with even more compact representations, where only a few hidden entries are active at a time? (sketch below)
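One common way to get sparsity (an assumed illustration, not taken from the slides) is to add an L1 penalty on the hidden activations to the usual reconstruction loss; the layer sizes and the 1e-3 penalty weight below are hypothetical hyperparameters:

```python
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(784, 512), nn.ReLU())  # hidden can even be wider than the input
dec = nn.Linear(512, 784)

x = torch.randn(8, 784)
z = enc(x)
x_hat = dec(z)
recon = nn.functional.mse_loss(x_hat, x)
sparsity = z.abs().mean()        # L1 penalty drives most hidden entries to zero
loss = recon + 1e-3 * sparsity   # 1e-3 is an illustrative weight
```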

A good decoder should also be able to fill in the blanks: if parts of the input are masked out or corrupted, the reconstruction should still recover the original (masking sketch below)
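A hedged sketch of the fill-in-the-blanks (denoising/masking) objective: zero out random input entries and train the network to reconstruct the clean input. The shapes and the ~25% mask rate are illustrative assumptions:

```python
import torch
import torch.nn as nn

model = nn.Sequential(                       # any autoencoder works here
    nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 784))

x = torch.randn(8, 784)                      # clean inputs (illustrative)
mask = (torch.rand_like(x) > 0.25).float()   # blank out ~25% of entries
x_corrupt = x * mask                         # the "blanks" to fill in
x_hat = model(x_corrupt)
loss = nn.functional.mse_loss(x_hat, x)      # target is the *clean* input
```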

Variational Autoencoder
https://cs182sp21.github.io/static/slides/lec-18.pdf
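As a preview of next time, a minimal sketch of the VAE idea (an assumed illustration, not from these notes): the encoder outputs a mean and log-variance, sampling uses the reparameterization trick, and the loss adds a KL term against a standard normal prior to the reconstruction loss:

```python
import torch
import torch.nn as nn

enc = nn.Linear(784, 2 * 32)   # outputs mean and log-variance of q(z|x)
dec = nn.Linear(32, 784)

x = torch.randn(8, 784)
mu, logvar = enc(x).chunk(2, dim=-1)
std = (0.5 * logvar).exp()
z = mu + std * torch.randn_like(std)   # reparameterization trick
x_hat = dec(z)
recon = nn.functional.mse_loss(x_hat, x, reduction='sum')
kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum()  # KL to N(0, I) prior
loss = recon + kl   # negative ELBO (up to constants)
```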