Main Idea: train a network that encodes an image into some hidden state, and then decodes that image as accurately as possible from that hidden state (what we use for downstream tasks)

Types

Sparsity and Denoising

This leads to the idea of sparse autoencoders: can we represent inputs in even smaller representations

Screenshot 2025-02-10 at 2.19.21 PM.png

A good decoder should also be able to fill in the blanks

Screenshot 2025-02-10 at 2.19.44 PM.png

Variational Autoencoder

https://cs182sp21.github.io/static/slides/lec-18.pdf