Idea: Can a network learn a procedure from input/output examples alone, rather than being explicitly programmed?
- add structured memory to a neural controller that it can read from and write to
- essentially an RNN w/ a working memory
- end to end differentiable
Memory is a matrix; a read returns a weighted combination of its rows
$$
r_t \leftarrow \sum_i w_t(i)\, M_t(i)
$$
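The read operation above can be sketched in a few lines of NumPy. This is a minimal illustration, not the full NTM read head (the sizes and values here are made up for the example):

```python
import numpy as np

N, M = 4, 3                           # N memory rows, each of width M
memory = np.random.rand(N, M)         # M_t: the memory matrix
w = np.array([0.1, 0.6, 0.2, 0.1])    # w_t: attention weights over rows (sum to 1)

# Read vector r_t = sum_i w_t(i) * M_t(i): a convex combination of memory rows
r = w @ memory                        # shape (M,)
```

Because the weights are a soft distribution over rows rather than a hard index, the read is differentiable w.r.t. both `w` and `memory`, which is what makes the whole system trainable end to end.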

write consists of two steps: erase, then add


- have to write in two steps b/c there is no single ‘overwrite’ arithmetic op: erase is multiplicative, add is additive
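The two-step write can be sketched as follows. This follows the NTM erase/add form, $M_t(i) = M_{t-1}(i)\,[1 - w_t(i)\,e_t] + w_t(i)\,a_t$; the specific vectors below are invented for the example:

```python
import numpy as np

N, M = 4, 3
memory = np.ones((N, M))             # M_{t-1}
w = np.array([0.0, 1.0, 0.0, 0.0])   # w_t: attend entirely to row 1
erase = np.array([1.0, 1.0, 0.0])    # e_t in [0,1]^M: which components to wipe
add = np.array([0.5, 0.0, 0.2])      # a_t: new content to mix in

# Erase step (multiplicative): M~(i) = M(i) * (1 - w(i) * e)
memory = memory * (1 - np.outer(w, erase))
# Add step (additive): M(i) = M~(i) + w(i) * a
memory = memory + np.outer(w, add)
```

Rows with zero attention weight pass through untouched, while the attended row has its first two components erased and the add vector mixed in; both steps are elementwise, so the write stays differentiable.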
Salience map
- decide where to sample from image
Recurrent attention model
- choose random loc at first (exploration), then exploit what it has learned
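The explore-then-exploit location choice can be sketched as sampling around a learned mean, which is roughly how stochastic glimpse policies behave. Everything here (`choose_location`, the [-1, 1] coordinate convention, `sigma`) is an assumed illustration, not the model's actual training procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def choose_location(mean_loc, sigma=0.2):
    # Sample a 2-D glimpse location around the policy's predicted mean.
    # The Gaussian noise supplies exploration; as the learned mean improves,
    # samples concentrate on it, i.e. the model exploits what it has learned.
    loc = rng.normal(loc=mean_loc, scale=sigma)
    return np.clip(loc, -1.0, 1.0)   # keep coordinates in the normalized image frame

# Early on the mean is uninformative (e.g. image center), so samples are near-random
loc = choose_location(np.array([0.0, 0.0]))
```

Shrinking `sigma` (or sharpening the learned mean) shifts the balance from exploration toward exploitation.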