7/18 - MDPs, Bellman Equations, Q-values, Policies
7/19 - RL, Bandit Problems, Regret
7/20 - RL: TD Learning, Q-learning, Policy Search