CMPUT 609 - Reinforcement Learning II


This course is an advanced treatment of the reinforcement learning approach to artificial intelligence, emphasizing Parts II and III of the second edition of the textbook Reinforcement Learning: An Introduction, by the instructor, Rich Sutton, together with Andrew Barto. Students should have covered Part I of the textbook either in a previous course (such as CMPUT 366) or through extensive self-study. Comfort with the mathematics of probability distributions, expectations, linear algebra, and elementary calculus is also required.

Reinforcement learning concerns the design of complete agents interacting with stochastic, incompletely known environments. It adapts ideas from machine learning, operations research, and control theory, as well as from psychology and neuroscience, and has produced some strikingly successful engineering applications, such as AlphaGo. The focus is on algorithms for learning what actions to take, and when to take them, so as to optimize long-term performance. This may involve sacrificing immediate reward to obtain greater reward in the long term, or simply to obtain more information about the environment.

The course takes a deeper look at the foundations of Markov decision processes, temporal-difference learning, multi-step learning, function approximation, off-policy training, eligibility traces, policy gradient methods, general value functions, planning, and the concept of state. The emphasis is on developing intuition that relates the mathematical theory of reinforcement learning to the ambitious goals of artificial intelligence.

Course Objectives

  • Thoroughly understand the foundations of the reinforcement learning approach to artificial intelligence
  • Be well prepared to conduct research in reinforcement learning
  • Apply reinforcement learning ideas in novel ways
  • Appreciate and critically assess claims made about reinforcement learning

Course Work

  • Projects
  • Final Exam
  • Research Diary