CMPUT 656 - Bandit Algorithms

Overview

Decision making in the face of uncertainty is a significant challenge in machine learning. Which drug should a patient receive? How should I allocate my study time between courses? Which version of a website will generate the most revenue? Which move should be considered next when playing chess or Go? All of these questions can be expressed in the multi-armed bandit framework, where a learning agent sequentially takes actions, observes rewards, and aims to maximize the total reward over a period of time. The framework is now very popular, used in practice by large companies, and growing fast. The course is based on a new, freely available book co-authored by the instructor and will cover topics such as stochastic and adversarial finite-armed bandits, proving optimality of bandit algorithms, linear bandits, and even some excursions into the land of Markov Decision Processes.
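To make the framework concrete, here is a minimal sketch of one of the simplest bandit strategies, epsilon-greedy, on a simulated Bernoulli bandit. This example is illustrative only and not taken from the course book; the class and function names, as well as the arm success probabilities, are hypothetical.

```python
import random

class EpsilonGreedyBandit:
    """Minimal epsilon-greedy agent for a stochastic multi-armed bandit (illustrative sketch)."""

    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.counts = [0] * n_arms        # number of times each arm was pulled
        self.values = [0.0] * n_arms      # running mean reward per arm
        self.rng = random.Random(seed)

    def select_arm(self):
        # Explore a random arm with probability epsilon; otherwise exploit
        # the arm with the highest empirical mean reward.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental update of the empirical mean for the chosen arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

def run(probs=(0.2, 0.5, 0.8), horizon=5000, epsilon=0.1, seed=1):
    # Simulate a Bernoulli bandit: arm i pays reward 1 with probability probs[i].
    agent = EpsilonGreedyBandit(len(probs), epsilon, seed)
    env_rng = random.Random(seed + 1)
    total_reward = 0
    for _ in range(horizon):
        arm = agent.select_arm()
        reward = 1 if env_rng.random() < probs[arm] else 0
        agent.update(arm, reward)
        total_reward += reward
    return agent, total_reward
```

Over a long enough horizon, the agent pulls the best arm most of the time, and its total reward approaches what the best arm alone would yield, minus the cost of exploration. Much of the course concerns algorithms (such as UCB) that manage this exploration-exploitation trade-off with provable guarantees.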

Objectives

The focus of the course will be on understanding the statistical ideas, mathematics and implementation details for current state-of-the-art algorithms.

Course Work

  • Assignments
  • Midterm
  • Final Exam

Note: Both the midterm and the final are take-home exams. Students are required to sign a pledge that they worked on these alone, with no help from others.

Related Research Areas

  • Artificial Intelligence
  • Machine Learning
  • Reinforcement Learning