## Algorithms for Reinforcement Learning: Errata for the printed book

August 7, 2010*

*Last update: May 18, 2013

### Contents

Page numbers refer to the printed copy. The online version (the “draft”) is up-to-date. Thanks to my PhD student, Gabor Bartok, who has found many of these errors.

• p. xi. Section 2) should be Section 2 (no closing parenthesis)
• p. 1. The dot should be in between the bars in the definition of the infinity norm, not on the top. That is, ∥⋅∥ is the intended form and not ||. Also, in “which, if θ, which” the part “which, if θ” should be deleted.
• p. 2. The footnote from p. 5 explaining the meaning of “almost surely” should be moved here.
• p. 5. In the example on gambling, the personal pronoun “his” should be replaced by “her”.
• p. 9. In Eq. (1.14), on the right-hand side of the equation, Q(y,π(x)) should be Q(y,π(y)).
• p. 12. In footnote 1, add “if” before “it”.
• p. 21. line 1: “then” should be “than”.
• p. 22. The text “goal is to approximate the value function V underlying ” should be deleted.
• p. 23. Delete “be” from “is no longer be guaranteed”. After $\theta^{(\lambda)}$ in the middle of the page, delete “.”. The phrase “using $V_\theta$” should be “using the chosen features φ”.
• p. 25. “some methods using which” should be “some methods that avoid”
• p. 32. The word “complicate” (in the middle of the page) should be “complicated”.
• p. 40. The text “Gittins (1989) has shown” should be “Gittins (1989) showed”.
• p. 43. The 4th displayed equation and the text surrounding it should be deleted. This is the equation that says that $R_T^{\mathrm{UCRL2}(\delta)} = O(D^2 |\mathcal{X}|^2 |\mathcal{A}| \log(T/\delta)/\varepsilon + \varepsilon T)$. This equation holds (under the cited conditions), but it does not lend itself to a logarithmic regret bound.
• p. 47. On line 4 of the 1st paragraph of Section 3.3., “optimalas” should be “optimal as”. On the same page, on line 3, after Eq. (3.1): “, Algorithm 12 the pseudocode of Q-learning.” should start with a full stop and is missing the word “shows”. So, the text should be “. Algorithm 12 shows the pseudocode of Q-learning.”.
• p. 48. Section 3.2 is mentioned twice in the same sentence (around the middle of the page). The second occurrence should be deleted.
• p. 56, Algorithm 16, line 7. The correct update equation is $b \leftarrow b + R_{t+1} z$ [Tom Schaul, Idsia].
• p. 58. The definition of regret should be RT = * - T [Hamid Reza Maei, Stanford].
• p. 65. The definition of norm is missing the so-called homogeneity condition: for any $\lambda \in \mathbb{R}$ and $v \in V$, $f(\lambda v) = |\lambda| f(v)$. On the same page, at the bottom, “ norms” should be “ norm”.
• p. 66. “uniformly bounded” should be “bounded” (when mentioning a single function).
• p. 67. “Polish mathematicians” should be singular: “Polish mathematician”.
• p. 68. At the top of the page, “Assume that T is a γ-contraction.” should go into the next line.
• p. 69. In the line preceding the definition of B(), “uniformly bounded” should be “bounded”.
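
The homogeneity condition added in the p. 65 item can be checked numerically. The following is a minimal sketch and is not taken from the book: the helper names `sup_norm` and `is_homogeneous`, the test vector, and the scalars are hypothetical and chosen only for illustration. It uses the infinity norm (see the p. 1 item) as the example norm and a squared norm as a function that fails the condition.

```python
def sup_norm(v):
    """Infinity norm of a finite vector: the largest absolute entry."""
    return max(abs(x) for x in v)


def is_homogeneous(f, v, scalars, tol=1e-12):
    """Check f(lam * v) == |lam| * f(v) for every lam in scalars (up to tol).

    This only tests the given vector and scalars; it is a sanity check,
    not a proof that f is homogeneous.
    """
    for lam in scalars:
        scaled = [lam * x for x in v]
        if abs(f(scaled) - abs(lam) * f(v)) > tol:
            return False
    return True


# Hypothetical test data (an assumption made for this illustration).
v = [1.0, -3.0, 2.0]
scalars = [-2.0, 0.0, 0.5, 4.0]

# The infinity norm satisfies the homogeneity condition on this data.
print(is_homogeneous(sup_norm, v, scalars))

# The squared infinity norm scales with lam**2 instead of |lam|, so it fails.
print(is_homogeneous(lambda u: sup_norm(u) ** 2, v, scalars))
```

Passing such a check for a handful of scalars does not establish that a function is a norm; it only illustrates what the missing condition requires.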

For further information, visit http://www.ualberta.ca/~szepesva/RLBook.html.