Algorithms for Reinforcement Learning
Errata for the printed book
Csaba Szepesvári
Last update: May 18, 2013
Page numbers refer to the printed copy. The online version (the “draft”) is up-to-date. Thanks to
my PhD student, Gabor Bartok, who has found many of these errors.
- p. xi. Section 2) should be Section 2 (no closing parenthesis)
- p. 1. The dot should be in between the bars in the definition of the infinity norm, not
on the top. That is, ∥⋅∥∞ is the intended form, not the form with the dot set above the
bars. Also, in “which, if θ, which” the part “which, if θ” should be deleted.
- p. 2. The footnote from p. 5 explaining the meaning of “almost surely” should be moved
here.
- p. 5. In the example on gambling the personal pronoun “his” should be replaced by
“her”.
- p. 9. In Eq. (1.14) on the right-hand side of the equation Q(y,π(x)) should be Q(y,π(y)).
- p. 12. In footnote 1, add “if” before “it”.
- p. 21. line 1: “then” should be “than”.
- p. 22. The text “goal is to approximate the value function V underlying ” should
be deleted.
- p. 23. Delete “be” from “is no longer be guaranteed”. After θ(λ) in the middle of the page,
delete “.”. The phrase “using $V_\theta$” should be “using the chosen features φ”.
- p. 25. “some methods using which” should be “some methods that avoid”.
- p. 32. The word “complicate” (in the middle of page) should be “complicated”.
- p. 40. The text “Gittins (1989) has shown” should be “Gittins (1989) showed”.
- p. 43. The 4th displayed equation and the text surrounding it should be deleted. This is
the equation that says that
$R_T^{\mathrm{UCRL2}(\delta)} = O\big(D^2 |\mathcal{X}|^2 |\mathcal{A}| \log(T/\delta)/\varepsilon + \varepsilon T\big)$.
This equation holds (under the cited conditions), but it does not lend itself to a
logarithmic regret bound.
- p. 47. On line 4 of the 1st paragraph of Section 3.3, “optimalas” should be “optimal
as”. On the same page, on line 3, after Eq. (3.1): “, Algorithm 12 the pseudocode of
Q-learning.” should start with a full stop and is missing the word “shows”. So, the
text should be “. Algorithm 12 shows the pseudocode of Q-learning.”.
- p. 48. Section 3.2 is mentioned twice in the same sentence (around the middle of the
page). The second occurrence should be deleted.
- p. 56, Algorithm 16, line 7. The correct update equation is $b \leftarrow b + R_{t+1} \cdot z$
[Tom Schaul, Idsia].
- p. 58. The definition of regret should be $R_T = T\rho^* - \mathcal{R}_T$ [Hamid Reza Maei,
Stanford].
- p. 65. The definition of norm is missing the so-called homogeneity condition: For any
λ ∈ ℝ, v ∈ V, f(λv) = |λ|f(v). On the same page, at the bottom, “ℓ∞ norms” should
be “ℓ∞ norm”.
- p. 66. “uniformly bounded” should be “bounded” (when mentioning a single function).
- p. 67. “Polish mathematicians” should be singular: “Polish mathematician”.
- p. 68. At the top of the page, “Assume that T is a γ-contraction.” should go into the next
line.
- p. 69. In the line preceding the definition of $B(\mathcal{X})$, “uniformly bounded” should be
“bounded”.
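
The homogeneity condition added in the p. 65 entry, f(λv) = |λ|f(v), can be checked numerically for the ℓ∞ norm. The following is a minimal sketch in Python and is not taken from the book: the helper names `sup_norm` and `is_homogeneous`, the sample vector, and the sample scalars are all illustrative assumptions.

```python
def sup_norm(v):
    """Return the l-infinity norm max_i |v_i| of a sequence of numbers."""
    return max(abs(x) for x in v)


def is_homogeneous(f, v, scalars, tol=1e-12):
    """Check f(lam * v) == |lam| * f(v), up to tol, for each lam in scalars.

    This only tests the condition on the given samples; it is not a proof.
    """
    for lam in scalars:
        scaled = [lam * x for x in v]
        if abs(f(scaled) - abs(lam) * f(v)) > tol:
            return False
    return True


# Hypothetical sample data chosen for illustration only.
v = [1.5, -4.0, 2.0]
scalars = [-3.0, -0.5, 0.0, 2.0]

homogeneous = is_homogeneous(sup_norm, v, scalars)


# The squared sup norm scales by lam**2, so it violates the condition.
def squared_sup_norm(u):
    return sup_norm(u) ** 2


squared_homogeneous = is_homogeneous(squared_sup_norm, v, [2.0])
```

Here `homogeneous` comes out true for the ℓ∞ norm, while the squared version fails the check because it scales by λ² rather than |λ|. The condition is what rules out such functions as norms.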
For further information, visit http://www.ualberta.ca/~szepesva/RLBook.html.