Page numbers refer to the printed copy. The online version (the “draft”) is up-to-date. Thanks to my PhD student, Gabor Bartok, who has found many of these errors.

- p. xi. Section 2) should be Section 2 (no closing parenthesis)
- p. 1. The dot should be in between the bars in the definition of the infinity norm, not on the top. That is, ∥⋅∥_{∞} is the intended form and not ||_{∞}. Also, in “which, if θ, which” the part “which, if θ” should be deleted.
- p. 2. The footnote from p. 5 explaining the meaning of “almost surely” should be moved here.
- p. 5. In the example on gambling the personal pronoun “his” should be replaced by “her”.
- p. 9. In Eq. (1.14) on the right-hand side of the equation Q(y,π(x)) should be Q(y,π(y)).
- p. 12. In footnote 1, add “if” before “it”.
- p. 21. line 1: “then” should be “than”.
- p. 22. The text “goal is to approximate the value function V underlying ” should be deleted.
- p. 23. Delete “be” from “is no longer be guaranteed”. After θ^{(λ)} in the middle of the page, delete “.”. The phrase “using V_{θ}” should be “using the chosen features φ”.
- p. 25. “some methods using which” should be “some methods that avoid”.
- p. 32. The word “complicate” (in the middle of the page) should be “complicated”.
- p. 40. The text “Gittins (1989) has shown” should be “Gittins (1989) showed”.
- p. 43. The 4th displayed equation and the text surrounding it should be deleted. This is the equation that says that R_{T}^{UCRL2(δ)} = O(D^{2}|𝒳|^{2}|𝒜| log(T∕δ)∕ε + εT). This equation holds (under the cited conditions), but it does not lend itself to a logarithmic regret bound.
- p. 47. On line 4 of the 1st paragraph of Section 3.3, “optimalas” should be “optimal as”. On the same page, on line 3 after Eq. (3.1), “, Algorithm 12 the pseudocode of Q-learning.” should start with a full stop and is missing the word “shows”. So, the text should be “. Algorithm 12 shows the pseudocode of Q-learning.”
- p. 48. Section 3.2 is mentioned twice in the same sentence (around the middle of the page). The second occurrence should be deleted.
- p. 56, Algorithm 16, line 7. The correct update equation is b ← b + R_{t+1} ⋅ z [Tom Schaul, IDSIA].
- p. 58. The definition of regret should be R_{T}^{𝒜} = Tρ^{*} − ℛ_{T}^{𝒜} [Hamid Reza Maei, Stanford].
- p. 65. The definition of norm is missing the so-called homogeneity condition: for any λ ∈ ℝ, v ∈ V, f(λv) = |λ|f(v). On the same page, at the bottom, “ℓ^{∞} norms” should be “ℓ^{∞} norm”.
- p. 66. “uniformly bounded” should be “bounded” (when mentioning a single function).
- p. 67. “Polish mathematicians” should be singular: “Polish mathematician”.
- p. 68. At the top of the page, “Assume that T is a γ-contraction.” should go on the next line.
- p. 69. In the line preceding the definition of B(𝒳), “uniformly bounded” should be “bounded”.
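
As a quick sanity check of the homogeneity condition from the p. 65 item, f(λv) = |λ|f(v), the following Python sketch tests it numerically for the ℓ^{∞} norm on a finite vector. The names `sup_norm` and `is_homogeneous` and the sample values are illustrative assumptions, not notation from the book.

```python
import math

def sup_norm(v):
    """l-infinity norm of a finite vector: the largest absolute entry."""
    return max(abs(x) for x in v)

def is_homogeneous(f, v, lam, tol=1e-12):
    """Check f(lam * v) == |lam| * f(v) up to floating-point tolerance."""
    scaled = [lam * x for x in v]
    return math.isclose(f(scaled), abs(lam) * f(v), rel_tol=tol, abs_tol=tol)

# Illustrative values only (not taken from the book).
v = [1.0, -3.5, 2.0]
checks = [is_homogeneous(sup_norm, v, lam) for lam in (-2.0, 0.0, 0.5, 3.0)]
```

A function such as the squared sup norm fails the same check, which is why the condition has to be stated explicitly in the definition.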

For further information, visit http://www.ualberta.ca/~szepesva/RLBook.html.