Chapter 7: Learning and Teaching

  • Page number: p.218 (e-book)
    • Section number: Figure 7.8 The minmax-Q algorithm.
    • Date: 5/2/2014
    • Name: Haden Lee
    • Email: haden[dot]lee[at]stanford[dot]edu
    • Content: A minor typo in the pseudocode: "... \sum_{a'} (\Pi(s, a') * Q ..." should be "... \sum_{a'} (\Pi'(s, a') * Q ..." (just change \Pi to \Pi', matching the policy in the domain of the argmax). A small sketch of the corrected step appears after this list.
  • Page number: p.220 (e-book)
    • Section number: 7.5
    • Date: 4/30/2014
    • Name: Haden Lee
    • Email: haden[dot]lee[at]stanford[dot]edu
    • Content: A minor typo at the bottom of the page: "... and let \alpha^t(s_i) be ..." should be "\alpha^t(s)" to be consistent within the section (just remove the subscript "i").
  • Page number: p.221 (e-book)
    • Section number: 7.5, Definitions 7.5.1 and 7.5.2.
    • Date: 4/30/2014
    • Name: Haden Lee
    • Email: haden[dot]lee[at]stanford[dot]edu
    • Content: Definition 7.5.1 should be $R^t(s) = \alpha^t(s) - \alpha^t$ instead of $\alpha^t - \alpha^t(s)$, for two reasons. (A) The book notes this definition's consistency with Definition 3.4.5, but with the definition as printed the opposite seems to be true. (B) With the definition in the book, the no-regret learning rule (Definition 7.5.2) seems wrong: if there is some pure strategy s that is 'better' than what the agent has been playing, then $\Pr([\liminf R^t(s)] \le 0) = 1$ holds trivially, but this is exactly the situation we want to rule out. So either Definition 7.5.2 should be changed to $\Pr([\liminf R^t(s)] \le 0) = 0$, or Definition 7.5.1 should be changed (just flip the sign). The latter change seems more reasonable.
      In addition, with this change, the bullet point on regret matching on page 221 must also be changed. The weight for pure strategy s is defined as $\sigma_i^{t+1}(s) = \frac{R^t(s)}{\sum_{s' \in S_i} R^t(s')}$, but note that, with the new definition of $R^t(s)$, this weight can be negative. One natural fix is $\sigma_i^{t+1}(s) = \frac{\max(0, R^t(s))}{\sum_{s' \in S_i} \max(0, R^t(s'))}$; that is, treat negative $R^t(s)$ terms as zeros, because $R^t(s) < 0$ means that what I have been playing is already better than playing s, so there is no reason to place any weight on s. Instead, weight should go only on the strategies with positive regret. A small sketch of this rule appears after this list.
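Regarding the Figure 7.8 entry above: the substance of the correction is that the state value must be computed with the maximizing policy $\Pi'$ returned by the argmax, not with the current policy $\Pi$. The following Python fragment is a minimal sketch of that corrected step for a single state, solving the maximin problem as a linear program; the names (Q_s, maximin_policy_and_value) and the use of scipy.optimize.linprog are illustrative choices and do not come from the book.

    # Minimal sketch of the corrected minmax-Q value step for one state s.
    # Q_s[a, o] holds Q(s, a', o') over the agent's actions a' and the opponent's actions o'.
    import numpy as np
    from scipy.optimize import linprog

    def maximin_policy_and_value(Q_s):
        n_a, n_o = Q_s.shape
        # Variables are [Pi'(s, a_1), ..., Pi'(s, a_n), v]; maximize v, i.e. minimize -v.
        c = np.zeros(n_a + 1)
        c[-1] = -1.0
        # For every opponent action o':  v - sum_{a'} Pi'(s, a') * Q_s[a', o'] <= 0.
        A_ub = np.hstack([-Q_s.T, np.ones((n_o, 1))])
        b_ub = np.zeros(n_o)
        # The policy must be a probability distribution over the agent's actions.
        A_eq = np.hstack([np.ones((1, n_a)), np.zeros((1, 1))])
        b_eq = np.array([1.0])
        bounds = [(0.0, 1.0)] * n_a + [(None, None)]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
        pi_prime = res.x[:n_a]          # the argmax policy Pi'(s, .)
        # Corrected line: V(s) = min_{o'} sum_{a'} Pi'(s, a') * Q(s, a', o').
        v = float(np.min(pi_prime @ Q_s))
        return pi_prime, v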
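Similarly, for the regret-matching point above, here is a minimal Python sketch of the proposed $\max(0, R^t(s))$ rule, assuming the regrets $R^t(s) = \alpha^t(s) - \alpha^t$ have already been accumulated in a vector; the uniform fallback when no regret is positive is an assumption of this sketch, not something the book specifies.

    import numpy as np

    def regret_matching_strategy(R):
        """Return sigma_i^{t+1} from a vector of regrets R^t(s), one entry per pure strategy."""
        positive = np.maximum(R, 0.0)   # treat negative regrets as zero
        total = positive.sum()
        if total > 0:
            return positive / total
        # No positive regret: the current behavior already does at least as well as
        # every pure strategy, so (as an assumption here) fall back to uniform play.
        return np.full(len(R), 1.0 / len(R))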

The following errors are fixed in the second printing of the book and online PDF v1.1

  • Page number: 200
    • Section number: 7.3
    • Date: Feb 27, 2010
    • Name: Kevin
    • Content: changed the notation for other player's strategy and set of strategies for consistency. The paragraph now reads: As in fictitious play, each player begins the game with some prior beliefs. After each round, the player uses Bayesian updating to update these beliefs. Let $S_{-i}^i$ be the set of the opponent's strategies considered possible by player $i$, and $H$ be the set of possible histories of the game. Then we can use Bayes' rule to express the probability assigned by player $i$ to the event in which the opponent is playing a particular strategy $s_{-i}\in S_{-i}^i$ given the observation of history $h\in H$, as \[P_i (s_{-i} | h) = \frac{P_i (h | s_{-i}) P_i (s_{-i})}{\sum_{s_{-i}' \in S_{-i}^i} P_i (h | s_{-i}') P_i (s_{-i}') }.\] (An illustrative sketch of this update appears after this list.)
  • Page number: 204
    • Section number: 7.4
    • Date: Feb 27, 2010
    • Name: Kevin
    • Content: Added footnote: "For consistency with the literature on reinforcement learning, in this section we use the notation $s$ and $S$ for a state and set of states respectively, rather than for a strategy profile and set of strategy profiles as elsewhere in the book."
  • Page number: 211
    • Section number: 7.6
    • Date: Feb 27, 2010
    • Name: Kevin
    • Content: Renamed the set of target opponent strategies from $S$ to $\tilde{S}$ for consistency with the rest of the book, in which $S$ denotes the set of all strategy profiles.
  • Page number: 213-218
    • Section number: 7.7
    • Date: Feb 27, 2010
    • Name: Nicolas Lambert
    • Content: All instances of S should be replaced by s for consistency with the rest of the book.
  • Page number: 215
    • Section number: 7.7.1
    • Date: 10.28.09
    • Name: Yoav
    • Content: Definitions 7.7.3 and 7.7.4 (stable steady state and asymptotically stable state) are missing "for sufficiently small $\epsilon$..."
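As an illustration of the Bayesian update in the Section 7.3 correction above, the following Python sketch computes $P_i(s_{-i} | h)$ from priors and history likelihoods; representing each candidate opponent strategy by a likelihood function is an assumption made for the example.

    def posterior(prior, likelihood, h):
        """prior: dict mapping each opponent strategy s_{-i} to P_i(s_{-i}).
        likelihood: dict mapping each strategy to a function h -> P_i(h | s_{-i}).
        Returns a dict mapping each strategy to P_i(s_{-i} | h)."""
        joint = {s: likelihood[s](h) * prior[s] for s in prior}
        z = sum(joint.values())
        return {s: p / z for s, p in joint.items()}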
-- KevinLeytonBrown - 13 Nov 2008