In a loose sense, every equilibrium of a multistage game embodies a certain idea of reputation. At every point of the game, each player anticipates what the opponents will do on the basis of their current reputation, as this has been shaped by previously observed behavior. Intuitively, this view seems particularly well suited to the case in which the game involves repeated interaction according to a fixed stage game. Then, the range of available actions and entailed payoffs remain unchanged, a scenario that lends itself quite naturally to having players rely on past (observed) behavior when shaping their ensuing predictions.
In everyday life, we typically think of a (good) reputation as an asset, i.e., something valuable that, once acquired, is worth preserving. Of course, the value of any such reputation must depend on the time horizon over which one expects to benefit from it. Thus, if gaining a good reputation is costly [125] (otherwise, it would be essentially meaningless), any decision concerning its possible preservation must crucially depend on the remaining length of time during which it can still be used. In this section, we informally illustrate the multifaceted considerations involved in this respect through a variety of examples, all of them in the context of repeated games. The intuitive features just outlined already arise quite starkly in these examples. However, for a precise analysis of matters the reader is referred to Section 8.6 (in the Supplementary Material of this chapter), where these issues are studied in formal detail.
[125] Often, the costs involved in building a reputation are opportunity costs, i.e., costs associated with letting some gains (say, short-run or opportunistic ones) slip away.
Let us start by considering again the infinitely repeated prisoner's dilemma. In this context, the simplest cooperative equilibrium is that which sustains (C, C) by the threat of responding drastically to any deviation with a constant adoption of action D throughout (that is, with the threat of turning irreversibly to playing the unique Nash equilibrium of the stage game). In a sense, we may conceive the support of cooperation in such an equilibrium as the outcome of a joint "reputation for good will" that is maintained over time through cooperative behavior. This reputation, however, is extremely fragile: a single failure to abide by it is enough for its irreversible collapse.
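As a rough numerical sketch of why such a threat can sustain cooperation, the check below compares the (normalized) discounted payoff from cooperating forever with that from a one-shot deviation followed by permanent (D, D). The stage payoffs used in the usage note (temptation 4, reward 3, punishment 1) and the function names are illustrative assumptions, not values taken from the text.

```python
def cooperation_sustainable(reward, temptation, punishment, delta):
    """Check whether the grim threat deters a one-shot deviation.

    Payoffs are normalized by (1 - delta), so a constant stream x is worth x.
    reward:     stage payoff from (C, C)             (assumed value)
    temptation: stage payoff from defecting on C     (assumed value)
    punishment: stage payoff from (D, D)             (assumed value)
    """
    cooperate_forever = reward
    # Deviate today, then face (D, D) in every later period.
    deviate_once = (1 - delta) * temptation + delta * punishment
    return cooperate_forever >= deviate_once


def critical_discount_factor(reward, temptation, punishment):
    """Smallest delta at which the deviation stops being profitable."""
    return (temptation - reward) / (temptation - punishment)
```

With the assumed payoffs, `critical_discount_factor(3, 4, 1)` is one third: a player who weighs the future at least that heavily prefers to keep the cooperative reputation intact.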
Let us now reconsider the finite repetition of the prisoner's dilemma. As explained, every Nash equilibrium of this game leads to the action profile (D, D) being played throughout, independently of how protracted the (finitely lived) interaction might be. This may be viewed as the reflection of an unsatisfactory "modeling discontinuity at infinity," i.e., what appears to hold for T = ∞ (the possibility of supporting cooperation) is nevertheless completely infeasible at equilibrium for every given T ∈ ℕ. In general, such an acute discontinuity should be interpreted as a "warning" that the model might be imperfectly or incompletely specified. But, in the present context, that theoretical unease is further reinforced by an empirical concern. In laboratory experiments, where real subjects (often students) have been made to play a repeated prisoner's dilemma under significant monetary rewards, long stretches of cooperation are typically observed when the number of periods involved is large. More specifically, a significant fraction of cooperative behavior is found in the early stages, although the backward-induction logic seems to take over in the final stages and leads to a steep rise in defection [126].
The literature has pursued a variety of different approaches to tackle the theoretical and empirical issues raised by the above observations. Here, we focus on the incomplete-information route proposed by Kreps, Milgrom, Roberts, and Wilson (1982) [127]. These authors slightly perturb the game with a small amount of asymmetric information, allowing for some small probability that either of the two players involved in the finitely repeated prisoner's dilemma be of an "irrational" type. More specifically, they model the situation as a Bayesian game with Nature (recall Section 6.2), where there is a low a priori probability that the (Harsanyi) type of each player holds preferences that render the following tit-for-tat (TFT) strategy dominant:
"At each t, choose C if the other player chose C in the preceding period; otherwise, choose D."
[126] There has been a long experimental literature concerned with the finitely repeated prisoner's dilemma. For the earlier part of it, Lave (1962) is a good representative, whereas interesting examples of more recent experimental research in this context can be found in Selten and Stocker (1986) or Andreoni and Miller (1993). These experiments are summarized in Subsection 12.7.3, where we also contrast at some length both their different focus and their alternative theoretical underpinnings.
[127] An alternative approach based on the notion of ε-rationality (or ε-equilibrium) is described in Subsection 8.5.2.
Under those circumstances, Kreps et al. show that, in every sequential equilibrium of the perturbed game with Nature (cf. Section 4.6), each rational-type player (i.e., one with the original preferences of the prisoner's dilemma) mimics the irrational-type player during most of the game, provided the time horizon is long enough. That is, precisely because of her rationality, each player behaves most of the time as prescribed by TFT, under the prediction that the other player will also behave in this fashion. Indeed, such a prediction is always confirmed at equilibrium, even though the opponent is very likely to be rational and thus not hold TFT preferences.
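The TFT rule quoted above can be sketched in a few lines. This is only an illustration of the strategy itself, not of the sequential equilibrium of the perturbed game. The function names are hypothetical, and the choice of C in the first period (where there is no preceding period) is an assumption, since the quoted rule does not cover that case.

```python
def tit_for_tat(opponent_history):
    """Play C if the opponent played C last period; otherwise play D.

    Assumption: with no history (first period) the player starts with C.
    """
    if not opponent_history:
        return "C"
    return "C" if opponent_history[-1] == "C" else "D"


def always_defect(opponent_history):
    """Benchmark strategy: play D regardless of history."""
    return "D"


def play(strategy_1, strategy_2, periods):
    """Return the list of action profiles over the given number of periods."""
    history_1, history_2 = [], []
    for _ in range(periods):
        # Each strategy conditions only on the opponent's past actions.
        action_1 = strategy_1(history_2)
        action_2 = strategy_2(history_1)
        history_1.append(action_1)
        history_2.append(action_2)
    return list(zip(history_1, history_2))
```

Two TFT players cooperate in every period, whereas a TFT player facing `always_defect` is exploited only once before switching to D.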
A general result in this vein is stated and proven in Subsection 8.6.1 (cf. Theorem 8.11). Along the lines of our former discussion, one can interpret this result as reflecting an equilibrium process of "investment in reputation." Given that both players share a common belief that the opponent could possibly be a rare but "useful" type (e.g., a TFT type in the repeated prisoner's dilemma), both players prefer to behave as this type would, at least in the early phase of the game. In this way, the equilibrium reputation that they will continue to play in a "constructive" manner is preserved. Of course, keeping such a reputation will generally entail short-run opportunity costs. Thus, for a rational type to find it a worthwhile pursuit at equilibrium, the game must be long enough to allow for a sufficiently protracted enjoyment of the future payoff benefits.
Heuristically, the result just outlined displays an interesting, somewhat paradoxical, feature: players are interested in concealing their rationality. Or, in other words, they prefer not to carry its logical implications through, consciously clinging to any small doubt in this respect that the (incomplete-information) game may avail. This, in sum, allows even small subjective probabilities for a certain type of irrationality to entail important payoff consequences.
However, a potentially controversial issue then arises as to what "manifestations of irrationality" players could, or should, admit in their analysis of the game. In contrast with the fact that there are only limited ways of modeling rationality (i.e., they must all involve some suitable embodiment of payoff maximization and, perhaps, rational expectations - recall Sections 2.2 and 2.7), the scope for possible "irrationalities" seems vastly unrestricted. For example, in the model proposed by Kreps et al. (1982) for the repeated prisoner's dilemma, it was convenient to consider a particular kind of reciprocity-inducing irrationality (i.e., that reflected by the TFT strategy). But, of course, many other different such possibilities could have been contemplated instead. In general, one may suspect that, as different types of irrationality are being considered for a particular stage game, a wide range of equilibrium (and therefore payoff) possibilities could arise under repeated interaction. Indeed, this conjecture will be proven essentially true in Subsection 8.6.1, where it will lead to an incomplete-information counterpart of our previous folk theorems. Informally, that is, it will be shown that every individually rational payoff may be approximated at a sequential equilibrium of a suitably perturbed finitely repeated game.
As will be recalled (cf. Section 8.3), one of the issues that can be raised against folk-type results concerns the large equilibrium multiplicity they typically span. Such a multiplicity, however, cannot be tackled by perturbing the game with some incomplete information because, as suggested above, there is seldom an obvious way to choose the "suitable perturbation." This problem would seem exacerbated even further if, instead of just one irrationality, several of them are allowed simultaneously with positive probability. But, in this case, one would also expect that a conflict might arise among the players, who could become involved in a tour de force to settle what reputation should steer equilibrium play. Which player might be expected to end up succeeding in this struggle? Intuitively, it seems that the one who has more at stake should prevail. In particular, if players differ in their discount rates, it may be conjectured that the one who is more patient (i.e., places more weight on future payoffs) is bound to gain the upper hand in imposing her own preferred reputation.
To facilitate a precise discussion of these subtle issues, the literature has mostly focused on a very stylized theoretical framework. In it, a long-term player (i.e., one with an infinite time horizon) interacts with a sequence of short-run agents whose concerns span just one period (i.e., the only period where they interact with the long-run player). A paradigmatic example of this setup is provided by the so-called chain-store game, originally proposed by Selten (1978). We rely on this game to illustrate some of the main issues involved.
Consider a large chain store that operates in a given set of different (say, spatially separated) markets. In each of them, the chain faces the potential entry of a specific and independent competitor, which is confined to that particular market. Each of these potential entrants must decide, in sequence, whether to actually enter into competition with the chain. More precisely, let t = 1, 2,..., T stand for the different dates at which these decisions must be adopted. Then, at each such t, the corresponding market-specific firm (which is supposed to be informed of all previous history) adopts one of two possible decisions: entry (E) or no entry (N). Having observed this choice, the chain store then responds in one of two different ways: it can either fight entry (F) or acquiesce (A).
To fix ideas, let the "profit potential" of each market be equal to 2, which can be either peacefully shared by the chain store and the corresponding firm (thus inducing a payoff of 1 for each) or simply enjoyed by the former. Suppose, on the other hand, that the "cost of fighting" is equal to −2 for both firms, which induces a net payoff of −1 if they enter a fight. With these conventions, the extensive-form (two-stage) game that is played by the chain store and each of the potential entrants may be represented as in Figure 8.1.
Clearly, the only subgame-perfect equilibrium of the game represented in Figure 8.1 is given by the strategy profile (E, A). Now suppose that, as suggested above, this game is embedded into the larger context where the same chain store plays repeatedly and in sequence with a finite number T of potential entrants. Then, by resorting to a by now familiar backward-induction argument, it is straightforward to check that the only subgame-perfect equilibrium of such a repeated game also involves playing (E, A) in every t = 1, 2,..., T.
Figure 8.1: Chain-store stage game.
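The backward-induction step for a single stage game can be sketched as follows. The payoffs (1, 1) after (E, A) and (−1, −1) after (E, F), as well as the chain store's payoff of 2 after N, follow the description above; the entrant's payoff of 0 when it stays out is an assumption, as are all names in the code.

```python
# Payoffs are (entrant, chain store). The entrant's 0 after N is assumed.
PAYOFFS = {
    ("N", None): (0, 2),
    ("E", "A"): (1, 1),
    ("E", "F"): (-1, -1),
}


def chain_store_best_response():
    """Second stage: the chain store's optimal reply once entry has occurred."""
    return max(["A", "F"], key=lambda reply: PAYOFFS[("E", reply)][1])


def entrant_best_choice():
    """First stage: the entrant anticipates the chain store's optimal reply."""
    reply = chain_store_best_response()
    enter_payoff = PAYOFFS[("E", reply)][0]
    stay_out_payoff = PAYOFFS[("N", None)][0]
    return "E" if enter_payoff > stay_out_payoff else "N"


def subgame_perfect_profile():
    """Profile obtained by solving the two stages backward."""
    return entrant_best_choice(), chain_store_best_response()
```

Because the last entrant faces exactly this problem, and each earlier entrant then anticipates the same outcome in all later markets, repeating the argument backward from T yields (E, A) in every period.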
Let us now focus on an extension of the previous context to the case in which the chain store faces in sequence an unbounded number of potential entrants, its overall (intertemporal) payoffs being identified with, say, the flow of stage payoffs discounted at a certain given rate δ ∈ (0, 1) [128]. Of course, in such an infinite-horizon game, there still is a subgame-perfect equilibrium where (E, A) is played every period. However, if δ is large enough, there is now also an alternative subgame-perfect equilibrium where, on the equilibrium path, no potential entrant ever enters under the fear that, if it were to do so, the chain store would respond by fighting. In a sense, this fear is to be conceived as a reflection of the "fighting reputation" the chain store enjoys, at equilibrium, in the eyes of the potential entrants. And again, the long-term value of this reputation derives from its own fragility. It is only because this reputation would immediately collapse if the chain store ever tolerated entry that every potential entrant understands that entry would always be fought and is thus best avoided altogether.
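To get a feel for how large the discount factor must be, consider the following back-of-the-envelope check, written for the stage payoffs given above. It rests on assumptions that should be read as a sketch only: payoffs are normalized by (1 − δ), fighting an entrant restores deterrence (so the chain store earns 2 in every later period), and acquiescing destroys the reputation for good (so (E, A) is played, and the chain store earns 1, in every later period).

```latex
\underbrace{(1-\delta)(-1) + \delta \cdot 2}_{\text{fight, then deterrence}}
\;\ge\;
\underbrace{(1-\delta)\cdot 1 + \delta \cdot 1}_{\text{acquiesce, then } (E,A)}
\quad\Longleftrightarrow\quad
3\delta - 1 \;\ge\; 1
\quad\Longleftrightarrow\quad
\delta \;\ge\; \tfrac{2}{3}.
```

Under these assumptions, the threat to fight is worth carrying out whenever the discount factor is at least 2/3.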
Formally, the aforementioned considerations are embodied by the following equilibrium strategies: for each t = 1, 2,..., and every possible history h^{t-1} prevailing at t, the chain store (denoted by c) and the potential entrant (identified by e) respectively react as follows [129]:
An interesting feature of this equilibrium (only Nash? subgame-perfect as well? cf. Exercise 8.13) is that, in contrast with the infinitely repeated prisoner's dilemma discussed earlier, the chain-store reputation can be maintained at equilibrium without ever being put to any tangible test. That is, its fighting reputation would only have to be "honored" if a short-run firm enters, an event never observed at equilibrium. Despite this contrast, the repeated prisoner's dilemma and the chain-store game do display an analogous discontinuity in the length of the time horizon; i.e., both lead to drastically different (equilibrium) analysis when the horizon of the interaction passes from being finite to infinite [130]. In the present case, this discontinuity (again to be judged counterintuitive and theoretically problematic) has been labeled the chain-store paradox.
[128] Alternatively, one could consider the possibility (also contemplated in Section 8.2 for ordinary repeated games) that the intertemporal preferences of the chain store are given by the limit average payoffs earned throughout the whole game.
[129] The present example deviates from the theoretical framework introduced in Section 8.2 because not all players display the same time horizon. A reformulation of the original setup that accommodates this feature is introduced in Subsection 8.6.2. Of course, an additional difference resides in the fact that, because the game played at every t involves two distinct stages, the chain store has to take no action when the potential entrant decides to keep out of the market (i.e., chooses N). To tackle this problem formally, one can simply resort to the notational convention that, in that case, the chain store's action at t is recorded as null (a_c^t = ∅).
Figure 8.2: Chain-store stage game, alternative version.
Kreps and Wilson (1982b) and Milgrom and Roberts (1982) independently addressed the "resolution" of this paradox along lines quite similar to those described above for the repeated prisoner's dilemma. Specifically, they postulated that, in the finite-horizon version of the game, there is a small a priori probability that the chain store could display payoffs different from those contemplated in Figure 8.1; for example, one may suppose that in this alternative case they are as indicated in Figure 8.2.
If payoffs were as described in Figure 8.2, the chain store would always fight the entry of any short-run firm and, therefore, it would be unambiguously optimal for every potential entrant to remain on the sidelines. Thus, suppose that, with some small subjective prior probability, short-run firms allow for the possibility that the chain store's payoffs might be as in this second alternative version. Then, it can be shown that, if the time horizon (i.e., number of potential entrants) is large enough, the so-perturbed repeated game with Nature leads to no entry by the short-run firms in an arbitrarily high fraction of the initial periods. Or, to be more precise, this behavior occurs at every sequential equilibrium of the game, no matter how small their subjective probability on the second type. For the short-run firms, staying out is an optimal response to the credible threat on the part of the chain store that it will maintain its reputation (i.e., fight entry), at least in the early part of the game. Through this reputation, the long-run firm obtains average payoffs that are arbitrarily close to those of monopoly (that is, close to 2) provided the game is long enough.
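The belief mechanics behind this result can be illustrated with a small sketch. It is not the sequential-equilibrium construction itself, only the two ingredients that drive it: Bayes' rule after an observed fight, and the entrant's one-period entry decision. It assumes that the second type always fights, that the ordinary type fights with some probability `normal_fight_prob` (a hypothetical parameter), and that an entrant earns 0 by staying out, 1 if entry is accommodated, and −1 if it is fought.

```python
def fight_probability(prior, normal_fight_prob):
    """Overall probability that entry is fought, given the entrants' beliefs."""
    return prior + (1 - prior) * normal_fight_prob


def posterior_after_fight(prior, normal_fight_prob):
    """Bayes' rule: probability of the always-fighting type after a fight.

    Requires fight_probability(...) > 0, i.e., a fight must be possible.
    """
    return prior / fight_probability(prior, normal_fight_prob)


def entrant_enters(prior, normal_fight_prob):
    """Entry pays only if the expected payoff exceeds the assumed 0 from N."""
    q = fight_probability(prior, normal_fight_prob)
    expected_entry_payoff = q * (-1) + (1 - q) * 1
    return expected_entry_payoff > 0
```

With these payoffs an entrant stays out whenever entry is fought with probability at least one half. Since every observed fight weakly raises the posterior on the second type, an ordinary chain store that fights early on keeps that probability high for later entrants.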
Even though this conclusion has obvious parallels with our above discussion of the finitely repeated prisoner's dilemma, it is worth stressing that there are important differences as well. In the chain-store game, sustaining the equilibrium reputation is not jointly advantageous for all players. That is, this reputation benefits only the chain-store firm, which is prepared to uphold it if necessary by fighting any entry decision. In contrast, of course, any of the short-run firms would prefer that, by its time of play, such a reputation had somehow collapsed, so that the chain store would be ready to share the market. However, those firms have neither the means nor the incentive to struggle to that effect because their individual time horizon is too short [131]. Thus, exploiting the entailed asymmetry, the chain store is able to maintain, at equilibrium, its most profitable reputation. As we formally show in Subsection 8.6.2 (cf. Theorem 8.12), this is a phenomenon that arises with some generality in contexts of repeated interaction where a long-run player coexists with a finite (but sufficiently long) series of short-run players.
[130] Recall that a finite repetition of the chain-store game induces entry by every potential entrant in its unique subgame-perfect equilibrium.