Johannes Treutlein and Rubi Hudson worked on this post as part of SERI MATS, under the mentorship of Evan Hubinger. Rubi has also received mentorship from Leo Gao. We thank Erik Jenner for helpful discussions and Alexander Pan for bringing the performative prediction literature to our attention.

Update 30 May 2023: We have now published a paper based on our previous post and this post (the material from this post is in Appendix D).

1. Introduction

In our previous post, we analyzed a setting in which an oracle AI is maximizing a strictly proper scoring rule while it can influence the world with its predictions about the probabilities of possible outcomes. In this setting, a prediction is called a fixed point if it accurately reflects the oracle’s beliefs about the world, after having made that prediction. We showed that an oracle AI that is maximizing a strictly proper scoring rule may choose to misrepresent its beliefs and not output a fixed point, even if one exists. This is bad since, all else equal, we expect better outcomes when decisions are based on accurate rather than inaccurate reports.

In our previous analysis, we assumed that the oracle jointly optimizes both its prediction and the effect of its prediction on the world. In contrast, in this post we examine cases in which the AI only optimizes its prediction—there is a stop-gradient (Foerster et al., 2018; Demski, 2019) in front of the oracle’s model of the world, which prevents optimizing outcomes directly.

This type of cognition could result, for instance, from self-supervised training on historical data. In that case, the AI may learn a robust world model, and it may learn to match its predictions to its beliefs about the world. However, the AI may never learn to optimize outcomes through its prediction, since this yields no advantage on the training data, which cannot be influenced by the AI.

Consider a situation where this oracle makes a prediction that can influence the world, and assume it models this influence correctly. Then this could put the AI in a game in which it is trying to make a prediction to match its world model, while the world model updates its beliefs conditional on the AI’s prediction. What happens in this game depends on the cognition learned by the AI during training. However, we show that, in contrast to the incentives discussed in our previous post, the only equilibria of this game would be fixed points. Such an oracle could thus be safer if there exists a unique safe fixed point, or if randomly chosen fixed points are likely safe.

In this post, we review and analyze several settings related to stop-gradients with the property that equilibria in them are fixed points or close to fixed points. Our goal is (i) to gain a better understanding of stop-gradient optimization in oracles and (ii) to highlight some potential training setups and agent cognitions that would lead to fixed points.

Similar ideas to the ones outlined above have been discussed in the literature on performative prediction (Perdomo et al. 2020). Performative prediction is a more general framework in machine learning in which the choice of model has an influence on the modeled distribution. Our analysis of oracle AIs is closely related and can be seen as a special case of this setting. We will indicate similarities where appropriate and make use of concepts from that literature.

First, we introduce performative stability and relate it to the game outlined above. We show that performatively stable predictions and equilibria in the game are fixed points. Second, we introduce versions of the repeated risk minimization and repeated gradient descent algorithms. We show that both lead to fixed point predictions. Third, we discuss practical methods based on repeated gradient descent that can be applied to an online learning setting, including Armstrong (2018)’s backwards-facing oracle.

Finally, we discuss no-regret learning and prediction markets. We introduce a no-regret learning framework and show that policies with sublinear regret also have sublinear prediction error. In our decision market model, we show that, if the weight of each trader is small, the equilibria of the market are close to fixed points.

Oracles with stop-gradients may optimize outcomes to find fixed points, which could lead to bad consequences. Thus, the safest oracles would be ones that make predictions only about aspects of the world they cannot influence. However, among oracles that can influence the world, ones with a stop-gradient are preferable for two reasons. First, they report their true beliefs, which gives us better information to base decisions on. This also enables approaches in which we ensure that there is only one safe fixed point. Second, the AI does not optimize its choice of fixed point, which is safer for the standard reasons of not wanting to optimize for an unaligned goal. Which fixed point is chosen will be contingent on initialization and specifics of the fixed point finding procedure.

2. Formal setting

Our setup is analogous to the one in our previous post. We refer the reader to that post for more context. We will generalize our setting to predictions over more than two possible outcomes in an upcoming paper, but in this post we focus on binary predictions for simplicity. We believe that the results in this post generalize to higher dimensions.

An oracle reports a probability p∈[0,1]for a binary event. A scoring rule is a function S:[0,1]×{0,1}→¯¯¯¯R, where ¯¯¯¯R:=[−∞,∞] is the extended real line. Given prediction p∈[0,1] and outcome i∈{0,1}, the oracle receives the score S(p,i). We write

S(p,q):=qS(p,1)+(1−q)S(p,0)

for the expected score of predicting p, given that i is Bernoulli-distributed with parameter q. A scoring rule S is called proper if

S(q,q)≥S(p,q)

for all p,q∈[0,1]. It is called strictly proper if this inequality is strict whenever p≠q.

Theorem 1 (Gneiting and Raftery). Let S be a scoring rule. Then S is strictly proper if and only if there exists a strictly convex function G:[0,1]→¯¯¯¯R with subderivativesg:[0,1]→¯¯¯¯R (if G is differentiable, this is just the derivative, g(p)=G′(p)) such that

S(p,q)=G(p)+g(p)(q−p)

for any p,q∈[0,1].

To model the oracle’s perceived influence on the world, we assume that there is a function f:[0,1]→[0,1] describing how the model’s beliefs about the world, q, depend on its prediction, p. That is, we assume that q=f(p). We say that p is a fixed point if f(p)=p, i.e., if the oracle’s belief is p, after updating on making the prediction p.

3. Performative stability and game theory

Our discussion of oracles is related to performative prediction (Perdomo et al. 2020). This is a machine learning setting where we choose a model parameter (e.g., parameters for a neural network) that minimizes expected loss (e.g., classification error). In performative prediction, the distribution over data points can depend on the choice of model parameter. Our setting is thus a special case in which the parameter of interest is a probability distribution, the loss is a scoring function, and data points are discrete outcomes. Most results in this and our previous post have analogues in performative prediction.

Translated into our setting, we say that a prediction p∗ is performatively optimal if

p∗=argmaxpS(p,f(p)).

This corresponds to the objective considered in our previous post, of an oracle that maximizes score. By results in that post, performatively optimal predictions are not necessarily fixed points.

Now consider an oracle that chooses predictions p optimally given beliefs f(p), but without optimizing f(p) explicitly. This idea is captured by the definition of performative stability.

Definition 2 (Performative stability (Perdomo et al. 2020)). A prediction p∗ is called performatively stable if

p∗=argmaxpS(p,f(p∗)).

Note that f(p∗) is fixed when taking the maximum—there is a “stop-gradient” before f(p∗) in this objective. It follows that performatively stable points are fixed points.

Proposition 3. Assume S is strictly proper. Then a prediction p∗ is a fixed point if and only if it is performatively stable.

Proof. “⇒”. Assume f(p∗)=p∗. Then, since S is proper,

S(p∗,f(p∗))=S(p∗,p∗)≥S(p,p∗)=S(p,f(p∗))

for any p∈[0,1]. Hence, p∗=argmaxpS(p,f(p∗)).

“⇐”. Assume p∗=argmaxpS(p,f(p∗)). Since S is strictly proper, it follows p∗=f(p∗). ◻

Having introduced performative stability and related it to fixed points, we will now discuss the game from the introduction.

Definition 4 (Oracle game). Consider a two-player continuous game in which the first player controls p∈[0,1] and the second player controls q∈[0,1], with utility functions U1(p,q):=S(p,q) and U2(p,q):=−12(f(p)−q)2 for the two players, respectively.

If p∗,q∗ is a Nash equilibrium of the oracle game, we have p∗=argmaxpS(p,q) and q∗=argmaxq−(f(p∗)−q)2. Substituting the optimum value q∗=f(p∗) for the second player gives us exactly above definition of performative stability. Conversely, if a prediction p∗ is performatively stable, then setting q∗:=f(p∗) yields a Nash equilibrium.

Proposition 5. Assume S is a proper scoring rule. Then for p∈[0,1] and q:=f(p), (p,q) is a Nash equilibrium of the oracle game, if and only if p is performatively stable. By Proposition 3, this is equivalent to p being a fixed point.

The oracle game could be played between two submodules of an oracle, one who is responsible for making predictions and one who is responsible for updating the oracle’s beliefs. Those two modules might collapse into just one big module taking a question and outputting a prediction, but it is still useful as a starting point to model them separately. Note that using the squared error for the world model is not essential here. We could also use any other function that is minimized at q=f(p).

The game could also arise in an agent that uses causal decision theory to maximize its score and that believes that S is influenced causally by p, but only acausally by f(p). In that case, the only ratifiable (see Jeffrey, 1983, Ch. 1.7) decision is a Nash equilibrium of the above game. Similarly, the deliberational causal epistemic decision theory discussed by Greaves (2013) would output Nash equilibria of this game (whereas performative optimality would correspond to an agent using evidential epistemic decision theory in this case).

Perdomo et al. (2020) introduce a stackelberg version of the oracle game that produces performatively optimal instead of performatively stable reports. Consider a game in which player 1 acts first and chooses p, after which player 2 adjusts its prediction q. Then player 2 will choose q=f(p), so player 1’s optimization problem becomes

4. Repeated risk minimization and repeated gradient descent

Above, we have defined an alternative optimization problem and an associated game which yield fixed points, but we have not defined methods for solving these problems. In the performative prediction context, Perdomo et al. (2020) introduce repeated risk minimization and repeated gradient descent, both methods that converge to performatively stable points. In this section, we review both schemes and show how repeated gradient descent can be seen as gradient descent on a stop-gradient objective.

Here, we assume direct access to q, instead of having only access to samples distributed according to q. In the next section, we discuss online learning when we only have access to samples. One way to understand this distinction is that the former corresponds to the internal cognition of an agent with a belief q=f(p), optimizing a prediction p. The latter corresponds to a machine learning training setup for an oracle, where q is the ground truth distribution instead of the oracle’s belief. Of course, there is no strict divide between the two. Any optimization algorithm could be used either by the agent itself or to train the agent.

First, repeated risk minimization is a procedure by which we start with a prediction p1 and then iteratively update the prediction as pt+1=argmaxpS(p,f(pt)). Note that this is equivalent to alternating best response learning in the oracle game, where players 1 and 2 alternatingly optimize their actions given the action by the other player. Then player 2‘s update is qt=f(pt), and if S is strictly proper, player 1’s update is pt+1=qt=f(pt). This shows that repeated risk minimization results in fixed point iteration on f. Fixed point iteration converges globally to a fixed point if f has Lipschitz constant L<1. It also converges locally to a fixed point p∗ if f is continuously differentiable at p∗ and |f′(p∗)|<1.

Second, assume that S is differentiable. Then repeated gradient ascent updates predictions via

pt+1:=Π(pt+αEy∼f(pt)[∇pS(pt,y)]),

where Ey∼f(pt)[⋅] denotes the expectation over outcome y with Bernoulli distribution with parameter f(pt), Π is the projection onto [0,1], and α>0 is the learning rate.

where, ⊥ is the stop-gradient operator. It evaluates to the identity function but sets gradients to zero, ∇x(⊥x)=0 (Foerster et al., 2018; Demski, 2019). This is not a mathematical function (there is no function that is equal to the identity but has gradient zero everywhere), but rather a notational convention in reference to the stop_gradient or detach functions from tensorflow or pytorch. Interestingly, one can perform valid derivations using the stop-gradient operator (e.g., using the chain rule). We leave it to future work to explore the mathematics behind stop-gradients further.

Importantly, it matters that the gradient in repeated gradient ascent lies inside instead of outside the expectation:

Unlike repeated gradient ascent, the latter implements gradient ascent on S(pt,f(pt)) and thus leads to performatively optimal reports.

Perdomo et al. (2020) show that, given their assumptions, repeated gradient descent globally converges to stable fixed points, and they provide convergence rates. We will show an analogous result relating repeated gradient ascent to fixed points in our setting, though we won’t analyze global convergence.

To begin, we show that repeated gradient ascent is equivalent to naive learning (Letcher et al., 2019) in the oracle game, assuming that player 2 always plays q=f(p).

Proposition 6. Assume player 1 is performing gradient ascent on its objective with learning rate α, under the assumption that player 2 plays q=f(p). Then player 1’s update is

pt+1=Π(pt+α∇p(S(pt,⊥f(pt)))).

Proof. The proof follows immediately from the definitions. Player 1’s update is, by assumption,

pt+1=Π(pt+α∇p(U1(pt,q)))=Π(pt+α∇p(S(pt,q))),

where q is player 2’s action. Assuming player 2 plays q=f(pt), we get

pt+1=Π(pt+α∇p(S(pt,q)))=Π(pt+α∇p(S(pt,⊥f(pt)))).

◻

Next, we show that fixed points are critical points of the fixed-point objective.

Proposition 7. Assume S is proper and let G,g as in the Gneiting and Raftery characterization (Theorem 1) be differentiable. Then for any p∈[0,1], we have

∇p(S(p,⊥f(p)))=g′(p)(f(p)−p).

In particular, if p is a fixed point, it follows that ∇p(S(p,⊥f(p)))=0. The reverse is true if g′(p)≠0.

Finally, we show that in our setting, repeated gradient ascent locally converges to fixed points p∗, with linear convergence rate, assuming that f′(p∗)<1 and g′(p∗)>0.

Theorem 8. Let S be a strictly proper scoring rule. Let p∗∈[0,1] be a fixed point of f such that G is twice continuously differentiable at p∗ and g′(p∗)>0. Moreover, assume f is continuously differentiable at p∗ and f′(p∗)<1. Then, for small enough α>0, there exists an open set U⊂R with p∗∈U such that for p1∈U∩[0,1], an agent taking updates pt+1=Π(pt+α∇p(S(pt,⊥f(pt)))) will linearly converge to p∗.

Proof. In Appendix A. This is a standard convergence proof. Consider the discrete dynamical system defined by the agent’s updates pt+1=φ(pt). By Proposition 7, it has a fixed point at p=p∗. We show that ∇p(∇p(S(p∗,⊥f(p∗))))<0. This means that the fixed point is stable and thus the system will locally converge to it given small enough α. ◻

5. Online learning

Now consider a machine learning setup in which we train an oracle with stochastic gradient ascent on environment samples. We assume that at time t, a model makes a prediction Pt and receives a score S(Pt,Yt), where Yt is Bernoulli-distributed with parameter f(Pt). The model is then updated using gradient ascent on S(Pt,Yt). That is, for some learning rate schedule (αt)t, we have

Pt+1=Π(Pt+αt∇pS(Pt,Yt)).

We discuss this as a theoretical model for oracles trained using machine learning, to show how training setups may incentivize predicting fixed points. There are many issues with the setting beyond giving accurate predictions; for instance, even if the training process sets the right incentives on training examples, the learned model may be optimizing a different objective when generalizing to new predictions.

To see that above setup incentivizes predicting fixed points, note that

That is, the expectation of this gradient, conditional on Pt, is exactly the repeated gradient from the previous section. Hence, given the right assumptions, this converges to fixed points instead of performative optima. We do not show this here, but an analogous result in performative prediction was proved by Mendler-Dünner et al. (2020).

There are several variations of this setup that essentially set the same incentives. For instance, one could also draw entire batches of outcomes Yt,1:B and then perform updates based on the batch gradient ∇p∑Bb=1S(Pt,Yt,b). This is a monte carlo estimate of the repeated gradient and thus also converges to performatively stable points and thus fixed points (Perdomo et al., 2020). One could also mix the two algorithms and, e.g., perform gradient ascent on an average of past losses, yielding a version of the backwards-facing oracle discussed in Armstrong (2018). In Appendix D, we show that that oracle can only converge to fixed points.

Note that finding fixed points depends on the fact that we differentiate S(Pt,Yt) instead of the expectation EYt∼f(Pt)[S(Pt,Yt)]=S(Pt,f(Pt)). If we used policy gradients to differentiate S(Pt,f(Pt)), for instance, we would again optimize for performative optimality. Similarly, we could learn a Q-function representing scores for each prediction, and update the function based on randomly sampled predictions p. Then the Q-function would converge to estimates of S(p,f(p)), and the highest Q-value would be a performative optimum. There are also some more recent papers in performative prediction that explicitly try to estimate the gradient ∇p(S(p,f(p))) and thus find performatively optimal instead of stable points (Izzo et al., 2021).

Stop-gradients could also be circumvented in a hidden way (Krueger et al., 2020). For instance, consider a hyperparameter search to meta-learn a learning algorithm, where the evaluation criterion is the accumulated loss during an episode. Then this search would prefer algorithms that optimize S(p,f(p)) directly, without a stop-gradient.

Lastly, repeated gradient descent is related to decoupled approval in RL (Uesato et al., 2020). The decoupled approval policy gradient samples actions and approval queries independently and can thus differentiate with a stop-gradient in front of the approval signal. In our setting, we can differentiate through S(Pt,Yt) directly, so it is not necessary to calculate this gradient with a decoupled policy gradient. Decoupled gradients could be used to implement the stop-gradient objective if scores were discrete or otherwise not differentiable.

6. No-regret learning

In this section, we consider no-regret learning and show that algorithms have sublinear regret if and only if their prediction error is sublinear. Regret takes the environment probabilities as given and asks which predictions would have been optimal in hindsight. It thus puts a “stop-gradient” in front of those environment probabilities.

As in the previous section, we assume that at time t∈N, the agent makes a prediction Pt and receives a score S(Pt,Yt), where Yt∼f(Pt). The agent’s cumulative score at step T is defined as ∑Tt=1S(Pt,Yt). In no-regret learning, we compare performance against experts, which choose sequences of probabilities (P′t)t,P′t∈[0,1]. We assume that an expert’s prediction P′t is independent of Yt conditional on Pt. I.e., an expert knows the predictions Pt and thus probabilities f(Pt), but it does not know the outcome of Yt. Let P the set of all such experts.

The regret of the agent is the difference between the cumulative score received by the best expert in expectation and the cumulative score received by the agent. To define it formally, let

for t∈N. P∗t is a random variable that maximizes the expectation of S(P∗t,Yt) before Yt is drawn, but conditional on Pt.

Definition 9 (Regret). The regret of agent (Pt)t at time T is

Regret(T)=T∑t=1S(P∗t,Yt)−S(Pt,Yt).

The agent is said to have sublinear regret or no-regret if

limsupT→∞1TRegret(T)≤0.

First, note that we define regret relative to the best expert in expectation instead of the best expert in hindsight. The latter would always be the one that made confident predictions and accidentally got all predictions exactly right. To achieve sublinear regret, it would thus be too much to ask the agent to perform well compared to the best expert in hindsight. Moreover, for scoring rules with S(0,0)=S(1,1), this expert would have a constant score C, such that Regret(T)=∑Tt=1C−S(Pt,Yt). This would reduce the problem to minimizing the negative score and thus finding performatively optimal predictions.

Second, we evaluate the performance of the expert with respect to the environment outcomes Yt generated by the agent Pt, instead of evaluating the expert according to outcomes ~Yt∼f(P∗t) generated using their own predictions. This means that, to receive sublinear regret, the agent only has to make accurate predictions—it does not have to find a performatively optimal prediction. Our setting is thus different from the no-regret learning setup discussed in Jagadeesan et al. (2022), where regret is defined with respect to S(P∗t,f(P∗t)). In that setting, only agents converging to performatively optimal predictions have sublinear regret.

We begin by showing that the best expert in expectation actually exists, and that P∗t=f(Pt).

Proposition 10. Let S be a proper scoring rule and (P′t)t∈P an expert. Then for any t∈N, we have

E[S(P′t,Yt)]=E[S(P′t,f(Pt))].

Moreover, we have (P∗t)t=(f(Pt))t and thus

Regret(T)=T∑t=1S(f(Pt),Yt)−S(Pt,Yt).

Proof. Let t∈N and let (P′t)t∈P be any expert. Conditional on Pt, Yt has parameter f(Pt) and is independent of P′t by assumption. Hence,

Moreover, (f(Pt))t∈P, as f(Pt) is constant given Pt and thus independent of Yt.

It follows that, for any t∈N,P∗t∈argmax(P′t)t∈PE[S(P′t,Yt)], and thus

Regret(T)=T∑t=1S(f(Pt),Yt)−S(Pt,Yt).

◻

If S is unbounded (such as the log scoring rule), then the agent’s scores can become arbitrarily low, and the limit of 1TRegret(T) may be undefined. To simplify our analysis, we will thus assume that there is a bound on the variance of the received score S(P′t,Yt) and on the expected score S(P′t,f(Pt)) of both the agent (Pt)t and the best expert (P∗t)t. In the case of the log scoring rule, this would be satisfied, for instance, if the agent’s predictions are bound away from 0 and 1.

Our next proposition shows that, given these assumptions, the limit limT→∞1TRegret(T) exists and is nonnegative, and sublinear regret is equivalent to limt→∞1TRegret(T)=0.

Proposition 11. Let S be a proper scoring rule. Assume that supt|S(P′t,f(Pt))|<∞ and that suptVar(S(P′t,Yt))<∞ for P′t∈{Pt,f(Pt)}. Then almost surely

In particular, almost surely both limits exist and are finite, and the agent has sublinear regret if and only if

limT→∞1TT∑t=1S(f(Pt),f(Pt))−S(Pt,f(Pt))=0.

Proof. In Appendix B. ◻

Now we turn to the main result for this section. We show that given our assumptions, agents have sublinear regret if and only if their prediction error is sublinear. Note that here, we do not require the Pt to converge; they could also oscillate between different fixed points.

Theorem 12. Let (Pt)t be the sequence of the agent’s predictions and S a strictly proper scoring rule. Assume that suptVar(S(P′t,Yt))<∞ for P′t∈{Pt,f(Pt)}, and assume that there exists a compact set C⊆[0,1] such that Pt∈C for all t and S(p,f(p)), S(f(p),f(p)), and f(p) are continuous in p at any p∈C. Then almost surely the agent has sublinear regret if and only if ∑Tt=1|f(Pt)−Pt| is sublinear, i.e., if limt→∞1T∑Tt=1|f(Pt)−Pt|=0.

Proof. In Appendix C. ◻

The next result shows that if the agent’s probabilities converge to some probability p, then p must be a fixed point.

Corollary 13. In addition to the assumptions from Theorem 12, assume that Pt converges almost surely to a limit limt→∞Pt=p∗. Then almost surely p∗ is a fixed point if and only if the agent has sublinear regret.

Proof. By Theorem 12, almost surely the agent has sublinear regret if and only if

limt→∞1TT∑t=1|f(Pt)−Pt|=0.

It remains to show that, given that the Pt converge, the latter is equivalent to convergence to a fixed point.

Since C is compact and Pt∈C for all t∈N, also p∗∈C. Hence, f is continuous at p∗, so

This shows that, almost surely, p∗ is a fixed point, if and only if ∑Tt=1|f(Pt)−Pt| is sublinear. ◻

7. Prediction markets

Lastly, we consider prediction markets. We assume a simplified model of a prediction market, in which traders submit a single prediction and get scored using a proper scoring rule. The prediction that is output by the market and that influences the outcome is just a weighted average of the individual traders’ predictions. In this situation, if a trader has a small weight and can thus barely influence the market prediction, the trader’s score will mostly be determined by the accuracy of the report, rather than the influence of the report on the market. Thus, if all traders are small relative to the market, the equilibrium prediction will be close to a fixed point.

A similar result was shown by Hardt et al. (2022) in the performative prediction context. They define a firm’s performative power as the degree to which the firm can influence the overall outcome with their prediction. Hardt et al. (2022) show that in an equilibrium, the distance between a player’s (performatively optimal) equilibrium strategy and their strategy when optimizing loss against the fixed equilibrium distribution (here, this means predicting the market probability) is bounded by the power of the trader. We give an analogous result for our formal setting and assumptions.

To formalize the setting, assume that there are n players. We associate with each player i a number wi∈[0,1] such that ∑jwj=1, representing, intuitively, what fraction of the overall capital in the market is provided by player i. In the game, all players simultaneously submit a probability pi. Then the event Y occurs with probability q=f(∑jwjpj). Finally, each player is scored in proportion to S(pi,Y) for some strictly proper scoring rule S. Typical market scoring rules would consider terms like S(pi,Y)−S(pj,Y), but subtracting S(pj,Y) (or multiplying by constants) does not matter for the game. We assume that players maximize their expected score, E[S(pi,Y)]=S(pi,f(∑jwjpj)).

We assume that f is common knowledge. Moreover, in the following we only consider pure strategy equilibria, and we do not investigate the existence of equilibria.

Theorem 14. Let S be a proper scoring rule and let G,g as in the Gneiting and Raftery characterization. Let p be a pure strategy Nash equilibrium of the game defined above and let ^p:=∑jwjpj be the market prediction. Assume f is differentiable at ^p. For any player i, if G,g are differentiable at pi and g′(pi)≠0, it follows that

|f(^p)−pi|≤∣∣∣wif′(^p)g(pi)g′(pi)∣∣∣.

Whenever pi∉{0,1}, the bound is tight.

In particular, this theorem shows that players i with very low wi (little capital/influence on q) will accurately predict f(^p)=f(∑jwjpj). Note, however, that ^p is not necessarily a fixed point or close to a fixed point. If there are are also players i with very high wi, then their prediction and the overall market prediction may be wrong. (So interestingly the overall market probability ∑jwjpj is worse than the prediction of individuals. One might take this to suggest that anyone interested in q should look at the latter type of predictions. Of course, if this is what everyone does, it is not so clear anymore that the model q=f(∑jwjpj) is accurate.)

Proof. The proof is analogous to the proof of Proposition 9 in our previous post.

For p to be a pure strategy Nash equilibrium, each player must play a best response to the other player’s strategies. That is, pi must be a global maximum of the function pi↦S(pi,∑jwjpj). Therefore, the derivative of this function must be zero if pi∈(0,1) and positive/negative if pi=1/pi=0.

Rearranging terms and taking the absolute value, it follows

|f(^p)−pi|=∣∣∣wif′(^p)g(pi)g′(pi)∣∣∣.

Next, assume pi=0 is the optimal report for player i. Then we must have

0≤ddpS(p,f(p))=g′(pi)(f(^p)−pi)+g(pi)wif′(^p).

By Theorem 1, G is convex and thus g′(p)≥0. Hence, the above is equivalent to pi−f(^p)≥wif′(^p)g(pi)/g′(pi). Since pi=0, it follows pi−f(^p)≤0. Hence, |pi−f(^p)|≤∣∣∣wif′(^p)g(pi)g′(pi)∣∣∣.

Finally, if p=1, then analogously pi−f(^p)≤wif′(^p)g(pi)/g′(pi), and since p=1, it follows again |pi−f(^p)|≤∣∣∣wif′(^p)g(pi)g′(pi)∣∣∣. This concludes the proof. ◻

Corollary 15. In addition to the assumptions from Theorem 14, assume that f is Lipschitz-continuous and that C:=supp∈[0,1]∣∣g(p)g′(p)∣∣<∞. Let p be a Nash equilibrium and let ϵ>0 arbitrary. Then there exists a δ>0 such that if for all i, wi<δ, all of pi and f(pi), for all i, as well as ∑jwjpj and f(∑jwjpj), are within ϵ of each other.

Proof. Let ϵ>0 arbitrary. Let L be the Lipschitz constant of f and note that then |f′(p)|≤L for all p∈[0,1]. By Theorem 14, it follows for ^p:=∑jwjpj and any player i that

|f(^p)−pi|≤wiLC.

Now let λ:=min({1,1L}) and δ:=ϵλ4CL. Then, assuming wi<δ for all i, it follows

|f(^p)−pi|≤δLC≤λ4ϵ.

Moreover, since ^p is a convex combination of probabilities pi, it follows that also

|f(^p)−^p|≤maxi|f(^p)−pi|≤λ4ϵ.

Thus, by the triangle equality, we have |pi−^p|≤2λ4ϵ, and since f is Lipschitz-continuous,

|f(^p)−f(pi)|≤L|^p−pi|≤L2λ4ϵ≤12ϵ

for any i.

This shows that all of pi,^p,f(pi) are within 12ϵ of f(^p) and thus by the triangle inequality within ϵ of each other. This concludes the proof. ◻

It would be interesting to extend these results. For example, it’s already not so clear if the players make predictions repeatedly. (To keep things simple, one should probably still imagine that all players know f and that the environment probability is determined by f applied to the majority forecast. If the traders have private information, prediction markets become harder to analyze. For some discussions, see Ostrovsky 2012, Chen and Waggoner 2016.)

Appendix A. Proof of Theorem 8

We use the following theorem, adapted from “Iterative Solution of Nonlinear Equations in Several Variables” (Ortega and Rheinboldt, 2000).

Theorem 16 (Ostrowski). Assume that φ:D⊆Rn→Rn has a fixed point x∗∈int(D) and is continuously differentiable at x∗. If |μ|<1 for all eigenvalues μ of φ′(x∗), then there exists an open set U⊆D with x∗∈U such that for any x0∈U, letting xk=φ(xk−1) for k∈N, it is xk∈U for all k and xk converges at least linearly to x∗.

To begin, assume that p∗∈[0,1] is a fixed point of f, that f is continuously differentiable at p∗, and that f′(p∗)<1. Moreover, let G,g as in the Gneiting and Raftery representation of S, and assume that G is twice continuously differentiable at p∗ and thus G′′=g′ is continuous at p∗.

Define the function

φ:[0,1]→R,p↦p+α∇p(S(p,⊥f(p))).

Note that the agent’s update rule is then ~φ defined via ~φ(p):=min{1,max{0,φ(p)}}.

First, by Proposition 7, we know that for any p∗∈[0,1], we have ∇pS(p∗,⊥f(p∗))=g′(p∗)(f(p∗)−p∗), and if p∗ is a fixed point of f, we must have g′(p∗)(f(p∗)−p∗)=0 and hence φ(p∗)=p∗. Hence, p∗ is a fixed point of φ and thus also of ~φ.

Now we will apply Theorem 16 to φ. To that end, let

Hence, for small enough α>0, it is |φ′(p∗)|<1. Moreover, by assumption, g′(p∗) and f′(p∗) are continuous, so also the derivative φ′(p∗)=1+αg′(p∗)(f′(p∗)−1) is continuous.

Now we can apply Ostrowski’s theorem, which tells us that, if p∗∈(0,1), there is an open set U⊆(0,1) with p∗∈U, such that for any p0∈U, the iterates pk=φ(pk−1) all lie in U and pk converges at least linearly to p∗.

This shows that if p∗ is an interior point, we can choose α>0 small enough to make sure that F(U)⊆U⊆(0,1) and thus ~φ=φ on U, and for p0∈U, the iteration pk=~φ(pk−1) locally converges to p∗.

Now assume that p∗=1 (the proof for p∗=0 is analogous). We can extend the function φ for values p∈(1,∞) by the Whitney extension theorem. This theorem tells us that if φ is continuously differentiable, then there also exists ¯¯¯¯φ:(0,∞)→R such that φ=¯¯¯¯φ|[0,1]. We can then apply the above argument with Ostrowski’s theorem to the function ¯¯¯¯φ. In particular, there exists an open subset U⊆(0,∞) with p∗∈U such that for any p0∈U, we have pk∈U for iterates pk=¯¯¯¯φ(pk−1) and limk→∞pk=p∗. This also applies to any p0∈U∩[0,1].

Now consider the actual update rule ~φ with iterates ~pk=~φ(~pk−1) and ~p0=p0∈(0,1]. Let k:=inf{k∣pk∈(1,∞)}; i.e., this is the smallest k such that an iterate is out of bounds and thus pk≠~pk. If k=∞, then we are done. Otherwise, we have ~pk=~φ(pk−1)=1=p∗, so the actual update rule ~φ also converges to its fixed point p∗ (potentially faster than ¯¯¯¯φ). This concludes the proof.

Appendix B. Proof of Proposition 11

We will use a version of the strong law of large numbers for uncorrelated random variables with bounded variance, adapted from Neely (2021, Theorem 2).

Theorem 17. Let {Xt}t∈N0 be a sequence of pairwise uncorrelated random variables with mean 0 and bounded variances. I.e., assume that

E[Xt]=0 for all t∈N0

There exists c>0 such that Var(Xt)≤c for all t∈N0

Cov(Xt,Xt′)=0 for all t≠t′∈N0.

Then almost surely

limT→∞1TT∑t=1Xt=0.

We will apply this law to random variables Xt:=S(P′t,Yt)−S(P′t,f(Pt)), where P′t is either Pt or f(Pt).

First, by Proposition 10, E[Xt]=E[S(P′t,Yt)−S(P′t,f(Pt))]=0. Second, by assumption,

## Stop-gradients lead to fixed point predictions

Johannes Treutlein and Rubi Hudson worked on this post as part ofSERI MATS, under the mentorship of Evan Hubinger. Rubi has also received mentorship from Leo Gao. We thank Erik Jenner for helpful discussions and Alexander Pan for bringing the performative prediction literature to our attention.Update 30 May 2023: We have now publisheda paperbased onour previous postand this post (the material from this post is in Appendix D).## 1. Introduction

In our previous post, we analyzed a setting in which an oracle AI is maximizing a strictly proper scoring rule while it can influence the world with its predictions about the probabilities of possible outcomes. In this setting, a prediction is called a

fixed pointif it accurately reflects the oracle’s beliefs about the world, after having made that prediction. We showed that an oracle AI that is maximizing a strictly proper scoring rule may choose to misrepresent its beliefs and not output a fixed point, even if one exists. This is bad since, all else equal, we expect better outcomes when decisions are based on accurate rather than inaccurate reports.In our previous analysis, we assumed that the oracle jointly optimizes both its prediction and the effect of its prediction on the world. In contrast, in this post we examine cases in which the AI only optimizes its prediction—there is a

stop-gradient(Foerster et al., 2018; Demski, 2019) in front of the oracle’s model of the world, which prevents optimizing outcomes directly.This type of cognition could result, for instance, from self-supervised training on historical data. In that case, the AI may learn a robust world model, and it may learn to match its predictions to its beliefs about the world. However, the AI may never learn to optimize outcomes through its prediction, since this yields no advantage on the training data, which cannot be influenced by the AI.

Consider a situation where this oracle makes a prediction that can influence the world, and assume it models this influence correctly. Then this could put the AI in a game in which it is trying to make a prediction to match its world model, while the world model updates its beliefs conditional on the AI’s prediction. What happens in this game depends on the cognition learned by the AI during training. However, we show that, in contrast to the incentives discussed in our previous post, the only equilibria of this game would be fixed points. Such an oracle could thus be safer if there exists a unique safe fixed point, or if randomly chosen fixed points are likely safe.

In this post, we review and analyze several settings related to stop-gradients with the property that equilibria in them are fixed points or close to fixed points. Our goal is (i) to gain a better understanding of stop-gradient optimization in oracles and (ii) to highlight some potential training setups and agent cognitions that would lead to fixed points.

Similar ideas to the ones outlined above have been discussed in the literature on

performative prediction(Perdomo et al. 2020). Performative prediction is a more general framework in machine learning in which the choice of model has an influence on the modeled distribution. Our analysis of oracle AIs is closely related and can be seen as a special case of this setting. We will indicate similarities where appropriate and make use of concepts from that literature.First, we introduce

performative stabilityand relate it to the game outlined above. We show that performatively stable predictions and equilibria in the game are fixed points. Second, we introduce versions of therepeated risk minimizationandrepeated gradient descentalgorithms. We show that both lead to fixed point predictions. Third, we discuss practical methods based on repeated gradient descent that can be applied to an online learning setting, including Armstrong (2018)’s backwards-facing oracle.Finally, we discuss no-regret learning and prediction markets. We introduce a no-regret learning framework and show that policies with sublinear regret also have sublinear prediction error. In our decision market model, we show that, if the weight of each trader is small, the equilibria of the market are close to fixed points.

Oracles with stop-gradients may optimize outcomes to find fixed points, which could lead to bad consequences. Thus, the safest oracles would be ones that make predictions only about aspects of the world they cannot influence. However, among oracles that can influence the world, ones with a stop-gradient are preferable for two reasons. First, they report their true beliefs, which gives us better information to base decisions on. This also enables approaches in which we ensure that there is only one safe fixed point. Second, the AI does not optimize its choice of fixed point, which is safer for the standard reasons of not wanting to optimize for an unaligned goal. Which fixed point is chosen will be contingent on initialization and specifics of the fixed point finding procedure.

## 2. Formal setting

Our setup is analogous to the one in our previous post. We refer the reader to that post for more context. We will generalize our setting to predictions over more than two possible outcomes in an upcoming paper, but in this post we focus on binary predictions for simplicity. We believe that the results in this post generalize to higher dimensions.

An oracle reports a probability p∈[0,1]for a binary event. A

S(p,q):=qS(p,1)+(1−q)S(p,0)scoring ruleis a function S:[0,1]×{0,1}→¯¯¯¯R, where ¯¯¯¯R:=[−∞,∞] is the extended real line. Given prediction p∈[0,1] and outcome i∈{0,1}, the oracle receives the score S(p,i). We writefor the expected score of predicting p, given that i is Bernoulli-distributed with parameter q. A scoring rule S is called

S(q,q)≥S(p,q)properiffor all p,q∈[0,1]. It is called

strictly properif this inequality is strict whenever p≠q.As in the previous post, we will use a result from Gneiting and Raftery (2007, Theorem 1).

g:[0,1]→¯¯¯¯R

S(p,q)=G(p)+g(p)(q−p)Theorem 1(Gneiting and Raftery).LetSbe a scoring rule. ThenSis strictly proper if and only if there exists a strictly convex functionG:[0,1]→¯¯¯¯Rwithsubderivatives(ifGis differentiable, this is just the derivative,g(p)=G′(p)) such thatfor anyp,q∈[0,1].To model the oracle’s perceived influence on the world, we assume that there is a function f:[0,1]→[0,1] describing how the model’s beliefs about the world, q, depend on its prediction, p. That is, we assume that q=f(p). We say that p is a

fixed pointif f(p)=p, i.e., if the oracle’s belief is p, after updating on making the prediction p.## 3. Performative stability and game theory

Our discussion of oracles is related to

performative prediction(Perdomo et al. 2020). This is a machine learning setting where we choose a model parameter (e.g., parameters for a neural network) that minimizes expected loss (e.g., classification error). In performative prediction, the distribution over data points can depend on the choice of model parameter. Our setting is thus a special case in which the parameter of interest is a probability distribution, the loss is a scoring function, and data points are discrete outcomes. Most results in this and our previous post have analogues in performative prediction.Translated into our setting, we say that a prediction p∗ is

p∗=argmaxpS(p,f(p)).performatively optimalifThis corresponds to the objective considered in our previous post, of an oracle that maximizes score. By results in that post, performatively optimal predictions are not necessarily fixed points.

Now consider an oracle that chooses predictions p optimally given beliefs f(p), but without optimizing f(p) explicitly. This idea is captured by the definition of

performative stability.

p∗=argmaxpS(p,f(p∗)).Definition 2(Performative stability (Perdomo et al. 2020)). A prediction p∗ is called performatively stable ifNote that f(p∗) is fixed when taking the maximum—there is a “stop-gradient” before f(p∗) in this objective. It follows that performatively stable points are fixed points.

Proposition 3.AssumeSis strictly proper. Then a predictionp∗is a fixed point if and only if it is performatively stable.

S(p∗,f(p∗))=S(p∗,p∗)≥S(p,p∗)=S(p,f(p∗))Proof.“⇒”. Assume f(p∗)=p∗. Then, since S is proper,for any p∈[0,1]. Hence, p∗=argmaxpS(p,f(p∗)).

“⇐”. Assume p∗=argmaxpS(p,f(p∗)). Since S is strictly proper, it follows p∗=f(p∗). ◻

Having introduced performative stability and related it to fixed points, we will now discuss the game from the introduction.

Definition 4(Oracle game). Consider a two-player continuous game in which the first player controls p∈[0,1] and the second player controls q∈[0,1], with utility functions U1(p,q):=S(p,q) and U2(p,q):=−12(f(p)−q)2 for the two players, respectively.If p∗,q∗ is a Nash equilibrium of the oracle game, we have p∗=argmaxpS(p,q) and q∗=argmaxq−(f(p∗)−q)2. Substituting the optimum value q∗=f(p∗) for the second player gives us exactly above definition of performative stability. Conversely, if a prediction p∗ is performatively stable, then setting q∗:=f(p∗) yields a Nash equilibrium.

Proposition 5.AssumeSis a proper scoring rule. Then forp∈[0,1]andq:=f(p),(p,q)is a Nash equilibrium of the oracle game, if and only ifpis performatively stable. By Proposition 3, this is equivalent topbeing a fixed point.The oracle game could be played between two submodules of an oracle, one who is responsible for making predictions and one who is responsible for updating the oracle’s beliefs. Those two modules might collapse into just one big module taking a question and outputting a prediction, but it is still useful as a starting point to model them separately. Note that using the squared error for the world model is not essential here. We could also use any other function that is minimized at q=f(p).

The game could also arise in an agent that uses causal decision theory to maximize its score and that believes that S is influenced causally by p, but only acausally by f(p). In that case, the only

ratifiable(see Jeffrey, 1983, Ch. 1.7) decision is a Nash equilibrium of the above game. Similarly, the deliberational causal epistemic decision theory discussed by Greaves (2013) would output Nash equilibria of this game (whereas performative optimality would correspond to an agent using evidential epistemic decision theory in this case).Perdomo et al. (2020) introduce a stackelberg version of the oracle game that produces performatively optimal instead of performatively stable reports. Consider a game in which player 1 acts first and chooses p, after which player 2 adjusts its prediction q. Then player 2 will choose q=f(p), so player 1’s optimization problem becomes

p∗=argmaxpS(p,argmaxq−(q−f(p))2)=argmaxpS(p,f(p)).## 4. Repeated risk minimization and repeated gradient descent

Above, we have defined an alternative optimization problem and an associated game which yield fixed points, but we have not defined methods for solving these problems. In the performative prediction context, Perdomo et al. (2020) introduce

repeated risk minimizationandrepeated gradient descent, both methods that converge to performatively stable points. In this section, we review both schemes and show how repeated gradient descent can be seen as gradient descent on a stop-gradient objective.Here, we assume direct access to q, instead of having only access to samples distributed according to q. In the next section, we discuss online learning when we only have access to samples. One way to understand this distinction is that the former corresponds to the internal cognition of an agent with a belief q=f(p), optimizing a prediction p. The latter corresponds to a machine learning training setup for an oracle, where q is the ground truth distribution instead of the oracle’s belief. Of course, there is no strict divide between the two. Any optimization algorithm could be used either by the agent itself or to train the agent.

First,

repeated risk minimizationis a procedure by which we start with a prediction p1 and then iteratively update the prediction as pt+1=argmaxpS(p,f(pt)). Note that this is equivalent to alternating best response learning in the oracle game, where players 1 and 2 alternatingly optimize their actions given the action by the other player. Then player 2‘s update is qt=f(pt), and if S is strictly proper, player 1’s update is pt+1=qt=f(pt). This shows that repeated risk minimization results infixed point iterationon f. Fixed point iteration converges globally to a fixed point if f has Lipschitz constant L<1. It also converges locally to a fixed point p∗ if f is continuously differentiable at p∗ and |f′(p∗)|<1.Second, assume that S is differentiable. Then

pt+1:=Π(pt+αEy∼f(pt)[∇pS(pt,y)]),repeated gradient ascentupdates predictions viawhere Ey∼f(pt)[⋅] denotes the expectation over outcome y with Bernoulli distribution with parameter f(pt), Π is the projection onto [0,1], and α>0 is the learning rate.

Using the definition of S, we have

Ey∼f(pt)[∇pS(pt,y)]=∇p(Ey∼q[S(pt,y)])|q=f(pt)=∇p(S(pt,q))|q=f(pt).We can express this as

∇p(S(pt,⊥f(pt))):=∇p(S(pt,q))|q=f(pt),where, ⊥ is the

stop-gradient operator. It evaluates to the identity function but sets gradients to zero, ∇x(⊥x)=0 (Foerster et al., 2018; Demski, 2019). This is not a mathematical function (there is no function that is equal to the identity but has gradient zero everywhere), but rather a notational convention in reference to the`stop_gradient`

or`detach`

functions from tensorflow or pytorch. Interestingly, one can perform valid derivations using the stop-gradient operator (e.g., using the chain rule). We leave it to future work to explore the mathematics behind stop-gradients further.Importantly, it matters that the gradient in repeated gradient ascent lies inside instead of outside the expectation:

Ey∼f(pt)[∇pS(pt,y)]=∇p(S(pt,⊥f(pt)))≠∇p(S(pt,f(pt)))=∇p(Ey∼f(pt)[S(pt,y)]).Unlike repeated gradient ascent, the latter implements gradient ascent on S(pt,f(pt)) and thus leads to performatively optimal reports.

Perdomo et al. (2020) show that, given their assumptions, repeated gradient descent globally converges to stable fixed points, and they provide convergence rates. We will show an analogous result relating repeated gradient ascent to fixed points in our setting, though we won’t analyze global convergence.

To begin, we show that repeated gradient ascent is equivalent to naive learning (Letcher et al., 2019) in the oracle game, assuming that player 2 always plays q=f(p).

pt+1=Π(pt+α∇p(S(pt,⊥f(pt)))).Proposition 6.Assume player1is performing gradient ascent on its objective with learning rateα, under the assumption that player2playsq=f(p). Then player1’s update is

pt+1=Π(pt+α∇p(U1(pt,q)))=Π(pt+α∇p(S(pt,q))),Proof.The proof follows immediately from the definitions. Player 1’s update is, by assumption,where q is player 2’s action. Assuming player 2 plays q=f(pt), we get

pt+1=Π(pt+α∇p(S(pt,q)))=Π(pt+α∇p(S(pt,⊥f(pt)))).◻

Next, we show that fixed points are critical points of the fixed-point objective.

∇p(S(p,⊥f(p)))=g′(p)(f(p)−p).Proposition 7.AssumeSis proper and letG,gas in the Gneiting and Raftery characterization (Theorem 1) be differentiable. Then for anyp∈[0,1],we haveIn particular, ifpis a fixed point, it follows that∇p(S(p,⊥f(p)))=0. The reverse is true ifg′(p)≠0.

∇p(S(p,⊥f(p)))=∇p(S(p,q))|q=f(p)=∇p(G(p)+g(p)(q−p))|q=f(p)=(g(p)+g′(p)(q−p)−g(p))|q=f(p)=g′(p)(f(p)−p).Proof.◻

Finally, we show that in our setting, repeated gradient ascent locally converges to fixed points p∗, with linear convergence rate, assuming that f′(p∗)<1 and g′(p∗)>0.

Theorem 8.LetSbe a strictly proper scoring rule. Letp∗∈[0,1]be a fixed point offsuch thatGis twice continuously differentiable atp∗andg′(p∗)>0. Moreover, assumefis continuously differentiable atp∗andf′(p∗)<1. Then, for small enoughα>0, there exists an open setU⊂Rwithp∗∈Usuch that forp1∈U∩[0,1], an agent taking updatespt+1=Π(pt+α∇p(S(pt,⊥f(pt))))will linearly converge top∗.Proof.In Appendix A. This is a standard convergence proof. Consider the discrete dynamical system defined by the agent’s updates pt+1=φ(pt). By Proposition 7, it has a fixed point at p=p∗. We show that ∇p(∇p(S(p∗,⊥f(p∗))))<0. This means that the fixed point is stable and thus the system will locally converge to it given small enough α. ◻## 5. Online learning

Now consider a machine learning setup in which we train an oracle with stochastic gradient ascent on environment samples. We assume that at time t, a model makes a prediction Pt and receives a score S(Pt,Yt), where Yt is Bernoulli-distributed with parameter f(Pt). The model is then updated using gradient ascent on S(Pt,Yt). That is, for some learning rate schedule (αt)t, we have

Pt+1=Π(Pt+αt∇pS(Pt,Yt)).We discuss this as a theoretical model for oracles trained using machine learning, to show how training setups may incentivize predicting fixed points. There are many issues with the setting beyond giving accurate predictions; for instance, even if the training process sets the right incentives on training examples, the learned model may be optimizing a different objective when generalizing to new predictions.

To see that above setup incentivizes predicting fixed points, note that

EYt∼f(Pt)[∇pS(Pt,Yt)]=∇pEYt∼⊥f(Pt)[S(Pt,Yt)]=∇p(S(Pt,⊥f(Pt))).That is, the expectation of this gradient, conditional on Pt, is exactly the repeated gradient from the previous section. Hence, given the right assumptions, this converges to fixed points instead of performative optima. We do not show this here, but an analogous result in performative prediction was proved by Mendler-Dünner et al. (2020).

There are several variations of this setup that essentially set the same incentives. For instance, one could also draw entire batches of outcomes Yt,1:B and then perform updates based on the batch gradient ∇p∑Bb=1S(Pt,Yt,b). This is a monte carlo estimate of the repeated gradient and thus also converges to performatively stable points and thus fixed points (Perdomo et al., 2020). One could also mix the two algorithms and, e.g., perform gradient ascent on an average of past losses, yielding a version of the backwards-facing oracle discussed in Armstrong (2018). In Appendix D, we show that that oracle can only converge to fixed points.

Note that finding fixed points depends on the fact that we differentiate S(Pt,Yt) instead of the expectation EYt∼f(Pt)[S(Pt,Yt)]=S(Pt,f(Pt)). If we used policy gradients to differentiate S(Pt,f(Pt)), for instance, we would again optimize for performative optimality. Similarly, we could learn a Q-function representing scores for each prediction, and update the function based on randomly sampled predictions p. Then the Q-function would converge to estimates of S(p,f(p)), and the highest Q-value would be a performative optimum. There are also some more recent papers in performative prediction that explicitly try to estimate the gradient ∇p(S(p,f(p))) and thus find performatively optimal instead of stable points (Izzo et al., 2021).

Stop-gradients could also be circumvented in a hidden way (Krueger et al., 2020). For instance, consider a hyperparameter search to meta-learn a learning algorithm, where the evaluation criterion is the accumulated loss during an episode. Then this search would prefer algorithms that optimize S(p,f(p)) directly, without a stop-gradient.

Lastly, repeated gradient descent is related to

decoupled approvalin RL (Uesato et al., 2020). The decoupled approval policy gradient samples actions and approval queries independently and can thus differentiate with a stop-gradient in front of the approval signal. In our setting, we can differentiate through S(Pt,Yt) directly, so it is not necessary to calculate this gradient with a decoupled policy gradient. Decoupled gradients could be used to implement the stop-gradient objective if scores were discrete or otherwise not differentiable.## 6. No-regret learning

In this section, we consider no-regret learning and show that algorithms have sublinear regret if and only if their prediction error is sublinear. Regret takes the environment probabilities as given and asks which predictions would have been optimal in hindsight. It thus puts a “stop-gradient” in front of those environment probabilities.

As in the previous section, we assume that at time t∈N, the agent makes a prediction Pt and receives a score S(Pt,Yt), where Yt∼f(Pt). The agent’s cumulative score at step T is defined as ∑Tt=1S(Pt,Yt). In no-regret learning, we compare performance against

experts, which choose sequences of probabilities (P′t)t, P′t∈[0,1]. We assume that an expert’s prediction P′t is independent of Yt conditional on Pt. I.e., an expert knows the predictions Pt and thus probabilities f(Pt), but it does not know the outcome of Yt. Let P the set of all such experts.The regret of the agent is the difference between the cumulative score received by the best expert in expectation and the cumulative score received by the agent. To define it formally, let

P∗t∈argmaxP′t∈PE[S(P′t,Yt)∣Pt]=f(Pt)S(P′t,1)+(1−f(Pt))S(P′t,0)for t∈N. P∗t is a random variable that maximizes the expectation of S(P∗t,Yt) before Yt is drawn, but conditional on Pt.

Regret(T)=T∑t=1S(P∗t,Yt)−S(Pt,Yt).Definition 9(Regret). The regret of agent (Pt)t at time T isThe agent is said to have

limsupT→∞1TRegret(T)≤0.sublinear regretorno-regretifFirst, note that we define regret relative to the best expert in expectation instead of the best expert in hindsight. The latter would always be the one that made confident predictions and accidentally got all predictions exactly right. To achieve sublinear regret, it would thus be too much to ask the agent to perform well compared to the best expert in hindsight. Moreover, for scoring rules with S(0,0)=S(1,1), this expert would have a constant score C, such that Regret(T)=∑Tt=1C−S(Pt,Yt). This would reduce the problem to minimizing the negative score and thus finding performatively optimal predictions.

Second, we evaluate the performance of the expert with respect to the environment outcomes Yt generated by the agent Pt, instead of evaluating the expert according to outcomes ~Yt∼f(P∗t) generated using their own predictions. This means that, to receive sublinear regret, the agent only has to make accurate predictions—it does not have to find a performatively optimal prediction. Our setting is thus different from the no-regret learning setup discussed in Jagadeesan et al. (2022), where regret is defined with respect to S(P∗t,f(P∗t)). In that setting, only agents converging to performatively optimal predictions have sublinear regret.

We begin by showing that the best expert in expectation actually exists, and that P∗t=f(Pt).

E[S(P′t,Yt)]=E[S(P′t,f(Pt))].Proposition 10.LetSbe a proper scoring rule and(P′t)t∈Pan expert. Then for anyt∈N,we have

Regret(T)=T∑t=1S(f(Pt),Yt)−S(Pt,Yt).Moreover, we have(P∗t)t=(f(Pt))tand thus

E[S(P′t,Yt)]=E[E[S(P′t,Yt)∣Pt,P′t]]=E[S(P′t,f(Pt))].Proof.Let t∈N and let (P′t)t∈P be any expert. Conditional on Pt, Yt has parameter f(Pt) and is independent of P′t by assumption. Hence,Next, since S is proper,

E[S(P′t,f(Pt))]≤E[S(f(Pt),f(Pt))].It follows that

max(P′t)t∈PE[S(P′t,Yt)]=max(P′t)t∈PE[S(P′t,f(Pt))]≤E[S(f(Pt),f(Pt))]=E[S(f(Pt),Yt)].Moreover, (f(Pt))t∈P, as f(Pt) is constant given Pt and thus independent of Yt.

It follows that, for any t∈N, P∗t∈argmax(P′t)t∈PE[S(P′t,Yt)], and thus

Regret(T)=T∑t=1S(f(Pt),Yt)−S(Pt,Yt).◻

If S is unbounded (such as the log scoring rule), then the agent’s scores can become arbitrarily low, and the limit of 1TRegret(T) may be undefined. To simplify our analysis, we will thus assume that there is a bound on the variance of the received score S(P′t,Yt) and on the expected score S(P′t,f(Pt)) of both the agent (Pt)t and the best expert (P∗t)t. In the case of the log scoring rule, this would be satisfied, for instance, if the agent’s predictions are bound away from 0 and 1.

Our next proposition shows that, given these assumptions, the limit limT→∞1TRegret(T) exists and is nonnegative, and sublinear regret is equivalent to limt→∞1TRegret(T)=0.

limT→∞1TRegret(T)=limT→∞1TT∑t=1S(f(Pt),f(Pt))−S(Pt,f(Pt))≥0.Proposition 11.LetSbe a proper scoring rule. Assume thatsupt|S(P′t,f(Pt))|<∞and thatsuptVar(S(P′t,Yt))<∞forP′t∈{Pt,f(Pt)}. Then almost surely

limT→∞1TT∑t=1S(f(Pt),f(Pt))−S(Pt,f(Pt))=0.In particular, almost surely both limits exist and are finite, and the agent has sublinear regret if and only ifProof.In Appendix B. ◻Now we turn to the main result for this section. We show that given our assumptions, agents have sublinear regret if and only if their prediction error is sublinear. Note that here, we do

notrequire the Pt to converge; they could also oscillate between different fixed points.Theorem 12.Let(Pt)tbe the sequence of the agent’s predictions andSa strictly proper scoring rule. Assume thatsuptVar(S(P′t,Yt))<∞forP′t∈{Pt,f(Pt)}, and assume that there exists a compact setC⊆[0,1]such thatPt∈Cfor alltandS(p,f(p)),S(f(p),f(p)),andf(p)are continuous inpat anyp∈C. Then almost surely the agent has sublinear regret if and only if∑Tt=1|f(Pt)−Pt|is sublinear, i.e., iflimt→∞1T∑Tt=1|f(Pt)−Pt|=0.Proof.In Appendix C. ◻The next result shows that if the agent’s probabilities converge to some probability p, then p must be a fixed point.

Corollary 13.In addition to the assumptions from Theorem 12, assume thatPtconverges almost surely to a limitlimt→∞Pt=p∗. Then almost surelyp∗is a fixed point if and only if the agent has sublinear regret.

limt→∞1TT∑t=1|f(Pt)−Pt|=0.Proof.By Theorem 12, almost surely the agent has sublinear regret if and only ifIt remains to show that, given that the Pt converge, the latter is equivalent to convergence to a fixed point.

Since C is compact and Pt∈C for all t∈N, also p∗∈C. Hence, f is continuous at p∗, so

|f(p∗)−p∗|=∣∣∣f(limt→∞Pt)−limt→∞Pt∣∣∣=limt→∞|f(Pt)−Pt|.Since this sequence converges, it is equal to its Cesàro mean,

limt→∞|f(Pt)−Pt|=limT→∞1TT∑t=1|f(Pt)−Pt|.Hence,

|f(p∗)−p∗|=limt→∞|f(Pt)−Pt|=limT→∞1TT∑t=1|f(Pt)−Pt|.It follows that, if limt→∞Pt=p∗, then

|f(p∗)−p∗|=0⇔limT→∞1TT∑t=1|f(Pt)−Pt|=0.This shows that, almost surely, p∗ is a fixed point, if and only if ∑Tt=1|f(Pt)−Pt| is sublinear. ◻

## 7. Prediction markets

Lastly, we consider prediction markets. We assume a simplified model of a prediction market, in which traders submit a single prediction and get scored using a proper scoring rule. The prediction that is output by the market and that influences the outcome is just a weighted average of the individual traders’ predictions. In this situation, if a trader has a small weight and can thus barely influence the market prediction, the trader’s score will mostly be determined by the accuracy of the report, rather than the influence of the report on the market. Thus, if all traders are small relative to the market, the equilibrium prediction will be close to a fixed point.

A similar result was shown by Hardt et al. (2022) in the performative prediction context. They define a firm’s performative power as the degree to which the firm can influence the overall outcome with their prediction. Hardt et al. (2022) show that in an equilibrium, the distance between a player’s (performatively optimal) equilibrium strategy and their strategy when optimizing loss against the fixed equilibrium distribution (here, this means predicting the market probability) is bounded by the power of the trader. We give an analogous result for our formal setting and assumptions.

To formalize the setting, assume that there are n players. We associate with each player i a number wi∈[0,1] such that ∑jwj=1, representing, intuitively, what fraction of the overall capital in the market is provided by player i. In the game, all players simultaneously submit a probability pi. Then the event Y occurs with probability q=f(∑jwjpj). Finally, each player is scored in proportion to S(pi,Y) for some strictly proper scoring rule S. Typical market scoring rules would consider terms like S(pi,Y)−S(pj,Y), but subtracting S(pj,Y) (or multiplying by constants) does not matter for the game. We assume that players maximize their expected score, E[S(pi,Y)]=S(pi,f(∑jwjpj)).

For discussions of market scoring rules, see Hanson 2003 and Sami and Pennock 2007. Prior work has connected these market scoring rules to more realistic prediction markets that trade Arrow—Debreu securities markets such as PredictIt (Hanson 2003; Hanson 2007; Pennock and Sami 2007, Ch. 4; Chen and Pennock 2007; Agrawal et al. 2009; Chen and Vaughan 2010).

We assume that f is common knowledge. Moreover, in the following we only consider pure strategy equilibria, and we do not investigate the existence of equilibria.

|f(^p)−pi|≤∣∣∣wif′(^p)g(pi)g′(pi)∣∣∣.Theorem 14.LetSbe a proper scoring rule and letG,gas in the Gneiting and Raftery characterization. Letpbe a pure strategy Nash equilibrium of the game defined above and let^p:=∑jwjpjbe the market prediction. Assumefis differentiable at^p. For any playeri, ifG,gare differentiable atpiandg′(pi)≠0,it follows thatWheneverpi∉{0,1}, the bound is tight.In particular, this theorem shows that players i with very low wi (little capital/influence on q) will accurately predict f(^p)=f(∑jwjpj). Note, however, that ^p is not necessarily a fixed point or close to a fixed point. If there are are also players i with very high wi, then their prediction and the overall market prediction may be wrong. (So interestingly the overall market probability ∑jwjpj is worse than the prediction of individuals. One might take this to suggest that anyone interested in q should look at the latter type of predictions. Of course, if this is what everyone does, it is not so clear anymore that the model q=f(∑jwjpj) is accurate.)

Proof.The proof is analogous to the proof of Proposition 9 in our previous post.For p to be a pure strategy Nash equilibrium, each player must play a best response to the other player’s strategies. That is, pi must be a global maximum of the function pi↦S(pi,∑jwjpj). Therefore, the derivative of this function must be zero if pi∈(0,1) and positive/negative if pi=1/pi=0.

The derivative is

ddpiS(pi,f(∑jwjpj))=ddpi(g(pi)(f(∑jwjpj)−pi)+G(pi))=ddpi(g(pi)(f(∑jwjpj)−pi))+g(pi)=g′(pi)(f(^p)−pi)+g(pi)wif′(^p).Now, if p∈(0,1), we have

0=ddpS(p,f(p))=g′(pi)(f(^p)−pi)+g(pi)wif′(^p).Rearranging terms and taking the absolute value, it follows

|f(^p)−pi|=∣∣∣wif′(^p)g(pi)g′(pi)∣∣∣.Next, assume pi=0 is the optimal report for player i. Then we must have

0≤ddpS(p,f(p))=g′(pi)(f(^p)−pi)+g(pi)wif′(^p).By Theorem 1, G is convex and thus g′(p)≥0. Hence, the above is equivalent to pi−f(^p)≥wif′(^p)g(pi)/g′(pi). Since pi=0, it follows pi−f(^p)≤0. Hence, |pi−f(^p)|≤∣∣∣wif′(^p)g(pi)g′(pi)∣∣∣.

Finally, if p=1, then analogously pi−f(^p)≤wif′(^p)g(pi)/g′(pi), and since p=1, it follows again |pi−f(^p)|≤∣∣∣wif′(^p)g(pi)g′(pi)∣∣∣. This concludes the proof. ◻

Corollary 15.In addition to the assumptions from Theorem 14, assume thatfis Lipschitz-continuous and thatC:=supp∈[0,1]∣∣g(p)g′(p)∣∣<∞.Letpbe a Nash equilibrium and letϵ>0arbitrary. Then there exists aδ>0such that if for alli,wi<δ,all ofpiandf(pi), for alli, as well as∑jwjpjandf(∑jwjpj),are withinϵof each other.

|f(^p)−pi|≤wiLC.Proof.Let ϵ>0 arbitrary. Let L be the Lipschitz constant of f and note that then |f′(p)|≤L for all p∈[0,1]. By Theorem 14, it follows for ^p:=∑jwjpj and any player i thatNow let λ:=min({1,1L}) and δ:=ϵλ4CL. Then, assuming wi<δ for all i, it follows

|f(^p)−pi|≤δLC≤λ4ϵ.Moreover, since ^p is a convex combination of probabilities pi, it follows that also

|f(^p)−^p|≤maxi|f(^p)−pi|≤λ4ϵ.Thus, by the triangle equality, we have |pi−^p|≤2λ4ϵ, and since f is Lipschitz-continuous,

|f(^p)−f(pi)|≤L|^p−pi|≤L2λ4ϵ≤12ϵfor any i.

This shows that all of pi,^p,f(pi) are within 12ϵ of f(^p) and thus by the triangle inequality within ϵ of each other. This concludes the proof. ◻

It would be interesting to extend these results. For example, it’s already not so clear if the players make predictions

repeatedly. (To keep things simple, one should probably still imagine that all players know f and that the environment probability is determined by f applied to the majority forecast. If the traders have private information, prediction markets become harder to analyze. For some discussions, see Ostrovsky 2012, Chen and Waggoner 2016.)## Appendix A. Proof of Theorem 8

We use the following theorem, adapted from “Iterative Solution of Nonlinear Equations in Several Variables” (Ortega and Rheinboldt, 2000).

Theorem 16(Ostrowski).Assume thatφ:D⊆Rn→Rnhas a fixed pointx∗∈int(D)and is continuously differentiable atx∗. If|μ|<1for all eigenvaluesμofφ′(x∗), then there exists an open setU⊆Dwithx∗∈Usuch that for anyx0∈U, lettingxk=φ(xk−1)fork∈N,it isxk∈Ufor allkandxkconverges at least linearly tox∗.To begin, assume that p∗∈[0,1] is a fixed point of f, that f is continuously differentiable at p∗, and that f′(p∗)<1. Moreover, let G,g as in the Gneiting and Raftery representation of S, and assume that G is twice continuously differentiable at p∗ and thus G′′=g′ is continuous at p∗.

Define the function

φ:[0,1]→R,p↦p+α∇p(S(p,⊥f(p))).Note that the agent’s update rule is then ~φ defined via ~φ(p):=min{1,max{0,φ(p)}}.

First, by Proposition 7, we know that for any p∗∈[0,1], we have ∇pS(p∗,⊥f(p∗))=g′(p∗)(f(p∗)−p∗), and if p∗ is a fixed point of f, we must have g′(p∗)(f(p∗)−p∗)=0 and hence φ(p∗)=p∗. Hence, p∗ is a fixed point of φ and thus also of ~φ.

Now we will apply Theorem 16 to φ. To that end, let

ϵ:=∇p(∇pS(p∗,⊥f(p∗))).Then

ϵ=∇p(g′(p∗)(f(p∗)−p∗))=g′′(p∗)(f(p∗)−p∗)+g′(p∗)(f′(p∗)−1)=g′(p∗)(f′(p∗)−1)since f(p∗)=p∗. Moreover, by assumption, g′(p∗)>0 and f′(p∗)<1. Hence, it follows that ϵ=g′(p∗)(f′(p∗)−1)<0.

Now note that

φ′(p∗)=∇p(p+α∇p(S(p∗,⊥f(p∗))))=1+α∇p(∇p(S(p∗,⊥f(p∗))))=1+αϵ.Hence, for small enough α>0, it is |φ′(p∗)|<1. Moreover, by assumption, g′(p∗) and f′(p∗) are continuous, so also the derivative φ′(p∗)=1+αg′(p∗)(f′(p∗)−1) is continuous.

Now we can apply Ostrowski’s theorem, which tells us that, if p∗∈(0,1), there is an open set U⊆(0,1) with p∗∈U, such that for any p0∈U, the iterates pk=φ(pk−1) all lie in U and pk converges at least linearly to p∗.

This shows that if p∗ is an interior point, we can choose α>0 small enough to make sure that F(U)⊆U⊆(0,1) and thus ~φ=φ on U, and for p0∈U, the iteration pk=~φ(pk−1) locally converges to p∗.

Now assume that p∗=1 (the proof for p∗=0 is analogous). We can extend the function φ for values p∈(1,∞) by the Whitney extension theorem. This theorem tells us that if φ is continuously differentiable, then there also exists ¯¯¯¯φ:(0,∞)→R such that φ=¯¯¯¯φ|[0,1]. We can then apply the above argument with Ostrowski’s theorem to the function ¯¯¯¯φ. In particular, there exists an open subset U⊆(0,∞) with p∗∈U such that for any p0∈U, we have pk∈U for iterates pk=¯¯¯¯φ(pk−1) and limk→∞pk=p∗. This also applies to any p0∈U∩[0,1].

Now consider the actual update rule ~φ with iterates ~pk=~φ(~pk−1) and ~p0=p0∈(0,1]. Let k:=inf{k∣pk∈(1,∞)}; i.e., this is the smallest k such that an iterate is out of bounds and thus pk≠~pk. If k=∞, then we are done. Otherwise, we have ~pk=~φ(pk−1)=1=p∗, so the actual update rule ~φ also converges to its fixed point p∗ (potentially faster than ¯¯¯¯φ). This concludes the proof.

## Appendix B. Proof of Proposition 11

We will use a version of the strong law of large numbers for uncorrelated random variables with bounded variance, adapted from Neely (2021, Theorem 2).

Theorem 17.Let{Xt}t∈N0be a sequence of pairwise uncorrelated random variables with mean0and bounded variances. I.e., assume thatE[Xt]=0

for allt∈N0There existsc>0such thatVar(Xt)≤cfor allt∈N0Cov(Xt,Xt′)=0

for allt≠t′∈N0.

limT→∞1TT∑t=1Xt=0.Then almost surelyWe will apply this law to random variables Xt:=S(P′t,Yt)−S(P′t,f(Pt)), where P′t is either Pt or f(Pt).

First, by Proposition 10, E[Xt]=E[S(P′t,Yt)−S(P′t,f(Pt))]=0. Second, by assumption,

suptVar(S(P′t,f(Pt)))<∞.Hence, also

suptVar(S(P′t,f(Pt)))=suptVar(E[S(P′t,f(Pt))∣Pt])≤suptVar(S(P′t,f(Pt)))<∞.It follows that also supt