Alright, as I’ve mentioned before I’m terrible at abstract thinking, so I went through the post and came up with a concrete example. Does this seem about right?
We are running a quantitative trading firm, and we care about the closing prices of the S&P 500 stocks. We have a forecasting system designed in a robust, “prediction market” style. Instead of outputting a single precise forecast, each model $M_j$ (for $j = 1, \dots, k$) outputs a set of probability distributions over tomorrow’s closing prices. That is, for each action $a$ (for example, “buy” or “sell” a particular stock, or more generally “execute a trade decision”), our model returns a set
$$M_j(a) \subseteq \Delta(\text{Future Prices}),$$
where $M_j(a)$ is a nonempty, convex set of probability distributions over outcomes.
This definition allows “nature” (the market) to choose the distribution that actually occurs from within the set provided by our model. In our robust framework, we assume that the market might select the worst-case distribution in $M_j(a)$. In other words, our multivalued model is a function
$$M : A \to \Box(O),$$
with $\Box(O)$ representing the collection of sets of probability distributions over outcomes. Instead of pinning down exactly what tomorrow’s price will be if we take a particular action, each model provides us a “menu” of plausible distributions over possible closing prices.
We do this because there is some hidden-to-us world state that we cannot measure directly, and this state might even be controlled adversarially by market forces. Instead of trying to infer or estimate the exact hidden state, we posit that there are a finite number of plausible candidates for what this hidden state might be. For each candidate hidden state, we associate one probability distribution over the closing prices. Thus, when we look at the model for an action, rather than outputting a single forecast, the model gives us a “menu” of distributions, each corresponding to one possible hidden state scenario. In this way, we deliberately refrain from committing to a single prediction about the hidden state, thereby preparing for a worst-case (or adversarial) realization.
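To make the “menu” picture concrete, here is a minimal sketch under made-up assumptions: three coarse outcomes instead of real closing prices, a hypothetical payoff table for the action “buy”, and a credal set given as the convex hull of one distribution per candidate hidden state. None of the numbers or names come from the post.

```python
# Hypothetical illustration: outcomes are three coarse price moves.
OUTCOMES = ["down", "flat", "up"]

# Assumed payoff of the action "buy" under each outcome (made-up numbers).
REWARD_BUY = {"down": -1.0, "flat": 0.0, "up": 1.0}

# One distribution per candidate hidden state. The credal set M_j("buy")
# is taken to be the convex hull of these vertices (a polytope).
CREDAL_VERTICES = [
    {"down": 0.20, "flat": 0.30, "up": 0.50},
    {"down": 0.40, "flat": 0.30, "up": 0.30},
    {"down": 0.25, "flat": 0.50, "up": 0.25},
]


def expected_reward(dist, reward):
    """Expected reward of one distribution over OUTCOMES."""
    return sum(dist[o] * reward[o] for o in OUTCOMES)


def worst_case_reward(vertices, reward):
    """Minimum expected reward over the convex hull of `vertices`.

    A linear function on a convex hull attains its minimum at a vertex,
    so checking the vertices is enough in the polytope case.
    """
    return min(expected_reward(v, reward) for v in vertices)


worst = worst_case_reward(CREDAL_VERTICES, REWARD_BUY)
```

Here the second hidden-state scenario is the worst one for “buy”, so that is the distribution an adversarial market would pick from this menu.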
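Given a bettor $B$, MDP $M$ and policy $\pi$ we define $\mathrm{bet}^{B}_{M,\pi} : \left([0,1] \times (\text{states} \times \text{acts} \times [0,1])^{[H]}\right) \to \mathbb{R}$, the aggregate betting function, as
$$\mathrm{bet}^{B}_{M,\pi}(r_0, s_1, a_1, \dots, r_H) := 1 + \sum_{h=0}^{H} \left( \mathrm{lbet}^{B,h,s_h,a_h}_{M,\pi}(r_h, s_{h+1}, a_{h+1}) - 1 \right),$$
where $\mathrm{tr}_t$ and $\pi_t$ are the trajectory in episode $t$ and policy in episode $t$, respectively.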
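As a sanity check on my reading of the aggregate betting function quoted above, here is a tiny sketch. It assumes the local bets have already been evaluated along a trajectory; the local betting functions themselves are not defined in this comment, so the numbers are purely hypothetical.

```python
def aggregate_bet(local_bets):
    """Return 1 + sum_h (lbet_h - 1) for already-evaluated local bets.

    `local_bets` is a hypothetical list holding the value of each local
    betting function at its step of the trajectory.
    """
    return 1.0 + sum(b - 1.0 for b in local_bets)


# A bettor whose local bets are all exactly 1 neither gains nor loses.
neutral = aggregate_bet([1.0, 1.0, 1.0])

# Gains and losses at individual steps add up around the baseline of 1.
mixed = aggregate_bet([1.1, 0.9, 1.2])
```

So the aggregate bet is additive in the per-step deviations from 1, rather than a product of the local bets.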
Next, our trading system aggregates these imprecise forecasts via a prediction-market-like mechanism. Inside our algorithm we maintain a collection of “bettors”. Each bettor (besides the pessimistic and uniform bettors, which do the obvious thing their names imply) corresponds to one of our underlying models (or to aspects of a model). Each bettor $B$ is associated with its own preferred prediction (derived from its hypothesis set) and a current “wealth” (i.e., credibility). Instead of simply choosing one model, every bettor places a bet based on (?)how well our market prediction aligns with its own view(?).
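Robust Universal Estimator (RUE)
Parameters: hypothesis class $H$, rounds $T$, reward function $r$, prior $\zeta_1$
$\epsilon \leftarrow \min\left(\tfrac{1}{2}, \sqrt{\ln(2)/T}\right)$
Function estimate($\zeta$, $a$):
    return $\arg\min_{\mu \in \Delta O} \mathbb{E}_{B \sim \zeta}\left[\text{if } B \neq \bullet:\ 2 D_H^2(\mu \to M_B(a)),\ \text{else: } \epsilon \cdot \mathbb{E}_{o \sim \mu}[r(a, o)]\right]$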
To calculate our market prediction $\hat{M}$, we solve a convex minimization problem that balances the differing opinions of all our bettors (weighted by their current wealth $\zeta$) in such a way that it (?)minimizes their expected value on update(?).
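Here is how I picture that minimization in the simplest case I could think of. This is a rough sketch, not the algorithm from the post. It assumes a binary outcome, so a distribution is a single number $p$; it takes each model's credal set to be an interval of such probabilities; it uses the $1 - \sum\sqrt{pq}$ convention for squared Hellinger distance; it reads the special bettor $\bullet$ as the pessimistic one; and it replaces a proper convex solver with a grid search. All function names are my own.

```python
import math


def hellinger_sq(p, q):
    """Squared Hellinger distance between Bernoulli(p) and Bernoulli(q).

    Uses the convention H^2 = 1 - sum_o sqrt(p_o * q_o) (an assumption;
    the post may normalize differently).
    """
    return max(0.0, 1.0 - math.sqrt(p * q) - math.sqrt((1 - p) * (1 - q)))


def dist_to_interval(p, lo, hi):
    """Squared Hellinger distance from Bernoulli(p) to the set [lo, hi].

    For fixed p the distance is convex in q with its minimum at q = p,
    so the closest point of the interval is p clamped to [lo, hi].
    """
    q = min(max(p, lo), hi)
    return hellinger_sq(p, q)


def estimate(model_bettors, pessimist_wealth, reward, eps, grid=1000):
    """Grid-search stand-in for the argmin in `estimate`.

    model_bettors: list of (wealth, lo, hi), one per model bettor.
    reward: pair (r(a, o=0), r(a, o=1)) for the action in question.
    """
    def objective(p):
        total = sum(w * 2.0 * dist_to_interval(p, lo, hi)
                    for w, lo, hi in model_bettors)
        expected_r = (1 - p) * reward[0] + p * reward[1]
        return total + pessimist_wealth * eps * expected_r

    return min((i / grid for i in range(grid + 1)), key=objective)


# One model bettor holding 90% of the wealth, pessimist holding 10%.
p_hat = estimate([(0.9, 0.6, 0.8)], 0.1, (0.0, 1.0), eps=0.5)
```

With these numbers the estimate lands a little below 0.6: the pessimistic term pulls the prediction toward lower reward, and the wealthy model bettor stops it from drifting far outside its credal set.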
The key thing here is that we separate the predictable / non-adversarial parts of our environment from the possibly-adversarial ones, and so our market prediction $\hat{M}$ reflects our best estimate of the outcomes of our actions if the parts of the universe we don’t observe are out to get us.
Is this a reasonable interpretation? If so, I’m pretty interested to see where you go with this.
It’s roughly on the right track, but here are some inaccuracies in your description that stood out to me:
There is no requirement that the “hidden state space” is finite. It is perfectly fine to consider a credal set which is not a polytope (i.e. not a convex hull of a finite set of distributions).
The point of how market prices are computed, missing from your description, is that they prevent any bettor from making unbounded earnings (essentially, by making them bet against each other). This is the same principle as Garrabrant induction. In particular, this implies that if any of our models is true then the market predictions will converge to lying inside the corresponding credal set.
The market predictions do not somehow assume that “the parts of the universe we don’t observe are out to get us”. Thanks to the pessimistic bettor, they do satisfy the “not too optimistic” condition, but that’s “not too optimistic” relative to the true environment.
Your entire description only talks about the “estimation” part, not about the “decision” part.
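To illustrate the first point above, here is a small sketch of a credal set that is not a polytope: a Euclidean ball of distributions around a center, intersected with the simplex. The choice of a ball, the function name, and the numbers are hypothetical, picked only because the worst case has a closed form as long as the ball stays inside the simplex.

```python
import math


def worst_case_over_ball(center, radius, reward):
    """Minimizer of expected reward over {mu : sum(mu) = 1, |mu - center| <= radius}.

    Only the component of `reward` orthogonal to the all-ones vector matters,
    because every admissible perturbation of `center` sums to zero. The
    minimizer moves distance `radius` against that component.
    """
    n = len(center)
    mean_r = sum(reward) / n
    direction = [r - mean_r for r in reward]
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        return list(center)  # constant reward: every point is equally bad
    mu = [c - radius * d / norm for c, d in zip(center, direction)]
    if min(mu) < 0.0:
        raise ValueError("ball leaves the simplex; not handled in this sketch")
    return mu


center = [1 / 3, 1 / 3, 1 / 3]
reward = [-1.0, 0.0, 1.0]
mu_worst = worst_case_over_ball(center, 0.1, reward)
worst_value = sum(m * r for m, r in zip(mu_worst, reward))
```

This set has infinitely many extreme points, so no finite list of “hidden states” generates it, yet the worst case is still perfectly well defined.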