sl6928@princeton.edu
Axiomatic Decision Theory
Shuo Li
I assign P(doom) ∈ [1%, 5%], well below the median estimates. I believe novel governance mechanisms will drive human-AI coexistence, which I am open to discussing later. Yet even on this very optimistic view, I think current capital allocation for ASI risk fails standard expected-utility reasoning.
Taking annual global GDP (≈$111T) as the baseline valuation V for macro-scale resource allocation, the expected loss follows directly:
E[L] = P(doom) × V
E[L] ∈ [$1.11T, $5.55T]
A rational, risk-neutral expected-utility maximizer would allocate a budget of $1.11T or more. However, aggregate global spending on x-risk is under $1B. This is a structural mispricing of more than three orders of magnitude.
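The arithmetic can be sanity-checked in a few lines. This is just a back-of-the-envelope sketch; the GDP and spending figures are the rough estimates from above, not authoritative data.

```python
import math

# Back-of-the-envelope version of the calculation above. The figures
# (global GDP, aggregate x-risk spending) are rough estimates from the
# comment, not authoritative data.

GDP = 111e12           # annual global GDP in USD, the baseline valuation V
p_doom = (0.01, 0.05)  # assigned P(doom) range

expected_loss = tuple(p * GDP for p in p_doom)  # E[L] = P(doom) * V
spending = 1e9         # rough aggregate global x-risk spending, USD

# Gap between the low-end rational budget and actual spending,
# in orders of magnitude.
mispricing = math.log10(expected_loss[0] / spending)

print(f"E[L] in [${expected_loss[0]/1e12:.2f}T, ${expected_loss[1]/1e12:.2f}T]")
print(f"mispricing: ~{mispricing:.1f} orders of magnitude")
```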
Your proposed budget expansion is sound. I want to stress that even as the field expands, it remains heavily undercapitalized.
Savage’s Axioms Make Dominated Acts Expected Utility Maxima
Savage’s Theorem says nothing about dominated strategies or vulnerability to exploitation.
This is conceptually spot on! I want to add that the formal situation for Savage's Theorem is actually worse than merely failing to guarantee dominance. A colleague and I just published a decision-theoretic proof that Savage's axioms force the agent into strict dominance failures.
Specifically, for every agent satisfying Savage's axioms, under the axiom of constructibility one can construct an act that is strictly dominated by another act across an entire positive-probability event, yet Savage's axioms mathematically mandate that the dominated act be an expected-utility maximum. So not only does the desire to avoid dominated strategies fail to entail EU (as you argued), but successfully achieving EU optimization via Savage's axioms implies that agents are blind to strict dominance.
Moreover, we show that a simple Dutch Book argument is available against Savage's EU.
A detailed discussion is pending LW approval: https://www.lesswrong.com/posts/8ppB4ixfoKdGDqeHf/savage-s-axioms-make-dominated-acts-eu-maxima-9
The full proof can be found here: https://drive.google.com/file/d/15BHxSUR93bN5DQ5spU971gEpGGq6PHRc/view
Nice paper! I'm curious how the safety case handles monitor staleness. The paper seems to rely on the monitor's ability being close to the generator's. But in real deployment we might have UM at checkpoint i monitoring UG at a later checkpoint / fine-tune / post-training step. I guess this is less about honeypot slack and more about whether the elicited ROC curves are stable under a version gap. If those curves degrade quickly, the advantage of untrusted monitoring might only hold in a near-stationary setting.
Has anyone tried the obvious eval here: freeze UM at checkpoint i, continue improving UG to checkpoint j > i, and then measure how the collusion and suspiciousness ROC curves shift with j - i? If they degrade fast, it seems like monitor freshness needs to be added as a deployment parameter in the safety case.
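A minimal sketch of that eval, assuming a purely synthetic drift model. The `monitor_scores` toy (benign vs. attack scores whose separation shrinks with the version gap j - i) is entirely my assumption, not anything from the paper; only the UM/UG checkpoint notation follows my comment above.

```python
import random

def auc(neg, pos):
    """Probability an attack score exceeds a benign score (equivalent to
    ROC AUC), computed by exhaustive pairwise comparison."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def monitor_scores(version_gap, n=500, drift=0.3, seed=0):
    """Toy drift model (my assumption): the frozen monitor UM_i separates
    benign from attack outputs of UG_j less well as the gap j - i grows."""
    rng = random.Random(seed + version_gap)
    sep = max(0.0, 2.0 - drift * version_gap)  # separation shrinks with gap
    benign = [rng.gauss(0.0, 1.0) for _ in range(n)]
    attack = [rng.gauss(sep, 1.0) for _ in range(n)]
    return benign, attack

# Sweep the version gap j - i and watch the elicited suspiciousness AUC
# degrade toward chance (0.5).
for gap in range(7):
    benign, attack = monitor_scores(gap)
    print(f"gap j-i = {gap}: suspiciousness AUC = {auc(benign, attack):.3f}")
```

Swapping the synthetic scores for real honeypot suspiciousness scores at each (i, j) pair would give the actual staleness curve.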
Hi! Thank you so much for this thoughtful comment. It’s great to connect with someone who has spent time deep in the weeds of Savage. It took me a long time to understand St. Petersburg, Banach-Tarski, and Villegas-Debreu: definitely not easy!
Loosely speaking, the tension here comes down to two ingredients implied by Savage’s axioms:
Universal domain (all possible mappings from states to outcomes are valid acts).
Expected Utility (EU)
I agree Countable Additivity (CA) is the workhorse of most fields, including empirical ML. The exceptions are decision theorists and philosophers, who remain enthusiastic about the assumption of the Universal Domain. I personally think preserving the Universal Domain is specifically important for the philosophy of AGI decision-making: an AGI, by definition, should be general and capable of evaluating all acts.
Mere Finite Additivity (FA) is a consequence of the Universal Domain alongside the other assumptions. Savage himself noted that his probability measures have to be FA rather than CA. In our work, we further prove that Savage’s probability measures must also be locally strongly finitely additive under the axiom of constructibility.
As you pointed out, Villegas (and Debreu) provide an axiomatic foundation for Subjective Expected Utility with Countable Additivity by abandoning the universal domain, so our impossibility result does not apply there.
To summarize the theoretical landscape of what is and isn’t possible:
Non-universal domain + EU + CA + Strong Dominance: Totally possible (and widely used).
Universal Domain + EU + CA: Not mathematically possible.
Universal Domain + EU + FA + Strong Dominance: Not possible (our result).
Universal Domain + non-EU + FA + Strong Dominance: Explored by Machina and Schmeidler (1992). To my knowledge, it is unknown whether Strong Dominance is compatible here.
Universal Domain + non-EU + CA + Strong Dominance: I believe this is also an open problem. For related thought, see Gul and Pesendorfer (2014) Expected Uncertain Utility.
Thanks again for engaging so deeply!