I think this post suffers from trying to do two things at once, when it would have done better had it chosen one. A lot of it is written for an audience that is basically coming to the AI-safety stuff for the first time. This is perfectly fine! The problem is that it’s mixed in with the three theorems here, in a way that I expect won’t be followed by someone who’s new or non-mathy, and that also makes it really annoying for someone who wants to skip straight to your theorems.
A second problem is… I think the theorems are kinda meh? As an example: Theorem 2 is a pretty well-known result, and pretty easy, and yet I feel like your presentation makes it quite hard to follow. Theorem 2 is that for a transitive, complete relation, you can treat indifferent things as the same thing (that is, you can pass to equivalence classes under indifference), and this works because X ~ Y and Y ≼ Z implies X ≼ Z, which is exactly the property you need for the relation to treat indifferent things the same (though when nesting you also need nested indifference[1]). Doing this would also help make Theorem 1’s proof clearer: most of the length comes from splitting transitivity into subproperties and not using the fact that you can treat indifferent lotteries the same for both preference and nesting.
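The substitution property above can be checked on a toy finite model (the outcomes and relation here are made up for illustration; this is just a sketch, not anything from the post): a complete, transitive ≼ on three outcomes with a ~ b, where "X ~ Y and Y ≼ Z implies X ≼ Z" falls straight out of transitivity.

```python
from itertools import product

# Hypothetical tiny model: outcomes a ≼ b ≼ c with a ~ b.
# The relation is stored as a set of ordered pairs (x, y) meaning "x ≼ y".
outcomes = ["a", "b", "c"]
preceq = {("a", "a"), ("b", "b"), ("c", "c"),
          ("a", "b"), ("b", "a"),          # a ~ b: indifference both ways
          ("a", "c"), ("b", "c")}          # both strictly below c

def complete(r):
    return all((x, y) in r or (y, x) in r
               for x, y in product(outcomes, repeat=2))

def transitive(r):
    return all((x, z) in r
               for x, y in r for y2, z in r if y == y2)

def indifferent(x, y):
    return (x, y) in preceq and (y, x) in preceq

assert complete(preceq) and transitive(preceq)

# Substitution: X ~ Y and Y ≼ Z implies X ≼ Z (and symmetrically on the
# right), which is exactly what lets the relation descend to equivalence
# classes under ~.
for x, y, z in product(outcomes, repeat=3):
    if indifferent(x, y) and (y, z) in preceq:
        assert (x, z) in preceq
    if indifferent(x, y) and (z, y) in preceq:
        assert (z, x) in preceq
print("substitution holds on this model")
```

Since indifference is defined as ≼ both ways, each substitution check is just one application of transitivity, which is the point of passing to equivalence classes.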
As for content, I personally don’t feel like you’ve gained anything in Theorem 3 beyond the reasoning of “If you’re ever willing to spend anything early for gains later, then by completeness you see the possibility of either a gain or a loss from staying on/shutting off, and so will spend to prevent it”. Theorem 1 has a similar issue. This objection hits harder given how the presentation obscures the ideas.
I’ll lastly make a comment about nested indifference. If I’m understanding correctly, in the VNM framework it’s a special case of the reduction of nested gambling[2], and this axiom is equivalent to independence of irrelevant alternatives (IIA). Imo IIA has a good money-pump argument in addition to its intuitiveness: see, for example, Eliezer’s in the Allais paradox post, though I expect you won’t find that convincing if you weren’t already convinced of the property. Likewise, to me the intuition behind nesting is that you’re only really gambling over the final thing; the act of gambling shouldn’t be treated like a magical extra thing.
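The reduction-of-nested-gambling idea can be sketched concretely (a hypothetical representation, not anything from the post or the VNM literature specifically): a compound lottery flattens to a single distribution over final outcomes, and the axiom says the agent treats the two as the same object.

```python
# Sketch: a lottery is either a bare outcome (str) or a list of
# (probability, sublottery) pairs. Reduction of nested gambling says
# the agent is indifferent between a compound lottery and its
# flattened, single-stage version.
def flatten(lottery):
    if isinstance(lottery, str):            # a bare outcome = sure thing
        return {lottery: 1.0}
    dist = {}
    for p, sub in lottery:
        for outcome, q in flatten(sub).items():
            dist[outcome] = dist.get(outcome, 0.0) + p * q
    return dist

# 0.5 chance of (0.5 X, 0.5 Y), else Z  ==  0.25 X + 0.25 Y + 0.5 Z
nested = [(0.5, [(0.5, "X"), (0.5, "Y")]), (0.5, "Z")]
print(flatten(nested))   # {'X': 0.25, 'Y': 0.25, 'Z': 0.5}
```

On this picture, "you’re only really gambling over the final thing" just means preferences are a function of the flattened distribution, not of the nesting structure.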
Yeah I think the ‘doing two things at once’ is an issue, though my main intended audience for this paper is academic philosophers and decision theorists who are—as a rule—both mathy and new to AI safety stuff.
Your other points sound kinda like a ‘Theorems are slow and detailed’ complaint to which I say: yes, but the detail helps guide our search for solutions. For example, it was thinking about Theorems 2 and 3-ish stuff that first got me thinking that incomplete preferences might help with shutdownability.
I am convinced of Independence as a requirement of rationality, for paying-to-avoid-information and money-pump reasons (like Yudkowsky’s), plus I think the Allais preferences aren’t much evidence against it. I went for Indifference Between Indifference-Shifted Lotteries because it’s a little weaker (though since writing the paper I’ve been convinced that it’s not significantly weaker: basically every endorsed decision theory that violates Independence also violates IBISL).
I do not generally have a problem with slow and detailed theorems. I seem to have tastes different from yours about when theorems add understanding. But if it got you thinking about incompleteness, then it’s done its job.
[1] By this I mean your “Indifference Between Indifference-Shifted Lotteries”
[2] where p(qX + (1−q)Y) + (1−p)Z is treated just like pqX + p(1−q)Y + (1−p)Z