Buck comments on Eli’s shortform feed

Buck 15 Feb 2026 7:09 UTC
11 points
−5
Why wouldn’t an early seed AI reason about the ways that its decision theory makes it exploitable, or the ways its decision theory bars it from cooperation with distant superintelligence (just as the researchers at SI were doing), find the best solution to those problems, and then modify the decision theory?
I think that decision theory is probably more like values than empirical beliefs, in that there’s no reason to think that sufficiently intelligent beings will converge to the same decision theory. E.g. I think CDT agents self-modify into having a decision theory that is not the same as what EDT agents self-modify into.
(Of course, like with values, it might be the case that you can make AIs that are “decision-theoretically corrigible”: these AIs should try to not take actions that rely on decision theories that humans might not endorse on reflection, and they should try to help humans sort out their decision theory problems. I don’t have an opinion on whether this strategy is more or less promising for decision theories than for values.)
(Aside from decision theory and values, the main important thing that I think might be “subjective” is something like your choice over the universal prior.)
- habryka 15 Feb 2026 19:14 UTC
  2 points
  −4
  I think this is extremely unlikely and I am honestly very confused about what you could possibly mean here. Are you saying that there is no sense in which greater intelligence reliably causes you to cooperate with copies of yourself in the prisoner’s dilemma?
  (And on the meta level, people saying stuff like this makes me think that I would really still like more research into decision theory, because I think there are strong arguments in the space that could be cleaned up and formalized, and it evidently matters quite a bit because it causes people to make really weird and to-me-wrong-seeming predictions about the future.)
  - Buck 15 Feb 2026 20:01 UTC
    2 points
    0
    CDT agents will totally self-modify into agents that cooperate in the twin prisoner’s dilemma, but my understanding is that the thing a CDT agent self-modifies into (called “son of CDT”) behaves differently from e.g. the thing EDT agents self-modify into.
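
For readers who want the baseline disagreement spelled out, here is a minimal sketch of how unmodified CDT and EDT evaluate a one-shot twin prisoner’s dilemma. Everything in it is an illustrative assumption rather than something from the thread: the payoff numbers, the function names `cdt_choice` and `edt_choice`, and the `match_prob` parameter (the probability that the twin plays the same action I do). It does not model self-modification or “son of CDT”; it only shows that the two theories can pick different actions in the same game before any self-modification happens.

```python
# Payoff to "me", indexed by (my_action, twin_action).
# These numbers are assumed, standard-looking prisoner's dilemma values.
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}
ACTIONS = ("C", "D")


def cdt_choice(p_twin_cooperates):
    """CDT-style evaluation: the twin's action is treated as causally
    independent of mine, so its probability is the same for either choice."""
    def expected_utility(action):
        return (p_twin_cooperates * PAYOFF[(action, "C")]
                + (1 - p_twin_cooperates) * PAYOFF[(action, "D")])
    return max(ACTIONS, key=expected_utility)


def edt_choice(match_prob):
    """EDT-style evaluation: my action is treated as evidence about the
    twin's action, which matches mine with probability match_prob."""
    def expected_utility(action):
        other = "D" if action == "C" else "C"
        return (match_prob * PAYOFF[(action, action)]
                + (1 - match_prob) * PAYOFF[(action, other)])
    return max(ACTIONS, key=expected_utility)


if __name__ == "__main__":
    print("CDT, twin cooperates with p=0.9:", cdt_choice(0.9))
    print("EDT, exact twin (match_prob=1.0):", edt_choice(1.0))
    print("EDT, uncorrelated twin (match_prob=0.5):", edt_choice(0.5))
```

Under these assumed payoffs, defection dominates for the CDT-style evaluator whatever it believes about the twin, while the EDT-style evaluator cooperates once the correlation is high enough. Whether and how the two converge or diverge after self-modification is exactly the point under dispute above, and this sketch takes no position on it.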