habryka comments on Arjun Panickssery’s Shortform

habryka 2 Jun 2026 17:40 UTC
3 points
I think you are probably misinterpreting me here, though the domain is tricky, so that’s understandable.
I advocate that you only take the steps towards consistency that are endorsed. There are really quite a lot of those! This does not require giving (apparent) logical consistency some kind of supremacy. Indeed, I would strongly argue against the kind of philosophy that MacAskill tends to do, and don’t think it really has much to do with the thing that I expect to happen during CEV.
The way I usually phrase it is that you list all the interventions that you could make to your beliefs and brain, and you start doing the ones that seem the most robust under really any viewpoint (e.g. something like “make sure to get enough sleep”). Then you work your way down the list, very conservatively taking actions or propagating beliefs that seem less reversible or robust.^[1]
I think the default outcome of this maximally conservative approach is that you still end up somewhere extremely different from where you started, and it doesn’t really require giving self-consistency some kind of dominating overriding status where someone gives you a clever argument with horrifying conclusions and then you have to accept it. Indeed, not accepting those arguments seems extremely wise to me.
Yes, this does require my moral beliefs to be subject to consistency to some degree, but of course they would have no meaning at all if they were not subject to at least some minimal level of consistency.
A preference needs to be grounded in reality somehow, and the things over which you have preferences need to “be real” in some meaningful sense. And the subject of this conversation is the kind of preference that makes sense for humans to endorse and make plans around. A bundle of local-minimization urges does not write internet comments, think about what they would like a future AI system to do with them, or care about “metaethics” at all.
1. This would reasonably also include things like “make a copy of yourself that you give veto power to that you check in with after you’ve gone down a path of self-reflection and self-modification”.
- Steven Byrnes 3 Jun 2026 13:39 UTC
  2 points
  That all sounds fine, if we’re engaged in a pragmatic project for deciding what to do, and want to propose an answer that you and I can get behind, and that lots of people around the world can also get behind.
  I think Arjun is (rightly) complaining about something different, namely that Eliezer and you and others frequently slip into treating this answer as being fundamentally privileged / “Right”, as opposed to merely a pragmatic option that you and I and lots of people can get behind.
  E.g. here’s Nate referring to “the future’s potential value”, as if there’s a metric for that which is canonical and characteristic of humanity-as-a-whole. I think that’s moral-realist (or “crypto”-moral-realist) thinking, sneaking in.
  - habryka 3 Jun 2026 17:52 UTC
    2 points
    Hmm, I don’t really get this. Or like, I am about as sympathetic to this argument as someone saying “E.g. here’s Nate referring to ‘the future’ as a thing that exists, as if there is consensus on there being a single reality and arrow of time. I think that’s scientific materialist thinking sneaking in, denying the possibility of solipsism or simulationism”. To which my reaction is “yes, metaphysics is actually quite confusing, but come on man, you know what I mean, in as much as words mean anything, this is a fine use of them”.
    Similarly here, my reaction is: “Come on man, you know what Nate means. In as much as ‘preferences’ mean anything, there is an up-direction for humanity as a whole, and a down-direction for humanity as a whole, even without any kind of substantial convergence, given how far we are from the Pareto frontier on anything”.