Arjun Panickssery comments on Arjun Panickssery’s Shortform

Arjun Panickssery 28 May 2026 19:32 UTC
17 points
−4
Many LW people believe in one of a family of meta-ethical theories that don’t make sense to me.
1. Idealizing subjectivism (IdS): This theory says that “X is intrinsically valuable, relative to an agent A, if and only if, and because, A would have some set of evaluative attitudes towards X, if A had undergone some sort of idealization procedure.” (definition from Joe Carlsmith, who says “Idealizing subjectivism has been something like my best-guess meta-ethics. And lots of people I know take it for granted”).
2. Coherent extrapolated volition (CEV): Some say that ideally an AI should predict what people would want “if we knew more, thought faster, were more the people we wished we were, had grown up farther together.” The AI should use the desires that “converge” among everyone in some sense, but I also hear people talk about such-and-such person’s CEV (cf. Habryka on “Vladimir Putin’s CEV”).
3. Ideal-observer theory (IOT): This is an academic theory that says that to say something is good is to say that an “ideal observer” would approve of it. Firth in “Ethical absolutism and the ideal observer” says this ideal observer should be omniscient with respect to natural facts, dispassionate, disinterested, consistent, “normal,” etc.
These are all framed slightly differently: IdS is an anti-realist theory of what to care about, CEV is about how to command an AI, and IOT is about what moral statements mean. But these theories don’t help with the hard problems with meta-ethics that they try to resolve or elide. In particular all of the theories based on an idealization procedure fail because either
1. The idealization procedure is taken to include moral knowledge, creating circularity, or
2. The idealization procedure only includes rationality in the making of non-moral judgments, knowledge of non-moral facts, etc, in which case this is a reductionist meta-ethics that doesn’t actually cross the is-ought gap (i.e. it remains an open question whether the idealized attitudes would be good).
I basically think these views are popular because, while moral realism is not plausible, these idealization theories allow for crypto-realism: the exact same discussion is had, but framed around the illusory target of our “idealized” selves, even though there isn’t any evidence for their relevance or for their actual convergence.
- Steven Byrnes 2 Jun 2026 16:39 UTC
  11 points
  0
  Parent
  I mostly agree with this (see here). My meta-ethical stance is kinda more nihilism-adjacent when compared to Eliezer (& Nate, Habryka, etc.) who are more moral-realism-adjacent. For example they’ll casually refer to “the future’s potential value” as if it’s a meaningful metric that is canonical and characteristic of humanity as a whole, not just value-from-a-particular-person’s-perspective, nor value-relative-to-a-certain-semi-arbitrary-operationalization-of-the-details-of-CEV, etc.
  That said, we do face an issue that I happen to expect an ASI singleton in my lifetime, and its preferences will determine the future, for better or worse. Things like CEV / Long Reflection seem to have promise as political projects—like, flags that lots of people might feel motivated to rally around, because they all feel enthusiastic about the future that this would lead to, and which I personally also feel enthusiastic about (well, at least potentially, the details matter). They certainly seem less bad and unfair than lots of other options. Are the CEV / Long Reflection results well-defined and independent of arbitrary details of the deliberation process? My guess is: Probably not! But oh well, we have to do something, and there aren’t obviously better options.
- Mitchell_Porter 29 May 2026 21:41 UTC
  7 points
  3
  Parent
  Eliezer’s moral realism is unabashedly anthropocentric in its justification. He says, humans have various decision-making dispositions (he gives the example of fair division of resources), some of which we might call moral intuitions, and that’s just what morality is; or morality is what you get when you “extrapolate” those moral intuitions, according to an idealization procedure which can be equally species-specific in its origin.
  It’s an interesting position because it escapes the usual framing of moral realism versus moral relativism, but also doesn’t say which natural decisions are moral and which are not. The second point is just that not every choice is a moral choice—some choices are aesthetically motivated, some by adherence to reality, some by fear or pain, and so on. This was on my mind when I wrote
  Could it be that a correct theory of human decision-making would say that there are multiple kinds of norms behind our decisions, and it’s a mistake to reduce it all to ethics?
  The implication for me is that CEV is not really just about creating an ideal moral agent. Its output is meant to be an idealization of the entire human decision procedure, which may have distinct rational, aesthetic, etc components (even including components that have never received a name in natural language) in addition to a strictly moral component.
  - Arjun Panickssery 29 May 2026 22:04 UTC
    7 points
    −3
    Parent
    But what is the value of these “dispositions”? I certainly have some dispositions; for example, my intuition is that Mt. Kilimanjaro is more beautiful than a random pile of garbage.
    This “extrapolation” concept assumes a bunch of stuff:
    That if everyone underwent an idealization procedure, they would find some kind of common ground
    That people should care, personally, about what their idealization procedure would produce
    My point is that no “idealization procedure” solves the hard problem, which is crossing the is-ought gap i.e. going from facts about the world or about your impressions and deriving moral principles.
    - habryka 30 May 2026 3:48 UTC
      4 points
      0
      Parent
      That if everyone underwent an idealization procedure, they would find some kind of common ground
      No, it just doesn’t assume that. It’s totally fine for different people to want different things, and for their extrapolated values to diverge, under Eliezer’s metaethics.
      That people should care, personally, about what their idealization procedure would produce
      Yes, it does assume this! But honestly, anything different from this seems kind of absurd. Clearly there are some actions you can take that make you think you will make better ethical judgements in the future. “Sleeping enough” is one such very boring action that I think practically everyone would endorse.
      It just seems like a very obvious fact that the preferences of basically all humans have idealization characteristics so that there are changes people could make to themselves that would make them want to defer to that changed version of themselves, instead of their current selves. Making all such changes is what CEV is. This doesn’t necessarily “solve” ethics, but it establishes at least one thing you clearly should do if you want to make progress on ethics.
      - Arjun Panickssery 30 May 2026 4:31 UTC
        6 points
        1
        Parent
        No, it just doesn’t assume that. It’s totally fine for different people to want different things, and for their extrapolated values to diverge, under Eliezer’s metaethics.
        Ah ok. I admit I don’t know much about CEV, compared to the other two listed items in my top-level post. This document admits (emphasis):
        Q9. How does the dynamic force individual volitions to cohere? (Frequently Asked)
        The dynamic doesn’t force anything. The engineering goal is to ask what humankind “wants,” or rather what we would decide if we knew more, thought faster, were more the people we wished we were, had grown up farther together, etc. “There is nothing which humanity can be said to ‘want’ in this sense” is a possible answer to this question. Meaning, you took your best shot at asking what humanity wanted, and humanity didn’t want anything coherent.
        It defines coherence as “Strong agreement between many extrapolated individual volitions which are unmuddled and unspread in the domain of agreement, and not countered by strong disagreement.” So while it is conceded, in passing, that there might not be a result, I assumed that Yudkowsky thinks it’s plausible, because otherwise it wouldn’t make sense to advocate for CEV as the target for AI alignment. (I guess it’s possible that he concluded that it would be better for an AI not to do anything in that case, as a safe failure mode, versus to act on a different alignment target.)
        Clearly there are some actions you can take that make you think you will make better ethical judgements in the future.
        This doesn’t address the is-ought gap. I agree that if you already accept moral realism then this kind of thing is a relevant consideration, but positing an idealization procedure doesn’t solve meta-ethics. Things like “sleeping enough” only correct non-moral defects like fatigue but don’t address the question of whether the resulting judgments are objectively good. In contrast, the “be more the people you wished you were” in Yudkowsky’s idealization procedure introduces moral knowledge and values (insofar as “wishing” is an evaluative attitude), but that creates circularity.
        In particular all of the theories based on an idealization procedure fail because either
        1. The idealization procedure is taken to include moral knowledge, creating circularity, or
        2. The idealization procedure only includes rationality in the making of non-moral judgments, knowledge of non-moral facts, etc, in which case this is a reductionist meta-ethics that doesn’t actually cross the is-ought gap (i.e. it remains an open question whether the idealized attitudes would be good).
        My broader accusation is that this kind of talk is used for crypto-realism; people want to basically talk in terms of stance-independent moral facts. But they merely frame the discussion in terms of what their idealized self would believe, when in reality the idealization procedure is either circular or can’t cross the is-ought gap and introduce moral knowledge. You yourself just talked in terms of “progress on ethics” and “better ethical judgments” but by that you could mean either
        “Progress on figuring out what my idealized self would think / what judgments he would make”—how does this illuminate any metaethics?
        “Progress on figuring out objective, or stance-independent, ethics/judgments”—how would the idealized self be authoritative about that, especially if they diverge among people?
        habryka 30 May 2026 4:38 UTC
        2 points
        0
        Parent
        But they merely frame the discussion in terms of what their idealized self would believe, when in reality the idealization procedure is either circular or can’t cross the is-ought gap and introduce moral knowledge.
        I mean, in as much as morality is a thing at all, it’s bound by logical constraints. In order for preferences to make any sense, they must adhere to at least very basic logical constraints, and that alone allows for a huge amount of stance-independent reasoning.
        I like to generally speak of “moral axioms” and “moral inference rules” and then at least one kind of valid stance-independent reasoning you can do is to map out what conclusions you can infer from a set of moral axioms and moral inference rules.
        This of course doesn’t solve everything about ethics, but I feel like you clearly can’t deny the ability to do some amount of logical inference on top of your preferences.
        (And then this lets you start saying generalized things about classes of moral axioms and classes of moral inference rules. You can talk about how likely it is for human morality to generally converge, in a similar way to how you can talk about different mathematical inference systems turning out to be equivalent, even if that doesn’t tell you which mathematical axioms are the “correct ones” to use.)
        Arjun Panickssery 31 May 2026 0:54 UTC
        3 points
        1
        Parent
        I agree with everything in this response. In particular, I don’t mean to “deny the ability to do some amount of logical inference on top of your preferences.”
        My point is that it doesn’t answer the key metaethical question of why you ought to act according to any of those ideas.
        habryka 31 May 2026 6:51 UTC
        2 points
        0
        Parent
        I mean, because you are applying logical inferences on top of your existing oughts?
        As long as you grant that you ought to care about some things, and that you ought to care about things in any kind of coherent way, then you ought to care about the different things that are implied by the things you already ought to care about.
        But I feel like I am restating things here, so I might have misunderstood you.
        Steven Byrnes 2 Jun 2026 17:09 UTC
        5 points
        3
        Parent
        If you ask lots of people whether their moral preferences ought to be self-consistent, they’ll mostly say yes. If you ask lots of people whether their moral preferences are more valid after they think about them longer, after a good night’s sleep, they’ll also mostly say yes.
        But also, if you ask lots of people whether it’s moral for their family to be tortured, they’ll mostly say no. And they probably won’t say that no-torture is less important than self-consistency.
        Here are three (IMO reasonable) people arguing that moral deliberation / self-consistency does not straightforwardly and universally trump other ways to reach normative conclusions: Scott Alexander:
        But I’m not sure I want to play the philosophy game. Maybe MacAskill can come up with some clever proof that the commitments I list above imply I have to have my eyes pecked out by angry seagulls or something. If that’s true, I will just not do that, and switch to some other set of axioms. If I can’t find any system of axioms that doesn’t do something terrible when extended to infinity, I will just refuse to extend things to infinity.
        plus Stuart Armstrong here, and Joe Carlsmith discusses this a bunch (kinda arguing both sides) here & here & here.
        Anyway, if we’re gonna treat CEV (and related things like Long Reflection) as meta-ethical ground truth (and not just as pragmatic projects to design a widely-acceptable ASI motivation system, per my other comment), then we have to grant moral deliberation and self-consistency a special status, NOT just “well yeah self-consistency is one of the things that people feel is good and right, along with all the other things that people feel are good and right”. And I think Arjun is asking: where would this special status come from?
        It’s evidently not grounded in people’s moral intuitions, because people’s moral intuitions in favor of self-consistency are not systematically stronger or different-in-kind from people’s moral intuitions in favor of justice or whatever else. Alternatively, if we want to ground it in, like, “well they’d appreciate the value of self-consistency if they thought about it more”, then that’s circular question-begging, because it’s already granting a special status to deliberation.
        habryka 2 Jun 2026 17:40 UTC
        3 points
        0
        Parent
        I think you are probably misinterpreting me here, though the domain is tricky, so that’s understandable.
        I advocate that you only take the steps towards consistency that are endorsed. There are really quite a lot of those! This does not require giving (apparent) logical consistency some kind of supremacy. Indeed, I would strongly argue against the kind of philosophy that MacAskill tends to do, and don’t think it really has much to do with the thing that I expect to happen during CEV.
        The way I usually phrase it is that you list all the interventions that you could make to your beliefs and brain, and you start doing the ones that seem the most robust under really any viewpoint (e.g. something like “make sure to get enough sleep”). Then you work your way down the list, very conservatively taking actions or propagating beliefs that seem less reversible or robust.^[1]
        I think the default outcome of this maximally conservative approach is that you still end up somewhere extremely different from where you started, and it doesn’t really require giving self-consistency some kind of dominating overriding status where someone gives you a clever argument with horrifying conclusions and then you have to accept it. Indeed, not accepting those arguments seems extremely wise to me.
        Yes, this does require some degree to which my moral beliefs are subject to consistency, but of course, they would have no meaning at all if they were not at least subject to some minimal levels of consistency.
        A preference needs to be grounded in reality somehow, and the things over which you have preferences need to “be real” in some meaningful sense. And the subject of this conversation is the kind of preference that makes sense for humans to endorse and make plans around. A bundle of local-minimization urges does not write internet comments, or think about what it would like a future AI system to do with it, or care about “metaethics” at all.
        ^[1] This would reasonably also include things like “make a copy of yourself that you give veto power to that you check in with after you’ve gone down a path of self-reflection and self-modification”.
        Steven Byrnes 3 Jun 2026 13:39 UTC
        2 points
        0
        Parent
        That all sounds fine, if we’re engaged in a pragmatic project for deciding what to do, and want to propose an answer that you and I can get behind, and that lots of people around the world can also get behind.
        I think Arjun is (rightly) complaining about something different, namely that Eliezer and you and others frequently slip into treating this answer as being fundamentally privileged / “Right”, as opposed to merely a pragmatic option that you and I and lots of people can get behind.
        E.g. here’s Nate referring to “the future’s potential value”, as if there’s a metric for that which is canonical and characteristic of humanity-as-a-whole. I think that’s moral-realist (or “crypto”-moral-realist) thinking, sneaking in.
        habryka 3 Jun 2026 17:52 UTC
        3 points
        1
        Parent
        Hmm, I don’t really get this. Or like, I am about as sympathetic to this argument as someone saying “E.g. here’s Nate referring to ‘the future’ as a thing that exists, as if there is consensus on there being a single reality and arrow of time. I think that’s scientific materialist thinking sneaking in, denying the possibility of solipsism or simulationism”. To which my reaction is “yes, metaphysics is actually quite confusing, but come on man, you know what I mean, in as much as words mean anything, this is a fine use of them”.
        Similarly here, my reaction is: “Come on man, you know what Nate means. In as much as ‘preferences’ mean anything, there is an up-direction for humanity as a whole, and a down-direction for humanity as a whole, even without any kind of substantial convergence, given how far away we are from the Pareto frontier of anything”.
- cubefox 29 May 2026 22:36 UTC
  3 points
  0
  Parent
  Yudkowsky’s Extrapolated volition (normative moral theory) is straightforwardly moral realist in the standard philosophical terminology. It is very similar to Frank Jackson’s Analytical Functionalism, a fact which he explicitly acknowledged in the above article (and more recently in passing here).
  - Arjun Panickssery 30 May 2026 4:43 UTC
    3 points
    1
    Parent
    This doesn’t really address my objection but just labels it.
    If I understand correctly, Yudkowsky merely asserts that real moral knowledge is found by
    running a certain logical function over possible states of the world, where this function is analytically identical to the result of extrapolating our current decision-making process in directions such as “What if I knew more?”, “What if I had time to consider more arguments (so long as the arguments weren’t hacking my brain)?”, or “What if I understood myself better and had more self-control?”
    But this is an idealization procedure, and so it falls into my dichotomy:
    In particular all of the theories based on an idealization procedure fail because either
    1. The idealization procedure is taken to include moral knowledge, creating circularity, or
    2. The idealization procedure only includes rationality in the making of non-moral judgments, knowledge of non-moral facts, etc, in which case this is a reductionist meta-ethics that doesn’t actually cross the is-ought gap (i.e. it remains an open question whether the idealized attitudes would be good).
    I don’t see a clear moral/evaluative claim baked into the listed examples there, so therefore it maintains the problem of explaining why the outcome of the idealization procedure is actually good and why you ought to care about it, i.e. crossing the is-ought gap.
    (My objection is similar to, or maybe the same as, the open-question objection to analytic naturalism, of which analytic functionalism is one type.)
    - cubefox 30 May 2026 12:43 UTC
      4 points
      2
      Parent
      Yudkowsky replies to the open question argument here.
      
      I will add that the open question argument with respect to analytic naturalism, including Jackson’s and Yudkowsky’s theories, is just an instance of the paradox of analysis, which states that any proposed conceptual analysis is either true but trivial, or non-trivial but false. I’d reply that the solution to this paradox is that knowing a concept (understanding the meaning of a word) does, as a psychological matter of fact, not imply that we know how to define it. We only intuitively know how to use a word, but that doesn’t include the ability to easily state exactly how it relates to other concepts. Which is why the process of conceptual analysis (analytic philosophy) is not a trivial task. So “action x is right” can mean (be analytically equivalent to) something like “x is conducive to our coherent extrapolated volition” without this being a trivial semantic fact.
      
      Regarding the “is-ought gap”: an “ought sentence” can be straightforwardly transformed into an “is sentence”: “I ought to do x” ≈ “Doing x is right”.
      
      The non-triviality of analysis should be very familiar to anyone who has done a bit of philosophy. For example, what does it mean to say that “belief x is rational”? A conceptual analysis of epistemic rationality is highly non-obvious. Yet few people assume that there are no objective facts about what makes some beliefs rational or irrational, or that these objective facts would have to be ontologically suspect entities, or that any analysis would have to be circular or fail to bridge “the descriptive/normative gap”.
      - Arjun Panickssery 31 May 2026 1:11 UTC
        3 points
        1
        Parent
        This is similar to @habryka’s reply here where I agree with the statements in the reply but I don’t think they respond to my objection.
        If I understand your two points correctly they are that
        1. An open-question critique of the idealization-procedure definition can be applied to any conceptual analysis. Yes, sure. (Irrelevant but I also don’t think the analysis of concepts is very useful.)
        2. There is no is-ought gap because an “ought sentence” can be rephrased as an “is sentence.”
        But these only address a weak “semantic” interpretation of my objection to the analysis when what I am questioning is why the proposed analysis produces normative authority. My complaint isn’t the general complaint that to define the good as the product of an idealization procedure is either trivial or false, but that there’s this actual thing (normative authority) that isn’t addressed. Likewise with (2), you can certainly rephrase an “ought sentence” into an “is sentence” but that doesn’t change it from a normative to a descriptive claim.
        My question is about how an idealization procedure (like extrapolated volition or whatever else) can actually have moral authority if the whole procedure is specified in non-normative terms.
        cubefox 31 May 2026 5:13 UTC
        2 points
        0
        Parent
        I would dispute the existence of an actual is/ought or descriptive/normative gap. If “I ought to do x” (a normative sentence) is semantically equivalent to “doing x is right”, and “doing x is right” is semantically equivalent to “x is conducive to our coherent extrapolated volition”, and the latter has a straightforward “descriptive” truth value, then “I ought to do x” has the same truth value. In which case there is no fundamental difference between descriptive and normative sentences; the supposed gap was just an illusion stemming from the superficially different sentence structure of “ought” and “is” sentences and from the apparent difficulty of defining terms like “right”.
        
        (For clarity, I should also point out that believing “I ought to do x” (or “x is right”) does not imply “I’m motivated to do x”. See here. In particular, a psychopath can believe that various things are morally wrong while not being motivated at all to avoid doing the things he believes to be wrong. Most normal people have some degree of altruistic desires, but the correlation between moral beliefs and altruistic motivation is far from perfect. Various people believe eating meat is wrong without having significant motivation to stop eating meat.)
- hypnosifl 30 May 2026 17:59 UTC
  2 points
  1
  Parent
  For believers in scientific reductionism, moral realism based on a priori knowledge or fixed “human nature” or mental access to a realm of platonic moral truths is not plausible. But I would argue that people inclined to think in terms of very long (possibly infinite) transhuman futures should be more open to a form of moral realism based on the pragmatist philosopher C.S. Peirce’s limit concept of truth, where objective truth is understood as that which a very long-lived “community of inquiry” would tend to converge on with probability 1 in the limit of infinite time to discuss and experiment (this can be elaborated in terms of the idea of societal belief systems having long-term dynamical attractors, see this paper which interprets Peirce’s concept in this way).
  In the moral realm, it may be that a combination of memetic and biological evolution would tend to cause strong convergence on certain norms in the long term, perhaps because individuals can see the consequences of different norms in different subcultures and some may be more universally appealing, and/or because certain norms are more conducive to the continual growth of knowledge (David Deutsch suggested something like the latter in chapter 14 of his book The Fabric of Reality, and Peirce apparently had limited discussion of ethics but this section of the Internet Encyclopedia of Philosophy entry on Peirce’s ‘Architectonic’ says that ‘This makes ethics, for Peirce, a question of what kind of conduct is likely to see the growth of reason or rationality’). This could be compatible with both ideal observer theory (understood in terms of general limit observers rather than just an idealized version of our own idiosyncratic perspective) and the “convergent” version of coherent extrapolated volition.
  - Arjun Panickssery 30 May 2026 19:48 UTC
    3 points
    1
    Parent
    For believers in scientific reductionism, moral realism based on a priori knowledge or fixed “human nature” or mental access to a realm of platonic moral truths is not plausible
    Sure, but this is a case for nihilism or similar views.
    In the moral realm, it may be that a combination of memetic and biological evolution would tend to cause strong convergence on certain norms in the long term
    Sure, but
    this doesn’t explain where the moral authority comes from, i.e. why you ought to follow the principles that could result from this process
    in particular, the specific “evolutionary” formula invites an evolutionary debunking, because the theory of natural selection suggests that we converge on moral principles that tend to produce persistent societies or genes or similar, rather than ones which are morally good
    a few of your points reference the “growth of knowledge” or “growth of reason or rationality” but I don’t see (1) why the described idealization procedure points toward those things or (2) why those things are good