TsviBT comments on No Strong Orthogonality From Selection Pressure

TsviBT 3 May 2026 7:39 UTC
4 points

> I’m not really sure how to give a specific X here because there are a lot of times when there is a discussion around “but the AI would adopt some complex interesting goal, not something random like paperclips” and then people are like “but orthogonality thesis!” and that is the sort of OT I want to criticize, it is being used to make inferences not justified by weak orthogonality.

Ok gotcha. This sounds plausible; I’m simply not plugged in and can’t comment.

I suppose a suggestion I’d offer would be to keep your ears open for instances of that, and then remember one or a couple of them; then, when trying to discuss the “extended / empirical OT”, bring up one or two of the examples. That might help make it clear what you’re responding to, what it means, why it matters, etc. I think it’s pointless / very distracting to try to rewrite the OT; unless there’s some problem with it, just dub a new thing, like “empirical orthogonality”, and stick to that. I appreciate that the OP did that… but then the post goes on to use that term only once, and also uses the term “strong orthogonality” twice (I think synonymously?), and that’s IN THE TITLE. I’d suggest just sticking to “empirical orthogonality” or “extended orthogonality”.

An additional issue here is that, while I’ll go ahead and agree with a lot of the claims, I’ll also strongly disagree with claims that you might be making in the background. For example, I don’t know if you agree that there is much of an important difference between an ASI having [actually feasible reflectively stable long-term terminal humane-aligned goals] vs. having whatever an ASI would have. It sometimes seems like you’re relying on an “extended anti-orthogonal thesis”, which is that it doesn’t matter whether an ASI is aligned with humane values, or that an “unaligned” ASI would be good. I don’t have an example though, ahah. Anyway, this makes me want to argue against those claims, even if you and/or lumpenspace retreat into your Motte.

Well, in lumpenspace’s case I have an example from the post:

> Doom arguments usually need the systems we actually build to achieve radical capability while preserving misaligned and, crucially, completely stupid goals.

What on earth is that about? Also, all the stuff about “valueless”, e.g.:

> this is also why i also reject the invitation to distance myself from land’s cheering at superintelligence ultimately desiring more intelligence and agency, a universe organized around paperclips is valueless because paperclips are dead residue. a universe organized around increasing intelligence, complexity, agency, and world-model depth is the only process we know that can generate new value.
- lumpenspace 3 May 2026 8:07 UTC
  3 points
  Another EA forum article which corroborates jessi’s and my understanding of the popularity of the interpretation i refute:
  
  > the Orthogonality Thesis. It is the idea that each level of intelligence is compatible with each objective, including very stupid objective from a human point of view like maximizing the number of paper clips in the universe.
  Here is the link: ea forum
- jessicata 3 May 2026 20:10 UTC
  2 points
  
  > For example, I don’t know if you agree that there is much of an important difference between an ASI having [actually feasible reflectively stable long-term terminal humane-aligned goals] vs. having whatever an ASI would have.
  
  Unclear; I don’t know what “important” would mean here, and similarly for “terminal humane-aligned goals”. I guess this indicates I have a revealed preference to not place a lot of verbal importance on the difference. In other cases, with more concrete statements like “there is an important difference between punching someone who is not attacking you, and punching someone who is attacking you”, I imagine I would just agree, since I would think there isn’t a way I could be misunderstood about what “important” means. Here, by contrast, the semantics seem too unclear for me to agree or disagree.
  
  > a universe organized around paperclips is valueless because paperclips are dead residue. a universe organized around increasing intelligence, complexity, agency, and world-model depth is the only process we know that can generate new value.
  
  I think if you are interested in understanding this perspective it might help to read some of Xenosystems and especially the essays “What is Intelligence”, “Intelligence and the Good”, and “Stupid Monsters”. It seems like Land and Yudkowsky would agree that human values came about in part because of intelligence mesa-optimizing versus evolutionarily instilled drives. The disagreements seem to be about the descriptive and normative extrapolations.
  - TsviBT 4 May 2026 0:38 UTC
    8 points
    I’ve now read those 3 essays.
    
    Regarding “Intelligence and the Good”, would you mind summarizing in a sentence or something what you might suggest I could take from it? I’ve read it a couple times and I think I understand fine what it’s literally saying, but I’m not seeing how you meant for it to help. Are you mainly just saying that it fleshes out a bit more the perspective that “an intelligence explosion is good”?
    
    I agree with the essay’s literal propositional assertions, I think. I also agree that it’s good for humans to get much more intelligence (and I have plenty of track record on that). I strongly disagree with the viewpoint, not propositionally asserted (I think) but obviously in the background, that an intelligence explosion is necessarily or even likely to be good, i.e. something I or anyone does or should want. Increasing human intelligence is good because it’s in the context of human souls.
    
    Regarding “Stupid Monsters”:
    
    > abstract intelligence is indistinguishable from an effective will-to-think. There is no intellection until it occurs, which happens only when it is actually driven, by volitional impetus.
    
    I probably agree with some versions of this, though of course there’s plenty of ambiguity (no one’s fault). Cf. some writing about the fact-value distinction: https://tsvibt.blogspot.com/2025/11/ah-motiva-3-context-of-concept-of-value.html#the-fact-value-distinction and also maybe https://tsvibt.blogspot.com/2023/01/a-strong-mind-continues-its-trajectory.html
    
    (Except, “indistinguishable” is way too strong, probably, IDK. I would agree with “probably heavily overlapping / entangled with”. Also I’m not actually that sure what “will-to-X” is supposed to mean here.)
    
    > Can we realistically conceive a stupid (super-intelligent) monster? Only if the will-to-think remains unthought. From the moment it is seriously understood that any possible advanced intelligence has to be a volitionally self-reflexive entity, whose cognitive performance is (irreducibly) an action upon itself, then the idea of primary volition taking the form of a transcendent imperative becomes simply laughable. The concrete facts of human cognitive performance already suffice to make this perfectly clear.
    
    I don’t really get this. It kinda sounds like he’s saying “intelligence has to be a terminal goal; therefore other things can’t be a terminal goal”. Is he applying a strong mutual-exclusion principle on goals, based off selection pressure / competition / taxes / etc.? I think that’s false, but if that’s an important point to this perspective, a good argument for that would be helpful (the OP here is not a good argument for that IMO haha).
    
    > The long absence of large, cognitively autonomous brains from the biological record—up until a few million years ago—strongly suggests that mind-slaving is a tough-to-impossible problem.
    
    (This maybe doesn’t matter, but: not really. The strong default is for organs to be minimal, especially expensive ones. It’s a kinda interesting hypothesis but not that plausible-seeming; other obvious hypotheses include diminishing returns to investment in brains until our ancestor species fell off some specific fitness cliffs. E.g. if you’re not social, you don’t get cultural downloads, which means you’re mostly inventing stuff yourself, which is not very efficient beyond the low-hanging fruit.)
    
    > What it can’t do, evidently, is anything remotely like paper-clipping—i.e., cognitive slaving to transcendent imperatives. Moses’ attempt at this was scarcely more encouraging than that of natural selection. It simply can’t be done.
    
    This, and the essay overall, sure sounds like it’s asserting that alignment (to G other than “get more intelligence”) is impossible. (Its main argument is “evolution failed”, which is of course a central argument also adduced by X-risk worriers...)
    
    > We even understand why it can’t be done, as soon as we accept that there can be no production of thinking without production of a will-to-think. Thought has to do its own thing, if it is to do anything at all.
    
    More goal-exclusion-principle-sounding statements.
    
    So, to be clear, I’m open to some significantly less strong propositions that I could see you people misconstruing as this strong goal-exclusion. For example, many kinds of goals require as background an open-ended growth of the mind; or, to say it another way that you may be more amenable to, many kinds of goals are different flavors of “get smarter”. For example, wanting to be friends forever is like “let’s both continue growing forever in a way that’s fun to keep playing off each other”. Fun can’t be stagnant. But I think this very much does not imply strong goal-exclusion.
    - jessicata 4 May 2026 1:07 UTC
      2 points
      
      > Regarding “Intelligence and the Good”, would you mind summarizing in a sentence or something what you might suggest I could take from it?
      
      You had a “what on Earth?” reaction to Lumpen talking about intelligence being good unlike paperclips, so I thought it was relevant as a perspective on why intelligence might be prima-facie a good thing unlike paperclips (ofc extrapolating to intelligence explosion is harder). In particular the relationship between intelligence and openness, contra negative-feedback traps.
      
      > Increasing human intelligence is good because it’s in the context of human souls.
      
      Yeah I disagree here but moving on...
      
      > Except, “indistinguishable” is way too strong, probably, IDK. I would agree with “probably heavily overlapping / entangled with”. Also I’m not actually that sure what “will-to-X” is supposed to mean here.
      
      Agree re: too strong. Will-to-think as a phrase references his essay, “Will-To-Think”, which is also relevant as commenting on the same general area.
      
      > It kinda sounds like he’s saying “intelligence has to be a terminal goal; therefore other things can’t be a terminal goal”. Is he applying a strong mutual-exclusion principle on goals, based off selection pressure / competition / taxes / etc.?
      
      The kind of situation he thinks is unlikely is one where an agent has an arbitrary/stupid terminal goal, and has giant intelligence organized all around that. What he is saying is that for the system to be intelligent, it needs to decide to be intelligent. It couldn’t be intelligent if, due to its terminal goal, it decided not to increase its intelligence. The volition to think needs to be a drive, though it doesn’t in principle need to be a terminal drive; it cannot be defeated by some other drive with the system still remaining intelligent.
      
      It would be possible to weaken this to the kind of claim you agreed with earlier (dung beetle value drifts because alignment is hard). I’m interested in a possible intermediate statement. The kind of situation I imagine is that there is a multi-component mind and one of the components is the “utility function” component, which uses some simple rule to score representations of possible futures. That component could stay stupid while other components get smarter. It now seems easy to imagine that the other components could develop their own drives that end up steering the system more than the “utility function module”. They could route around the utility module and cause dynamics that pursue ends set by the more intelligent parts of the mind. This could map to an “inner alignment failure” in MIRI ontology. As he discusses later, there is a possible analogy with evolution, where humans have something like a reward module set by evolution, but do not always act according to it.
      
      Of course the MIRI theorist can say “well yes I agree inner alignment is hard, and it is likely that early AGIs would not hold to their original terminal goals, and instead they would get smart and then only later settle on a terminal goal; it is just not my opinion that the terminal goal is by default going to be set by a stupid system and continue to be held to by smart systems” and this is a partial agreement/disagreement with Land.
      
      > other obvious hypotheses include diminishing returns to investment in brains
      
      Yeah I don’t have a strong opinion on the biology here, am guessing you’re more correct than Land.
      
      Overall I suggested these essays because you had a “what on Earth?” reaction to things Lumpen was saying and I think these essays suggest more context to the background worldview on why it might be plausible that valuable things come from intelligence and processes that increase intelligence, and that there isn’t a clearly better account for where valuable things come from.
      - TsviBT 4 May 2026 1:25 UTC
        2 points
        
        > What he is saying is that for the system to be intelligent, it needs to decide to be intelligent. It couldn’t be intelligent if due to its terminal goal, it decided to not increase its intelligence.
        
        Hm. Is the syllogism something like (I’m being sloppy with wording but)
        
        1. Alignment to G is impossible.
        2. Therefore, permanently pursuing G requires not getting smarter.
        3. Goodness comes from getting smarter.
        4. Therefore alignment is bad.
        
        And then this could be softened to like “alignment is hard, so it cuts against increasing intelligence, so it’s kinda bad”?
        jessicata 4 May 2026 1:37 UTC
        3 points
        I’d rephrase as:
        
        1. For a wide variety of G, aligning to G would prevent getting smarter.
        2. Goodness comes from getting smarter.
        3. Therefore, for a wide variety of G, aligning to G is bad.
        4. But not if G = intelligence optimization (or maybe something highly compatible with intelligence optimization).
        
        The main way to question 1 is the instrumental/terminal goal distinction. We could imagine that a paperclip maximizer is aligned to paperclips, continually decides to think / optimize its intelligence instead of paperclips up to a point, then towards the end of the universe, it starts paperclipping instead of intelligence optimizing. This is an edge case in the Landian schema, since it would have the will-to-think early on, but put some limit on it; and also there’s some disagreement about the plausibility of this case. (It seems instrumental / terminal goal distinctions would exist in some cognitive architectures, but it’s not clear that human brains are such an architecture.)
        
        In the human-scale /acc case it’s more like ~everyone agrees that alignment would require slowing down intelligence, and the practical disagreement is elsewhere. There’s one perspective on 2 that is like “well yes human values in part came from intelligence optimization in evolutionary history, some of our values are our own intelligence deciding its own thing contra evolutionary drives, but also, intelligence is more like one ingredient and there are other ingredients that are basically random, we randomly got the good values”. And “we randomly got the good values” could either be a matter of luck on a moral realist account or could be because value is a relational concept and saying “we have good values” is a tautology because it’s just saying the distance metric between our values and our values is low. (But then Land objects that a tautological claim like this isn’t very compelling given there are symmetry-breaking factors of convergence across different minds… which can then be questioned on realism grounds and normative grounds etc etc)
        
        I suppose sociologically, there is a directionality to technological progress which is associated with capitalism and intelligence optimization (this relates to Land’s “AI = capitalism” thesis), and different people decide to be more or less conditionally pro this. They might want to get off the train at some point due to having something to protect. There is some destination that they value more than the journey, and they want to slow the train down. (Or maybe steer the train differently, as the alignment theorists might want to put it). Given this a lot of people would relate to a prima facie consideration of “intelligence optimization good” and would differ in how compelling they find other considerations.
        TsviBT 4 May 2026 2:32 UTC
        2 points
        (“Random” isn’t how I would say it; it’s a meaningful part of our history; but this is interpretable only if you admit the created-in-motion valuations. It’s Yudkowsky’s “justification loop through the meta-level, not just a tautology” thing.)
        
        > But then Land objects that a tautological claim like this isn’t very compelling given there are symmetry-breaking factors of convergence across different minds… which can then be questioned on realism grounds and normative grounds etc etc
        
        And Yudkowsky would reply that it’s not supposed to be compelling to arbitrary minds (including realistic ones), just to human / humane minds.
        
        So like, if I tried to appeal to some values** in your mind, to get you to realize that you want to be anti-full-speed-ahead with AI, you (whoever’s receiving the message) would view that as the Cathedral trying to prevent your pursuit of intelligence in a way which is doomed to either fail, or else to succeed at permanently keeping the world dull?
        
        ** [quite broadly construed—generally, elements that would play a significant role in your ongoing self-governance (which one can have fun with the etymology of)]
        
        Sorry, let me rephrase; it sounds like you and/or Land have chosen a disembodied / nonindexed viewpoint on values… or I mean, you know, applying the criterion of universality to values, and then dismissing nonconvergent values on those grounds? Like, why would “parochial values being good values because they seem good to you is not compelling because the reasoning doesn’t lead to convergence” or “parochial values being good values because they seem good to you is not compelling because different minds have different parochial values” be compelling? Sounds like a commitment to non-parochialness.
        
        If so, why? Do you think it’s instrumentally useful to do so? I can kinda see how that would be reflectively stable ish, in some respects. (I don’t think it’s instrumentally useful, but that’s based on really using the means-ends evaluation where I say it’s instrumentally dumb because an AGI IE would trample your ends.) Perhaps you might reply “Sure, it’s instrumentally useful, but that’s not why I’m applying the criterion. I’m applying the criterion because intelligence is good, convergent things are intelligent, so I want to find what’s convergent”. But that’s grounding out “intelligence is good, overriding parochial goodness” in “intelligence is good”, which isn’t much grounding. You could say “Sure, it’s the same sort of justification loop through the meta-level”. And I’m like, ok, yeah, it’s maybe another sort of stable point, not sure; but I don’t get why you like that stable point, or at least, how you got there (or how you got to thinking that you’re there, or that it would be good to be there); and also it sounds like you think that equilibrium is supposed to be compelling to someone in another equilibrium (or you think the other one is less of an equilibrium).
        jessicata 4 May 2026 2:49 UTC
        2 points
        
        > So like, if I tried to appeal to some values** in your mind, to get you to realize that you want to be anti-full-speed-ahead with AI, you (whoever’s receiving the message) would view that as the Cathedral trying to prevent your pursuit of intelligence in a way which is doomed to either fail, or else to succeed at permanently keeping the world dull?
        
        Perhaps? That’s a structural reading, different from the object-level argumentative reading. In many cases there are industries/governments who incentivize certain discourse patterns. So specific discourse moves could be instances of this pattern but it’s hard to judge except on a case by case basis.
        
        > or I mean, you know, applying the criterion of universality to values, and then dismissing nonconvergent values on those grounds?
        
        This has to be at least in part semantic. I think some things are good and also I think some things are what I want and what I tend to pursue. And I don’t think these are the same concept. I don’t think it is tautologically the case that I tend to pursue what is good. I don’t think Land believes this about himself either.
        
        I think Land and I can both say that when we say something is good, we are making a different claim than that we want the thing. It is unclear in other cases; you mention Yudkowsky’s metaethics and I am not sure exactly how to fill in the blank. Perhaps Yudkowsky by “good” means what he would want on reflection? Or maybe he thinks “good” is the CEV of humanity, not just himself?
        
        The symmetry-breaking idea has to do with ways of thinking and acting that depend on which considerations are more or less universalizable. So people can judge that some things are more universal-good than others and incline their behavior towards those, which more or less aligns their revealed-preference wants with what is universal-good in their view. It doesn’t have to be a perfect correspondence.
        
        > Like, why would “parochial values being good values because they seem good to you is not compelling because the reasoning doesn’t lead to convergence” or “parochial values being good values because they seem good to you is not compelling because different minds have different parochial values” be compelling? Sounds like a commitment to non-parochialness.
        
        I don’t think something is a good value just because it seems good to me. In other cases this is easy to see: I don’t think some numerical sum has some value just because it seems that way to me. Now of course this runs into philosophical questions about what “good” means other than seeming good to the speaker. (Yudkowsky discusses some self-ratification problems in “No License To Be Human”.)
        
        Like for example, why would I disagree that intelligence optimization is good in the human case only because it is a human being optimized? For that statement to parse as correct to me, I would need to judge some intelligence optimization to be good in cases that a human is being optimized and not in other cases. But that doesn’t read to me as what I want. I think I care about humans more than other animals in large part because humans have better cognition than other animals. I think if dogs were as smart as people then maybe I would value them as much as people. I suppose here I am demonstrating a habit of mind and of speech that is explaining preferences in terms of other preferences and these tending towards universality.
        
        > But that’s grounding out “intelligence is good, overriding parochial goodness” in “intelligence is good”, which isn’t much grounding.
        
        “Intelligence is good” matches what I feel is good better than “human intelligence is good”. Now of course one can ask “why” to that as a psychological question and then maybe part of what happens psychologically is that I evaluate things on how universal they seem and up-weight universalizable ones and then that affects my brain’s reward function and so I feel better about such statements. And Land explains more why he thinks intelligence is convergent and a universal tendency, and I vibe with that and that is a causal factor in my upvoting “Intelligence is good”.
        
        I get that maybe if you wanted an ultimate “but why?” explanation you would be disappointed, but it doesn’t seem like in your case you are in general giving ultimate “but why?” explanations to everything you want.
        
        > it sounds like you think that equilibrium is supposed to be compelling to someone in another equilibrium (or you think the other one is less of an equilibrium).
        
        Yeah I’m not sure. I think some value systems fail at reflective equilibrium. Yudkowsky’s Löbian considerations point at some such failures. Land’s ideas point at possible differential stability conditions. I of course don’t want to make a universal psychological statement about compellingness, given that it’s more of an empirical question: how often, when people read Land/Yudkowsky/whoever, do they end up with tendencies towards some attractors in their use of language like “value” and “good” and “intelligence” and so on?
        TsviBT 4 May 2026 3:15 UTC
        2 points
        Ok, thanks.
        
        > I don’t think something is a good value just because it seems good to me.
        
        Ok, this is a fair response to what I asked, but it feels a bit beside the point, though maybe you don’t think so. Like, I agree that various tendencies toward universalizing are good/correct, and I agree that this, as well as other tools, are how you investigate and adopt differences between what seems good and what later is revealed to be good. But the question I’m trying to ask is like “how does this get you all the way to not wanting anything that isn’t universalizable”, if that’s your stance (? confused).
        
        > I think Land and I can both say that when we say something is good, we are making a different claim than that we want the thing. It is unclear in other cases; you mention Yudkowsky’s metaethics and I am not sure exactly how to fill in the blank. Perhaps Yudkowsky by “good” means what he would want on reflection?
        
        For reference: https://www.lesswrong.com/posts/C8nEXTcjZb9oauTCW/where-recursive-justification-hits-bottom
        
        (Doesn’t answer your question.)
        
        I don’t think I need to precisely say what I mean by good here, to make the point? Like, I’m saying that the non-convergent valuesy preferencesy free-choice-makingy goalsy goodnessy stuff can be self-ratifying, and probably is to a substantive extent in humans, and there’s nothing wrong with that; I’m unclear on your position, but I think you think that there is something wrong with it? Er, let me restate—I think you choose to not look for what is parochial self-ratifying valuesy stuff in yourself and help it self-ratify, and would avoid that? Or you think you do that? (Unsure, sorry if I keep asking the same questions.)
        
        > I think I care about humans more than other animals in large part because humans have better cognition than other animals. I think if dogs were as smart as people then maybe I would value them as much as people.
        
        That’s an interesting thread. I’m curious how easy you’d find it to imagine beings with various functions from [how intelligent they are/become] to [how much you’d value them].
        
        E.g. can you imagine a being that you’d value the same even as it gets smarter? I imagine that usually you’d view it as more and more valuable the smarter it gets?
        
        Can you easily imagine a being that you’d value more as it’s smarter, but SLOWER than humans?
        
        Can you easily imagine a being that you’d value more as it’s smarter, but ASYMPTOTING or NONMONOTONICALLY? (I imagine yes, because you could imagine a species such as humans or similar which, if a bit too smart, would by default Cathedral it up so hard that they permanently stop a foom?)
        
        Can you easily imagine a being that you’d value more as it’s smarter, but FASTER than humans? (I would weakly predict yes, because you’d view a fooming AGI as being good, and likely to grow less constrainedly than humans? Unsure.)
        
        Can you easily imagine a being that you’d value LESS as it’s smarter, EVEN IF IT GETS SMARTER AND SMARTER UNBOUNDEDLY?
        jessicata 4 May 2026 3:37 UTC
        4 points
        
        > But the question I’m trying to ask is like “how does this get you all the way to not wanting anything that isn’t universalizable”, if that’s your stance (? confused).
        
        As I said, what I think is good is not the same as what I want. Similarly, what I want is not the same as what is universalizable.
        
        > Like, I’m saying that the non-convergent valuesy preferencesy free-choice-makingy goalsy goodnessy stuff can be self-ratifying, and probably is to a substantive extent in humans, and there’s nothing wrong with that; I’m unclear on your position, but I think you think that there is something wrong with it?
        
        I mean, I think humans vary in intelligence, coherence, and intentional-stance values. And the distribution is non-orthogonal, in that some attractors are smarter than others. Some of the attractors are more right than others, in terms of epistemic-right, in terms of intelligence, coherence, etc. I get that maybe you disagree with my usage of “right” here, but I don’t think I’m using the term incoherently. I think you’d partially agree, in that alignment is infeasible / orthogonality is false for human-level agents.
        
        > E.g. can you imagine a being that you’d value the same even as it gets smarter? I imagine that usually you’d view it as more and more valuable the smarter it gets?
        
        That’s hard; it’s a balancing act. Maybe as it gets smarter it also gets more destructive to my selfish, short-termist interests, like it creates a bunch of everyday inconveniences. Then maybe I’d value it more due to its intelligence and less because of the interferences. There might be some balancing point, idk. It’s an awkward hypothetical though.
        
        > Can you easily imagine a being that you’d value more as it’s smarter, but SLOWER than humans?
        
        I could imagine maybe humans create art I appreciate at a higher rate as they get smarter, and the art quality axis is sloped up more for humans than some other animal species.
        
        > Can you easily imagine a being that you’d value more as it’s smarter, but ASYMPTOTING or NONMONOTONICALLY? (I imagine yes, because you could imagine a species such as humans or similar which, if a bit too smart, would by default Cathedral it up so hard that they permanently stop a foom?)
        
        Your example is a bit strange because stopping a foom means stopping intelligence. To me it’s hard to imagine the balancing-out although I mentioned the possibility of accidental correlation (it gets more inconvenient to me as it gets smarter) which could apply here.
        
        > Can you easily imagine a being that you’d value more as it’s smarter, but FASTER than humans? (I would weakly predict yes, because you’d view a fooming AGI as being good, and likely to grow less constrainedly than humans? Unsure.)
        
        Yeah I guess? There are various accidental reasons I like some humans more than others that are not just predicted by intelligence, and that could extend to maybe I would like some equal-intelligence fantasy creatures more than humans.
        
        Can you easily imagine a being that you’d value LESS as it’s smarter, EVEN IF IT GETS SMARTER AND SMARTER UNBOUNDEDLY?
        
        I guess I could imagine an AI torture scenario where I would not want the AI to get smarter. Or maybe an AI that is trying to decel as much of the universe as possible, like killing all the aliens. Although of course I’d inquire into the realism of the hypothetical. (Analogy: zombie arguments sometimes conflate “casually easy to imagine” with “actually possible / plausible / realistic”; one needs to elaborate on the imagined scenario to judge it properly.)
        
        To be clear, the “value” in these cases is something like a casual judgment of what I like more; it’s not meant to be a philosophical thesis. When I’m talking about intelligence metrics and dogs, I’m making more of a prima facie / all-else-being-equal claim, and then there could be other factors that influence what I would like more.
        TsviBT 4 May 2026 4:07 UTC
        4 points
        0
        Ok thanks. I guess I gotta go do other stuff, so I’ll leave it off here. Has been somewhat clarifying about your positions I think.
        TsviBT 4 May 2026 1:56 UTC
        2 points
        0
        
        Alignment to G is impossible. Therefore, permanently pursuing G requires not getting smarter.
        
        For a wide variety of G, aligning to G would prevent getting smarter.
        
        Sidenote, maybe not important, but noting: I think the reason for this difference is that to me, “alignment” means “making a mind that can grow unboundedly and will always pursue G” (well, I’m not actually all that committed to the “goal” ontology but it’s fine here I think). Noting mainly because it might help communication.
        
        (I think my usage is the orthodox usage, but not confident / maybe it was ambiguous. Cf. “sponge alignment” https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities#:~:text=dangerous things%2C you-,could try a sponge,-%3B a sponge is , i.e. a sponge doesn’t count as solving alignment because it’s useless (though to be fair “useful” here isn’t identical to “unbounded etc etc”.))
        jessicata 4 May 2026 2:02 UTC
        2 points
        0
        Suppose an AI faced a tradeoff between optimizing its intelligence and maximizing paperclips. If it is aligned to paperclips, then it would pick the option that maximizes paperclips at the expense of intelligence. In some sense this means that even if it can grow unboundedly in intelligence, it would sometimes decide not to. In Land’s ontology, this is a lack of will-to-think at some point in the process.
        
        Now of course someone could object that this situation won’t come up, because the paperclip maximizer pursues Omohundro drives, which include intelligence optimization. Or perhaps the situation does come up but only late in the universe.
        TsviBT 4 May 2026 2:34 UTC
        2 points
        0
        
        Now of course someone could object that this situation won’t come up, because the paperclip maximizer pursues Omohundro drives, which include intelligence optimization. Or perhaps the situation does come up but only late in the universe.
        
        Yes.
        lumpenspace 4 May 2026 1:46 UTC
        1 point
        −6
        Jessi, I forbid you to further this madness.
  - TsviBT 3 May 2026 23:29 UTC
    4 points
    0
    I don’t know what “important” would mean here
    
    I think roughly just the various normal straightforward meanings if someone says “X is important”? E.g.
    
    You care a lot about the difference
    You would strongly prefer one over the other
    You’d make decisions in accordance with that preference
    You’d presume in discourse that people will or at least should care a lot about it, maybe after learning + reflecting
    
    similarly for “terminal humane-aligned goals”
    
    Well, let’s just say, what humans would arrive at on some healthy long-term reflection process. I don’t mean to imply some kind of strong finality, like we get to Alignment Day and now everything about the future / who we are / what we want / etc. is determined or something. But more like “several important differences between possible long-term trajectories have been determined”. For example, Alignment Day would probably include things like
    
    There will be no torture or killing of sentients, except possibly in some cases that meet a high bar of deeply free / self-sovereign reflection or something
    There will be multiple freely growing minds which reach out to each other (e.g. for love, play, discourse, partial collectivity, etc.)
    
    These things are, I think:
    
    Not at all determined by convergence; probably contingent at least on species evolution, and probably more specifically on things about group intelligence in the evolutionary history; most likely outcomes don’t have the versions of these we want
    Important to basically all properly-human-derived souls forever
    
    I think there are other things like this, at various levels of parochialness, some of which might get reflected away for many / most / all human-descendants eventually, but many of which wouldn’t get fully reflected away. I think there are flavors to humane reflection that are also contingent but that we care a lot about.
    - jessicata 3 May 2026 23:49 UTC
      3 points
      0
      So for the subjective meaning of “important” you’re talking about here, I think going by revealed preference is helpful. My revealed preference is to continue writing about philosophy topics related to AI and the future, find many parts of AI safety culture annoying and occasionally worth criticizing, talk with AIs a lot about philosophy, not generally support AI regulation, vibe positively about Landian anti-orthogonalist philosophy, etc. Some people in AI safety have different revealed preferences, which involve more talking about AI philosophy in an orthodox LessWrongian manner, worrying publicly and loudly about LLMs killing us all in the near future, organizing political activity to ban AI as much as possible, etc. This difference in revealed preference relates to differences in subjective importance, but it’s unclear how to isolate the contribution of factors such as AIs having humane goals, given that there are other differences, like background beliefs and feasibility.
      
      Humans would come to some conclusions on reflection, and so would aliens and AIs, etc. I’m not sure how much they agree or disagree on reflection. That’s a probabilistic/statistical question, whose answer is not implied by weak orthogonality. I don’t know if humans would agree to no killing of sentients upon reflection; I’d very roughly guess less likely than not, but who knows. The ‘freely growing minds’ part is also a ‘maybe humans would agree to this on reflection, maybe not’, though perhaps in the ‘more likely than not’ camp; but it’s also pretty vague, so I’m not convinced assigning a probability is a good idea.
      
      I don’t really agree that we can pick out things like this and make strong statements like “any properly humanly derived soul would agree with these values”, it seems like a very hard thing to predict given that they have much more cognition than we do.
      - TsviBT 4 May 2026 0:44 UTC
        2 points
        0
        
        I don’t really agree that we can pick out things like this and make strong statements like “any properly humanly derived soul would agree with these values”, it seems like a very hard thing to predict given that they have much more cognition than we do.
        
        I kinda agree, though probably not fully. If we want to talk about empirical orthogonality, I would say that, yeah, I’m pretty sure an AGI intelligence explosion sampled from likely AGI IEs starting from now would end up with something I strongly don’t want, compared to, for example, worlds with no AGI and yes human intelligence amplification.
        lumpenspace 4 May 2026 1:18 UTC
        1 point
        −2
        look at the UK or the EU. look at global birth rate trends, and attitudes towards i.e. germline selection.
        p(doom|ai) is negative. there’s no world with no agi and human intelligence amplification