jessicata comments on No Strong Orthogonality From Selection Pressure

jessicata 4 May 2026 2:49 UTC
2 points
0

So like, if I tried to appeal to some values in your mind, to get you to realize that you want to be anti-full-speed-ahead with AI, you (whoever’s receiving the message) would view that as the Cathedral trying to prevent your pursuit of intelligence in a way which is doomed to either fail, or else to succeed at permanently keeping the world dull?

Perhaps? That’s a structural reading, different from the object-level argumentative reading. In many cases there are industries/governments who incentivize certain discourse patterns. So specific discourse moves could be instances of this pattern but it’s hard to judge except on a case by case basis.

or I mean, you know, applying the criterion of universality to values, and then dismissing nonconvergent values on those grounds?

This has to be at least in part semantic. I think some things are good and also I think some things are what I want and what I tend to pursue. And I don’t think these are the same concept. I don’t think it is tautologically the case that I tend to pursue what is good. I don’t think Land believes this about himself either.

I think Land and I can both say that when we say something is good, we are making a different claim than that we want the thing. It is unclear in other cases; you mention Yudkowsky’s metaethics and I am not sure exactly how to fill in the blank. Perhaps Yudkowsky by “good” means what he would want on reflection? Or maybe he thinks “good” is the CEV of humanity, not just of himself?

The symmetry-breaking idea has to do with ways of thinking and acting that depend on which considerations are more or less universalizable. So people can judge that some things are more universal-good than others and incline their behavior towards those, which more or less aligns their revealed-preference wants with what is universal-good in their view. It doesn’t have to be a perfect correspondence.

Like, why would “parochial values being good values because they seem good to you is not compelling because the reasoning doesn’t lead to convergence” or “parochial values being good values because they seem good to you is not compelling because different minds have different parochial values” be compelling? Sounds like a commitment to non-parochialness.

I don’t think something is a good value just because it seems good to me. In other cases this is easy to see: I don’t think some numerical sum has some value just because it seems that way to me. Now of course this runs into philosophical questions about what “good” means other than seeming good to the speaker. (Yudkowsky discusses some self-ratification problems in “No License To Be Human”.)

Like for example, why would I disagree that intelligence optimization is good in the human case only because it is a human being optimized? For that statement to parse as correct to me, I would need to judge some intelligence optimization to be good in cases that a human is being optimized and not in other cases. But that doesn’t read to me as what I want. I think I care about humans more than other animals in large part because humans have better cognition than other animals. I think if dogs were as smart as people then maybe I would value them as much as people. I suppose here I am demonstrating a habit of mind and of speech that is explaining preferences in terms of other preferences and these tending towards universality.

But that’s grounding out “intelligence is good, overriding parochial goodness” in “intelligence is good”, which isn’t much grounding.

“Intelligence is good” matches what I feel is good better than “human intelligence is good”. Now of course one can ask “why” to that as a psychological question and then maybe part of what happens psychologically is that I evaluate things on how universal they seem and up-weight universalizable ones and then that affects my brain’s reward function and so I feel better about such statements. And Land explains more why he thinks intelligence is convergent and a universal tendency, and I vibe with that and that is a causal factor in my upvoting “Intelligence is good”.

I get that maybe if you wanted an ultimate “but why?” explanation you will be disappointed but it doesn’t seem like in your case you are in general giving ultimate “but why?” explanations to everything you want.

it sounds like you think that equilibrium is supposed to be compelling to someone in another equilibrium (or you think the other one is less of an equilibrium).

Yeah I’m not sure. I think some value systems fail at reflective equilibrium. Yudkowsky’s Löbian considerations point at some such failures. Land’s ideas point at possible differential stability conditions. I of course don’t want to make a universal psychological statement of compellingness, given that it’s more of an empirical question: how often, when people read Land/Yudkowsky/whoever, do they end up with tendencies towards some attractors of use of language like “value” and “good” and “intelligence” and so on?
- TsviBT 4 May 2026 3:15 UTC
  2 points
  0
  Parent
  Ok, thanks.
  
  I don’t think something is a good value just because it seems good to me.
  
  Ok this is a fair response to what I asked, but it feels a bit beside the point, though maybe you don’t think so. Like, I agree that various tendencies toward universalizing are good/correct, and I agree that these, as well as other tools, are how you investigate and adopt differences between what seems good and what later is revealed to be good. But the question I’m trying to ask is like “how does this get you all the way to not wanting anything that isn’t universalizable”, if that’s your stance (? confused).
  
  I think Land and I can both say that when we say something is good, we are making a different claim than that we want the thing. It is unclear in other cases; you mention Yudkowsky’s metaethics and I am not sure exactly how to fill in the blank. Perhaps Yudkowsky by “good” means what he would want on reflection?
  
  For reference: https://www.lesswrong.com/posts/C8nEXTcjZb9oauTCW/where-recursive-justification-hits-bottom
  
  (Doesn’t answer your question.)
  
  I don’t think I need to precisely say what I mean by good here, to make the point? Like, I’m saying that the non-convergent valuesy preferencesy free-choice-makingy goalsy goodnessy stuff can be self-ratifying, and probably is to a substantive extent in humans, and there’s nothing wrong with that; I’m unclear on your position, but I think you think that there is something wrong with it? Er, let me restate—I think you choose to not look for what is parochial self-ratifying valuesy stuff in yourself and help it self-ratify, and would avoid that? Or you think you do that? (Unsure, sorry if I keep asking the same questions.)
  
  I think I care about humans more than other animals in large part because humans have better cognition than other animals. I think if dogs were as smart as people then maybe I would value them as much as people.
  
  That’s an interesting thread. I’m curious how easy you’d find it to imagine beings with various functions from [how intelligent they are/become] to [how much you’d value them].
  
  E.g. can you imagine a being that you’d value the same even as it gets smarter? I imagine that usually you’d view it as more and more valuable the smarter it gets?
  
  Can you easily imagine a being that you’d value more as it’s smarter, but SLOWER than humans?
  
  Can you easily imagine a being that you’d value more as it’s smarter, but ASYMPTOTING or NONMONOTONICALLY? (I imagine yes, because you could imagine a species such as humans or similar which, if a bit too smart, would by default Cathedral it up so hard that they permanently stop a foom?)
  
  Can you easily imagine a being that you’d value more as it’s smarter, but FASTER than humans? (I would weakly predict yes, because you’d view a fooming AGI as being good, and likely to grow less constrainedly than humans? Unsure.)
  
  Can you easily imagine a being that you’d value LESS as it’s smarter, EVEN IF IT GETS SMARTER AND SMARTER UNBOUNDEDLY?
  - jessicata 4 May 2026 3:37 UTC
    4 points
    0
    Parent
    
    But the question I’m trying to ask is like “how does this get you all the way to not wanting anything that isn’t universalizable”, if that’s your stance (? confused).
    
    As I said, what I think is good is not the same as what I want. Similarly, what I want is not the same as what is universalizable.
    
    Like, I’m saying that the non-convergent valuesy preferencesy free-choice-makingy goalsy goodnessy stuff can be self-ratifying, and probably is to a substantive extent in humans, and there’s nothing wrong with that; I’m unclear on your position, but I think you think that there is something wrong with it?
    
    I mean, I think humans vary in intelligence, coherence, and intentional-stance values. And the distribution is non-orthogonal, in that some attractors are smarter than others. Some of the attractors are more right than others, in terms of epistemic-right, in terms of intelligence, coherence, etc. I get that maybe you disagree with my usage of “right” here but I don’t think I’m using the term incoherently. I think you’d partially agree in that alignment is infeasible / orthogonality is false for human-level agents.
    
    E.g. can you imagine a being that you’d value the same even as it gets smarter? I imagine that usually you’d view it as more and more valuable the smarter it gets?
    
    That’s hard, it’s a balancing act. Maybe as it gets smarter it also gets more destructive to my selfish, short-termist interests, like it creates a bunch of everyday inconveniences. Then maybe I’d value it more due to its intelligence and less because of the interferences. There might be some balancing point, idk. It’s an awkward hypothetical though.
    
    Can you easily imagine a being that you’d value more as it’s smarter, but SLOWER than humans?
    
    I could imagine maybe humans create art I appreciate at a higher rate as they get smarter, and the art quality axis is sloped up more for humans than some other animal species.
    
    Can you easily imagine a being that you’d value more as it’s smarter, but ASYMPTOTING or NONMONOTONICALLY? (I imagine yes, because you could imagine a species such as humans or similar which, if a bit too smart, would by default Cathedral it up so hard that they permanently stop a foom?)
    
    Your example is a bit strange because stopping a foom means stopping intelligence. To me it’s hard to imagine the balancing-out although I mentioned the possibility of accidental correlation (it gets more inconvenient to me as it gets smarter) which could apply here.
    
    Can you easily imagine a being that you’d value more as it’s smarter, but FASTER than humans? (I would weakly predict yes, because you’d view a fooming AGI as being good, and likely to grow less constrainedly than humans? Unsure.)
    
    Yeah I guess? There are various accidental reasons I like some humans more than others that are not just predicted by intelligence, and that could extend to maybe I would like some equal-intelligence fantasy creatures more than humans.
    
    Can you easily imagine a being that you’d value LESS as it’s smarter, EVEN IF IT GETS SMARTER AND SMARTER UNBOUNDEDLY?
    
    I guess I could imagine an AI torture scenario where I would not want the AI to get smarter. Or maybe an AI that is trying to decel as much of the universe as possible, like killing all the aliens. Although of course I’d inquire into the realism of the hypothetical. (Analogy: zombie arguments sometimes conflate “casually easy to imagine” with “actually possible / plausible / realistic”; one needs to elaborate on the imagined scenario to judge it properly.)
    
    To be clear, the “value” in these cases is something like a casual judgment of what I like more; it’s not meant to be a philosophical thesis. When I’m talking about intelligence metrics and dogs I’m making more of a prima facie / all-else-being-equal claim and then there could be other factors that influence what I would like more.
    - TsviBT 4 May 2026 4:07 UTC
      4 points
      0
      Parent
      Ok thanks. I guess I gotta go do other stuff, so I’ll leave it off here. Has been somewhat clarifying about your positions I think.