The Orthogonality Thesis is usually defined as follows: “the idea that the final goals and intelligence levels of artificial agents are independent of each other”. More careful people say “mostly independent” instead.
By whom? That’s not the definition given here: https://arbital.com/p/orthogonality/
Quoting:
The Orthogonality Thesis asserts that there can exist arbitrarily intelligent agents pursuing any kind of goal.
The strong form of the Orthogonality Thesis says that there’s no extra difficulty or complication in creating an intelligent agent to pursue a goal, above and beyond the computational tractability of that goal.
I started with this one from LW’s Orthogonality Thesis tag:
The Orthogonality Thesis states that an agent can have any combination of intelligence level and final goal, that is, its Utility Functions and General Intelligence can vary independently of each other. This is in contrast to the belief that, because of their intelligence, AIs will all converge to a common goal.
But it felt off to me, so I switched to Stuart Armstrong’s paraphrase of Nick Bostrom’s formalization in “The Superintelligent Will”.
How does the definition I use differ in substance from Arbital’s? It seems to make no difference to my argument, which is that the cyclic references implicit in embedded agency impose a constraint on the kinds of goals arbitrarily intelligent agents may pursue.
One could argue that Arbital’s definition already accounts for my exception because self-reference causes computational intractability.
What seems off to me about your definition is that it says goals and intelligence are independent, whereas the Orthogonality Thesis only says that they can in principle be independent, a much weaker claim.
What’s your source for this definition?
See for example Bostrom’s original paper (pdf):
The Orthogonality Thesis: Intelligence and final goals are orthogonal axes along which possible agents can freely vary. In other words, more or less any level of intelligence could in principle be combined with more or less any final goal.
It makes no claim about how likely intelligence and final goals are to diverge; it only claims that it’s in principle possible to combine any intelligence with any set of goals. Later in the paper he discusses ways of actually predicting the behavior of a superintelligence, but that’s beyond the scope of the Thesis.
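If it helps, here is a toy way to picture the two axes (my own construction, not anything from the paper): an agent is parameterized by a capability knob and a utility function, and nothing in the construction ties one to the other.

```python
# Toy sketch: "intelligence" (here, a crude search depth) and "final goal" (a utility
# function) are independent parameters; every point on the grid is a well-defined agent.
# This only illustrates the in-principle claim, not which pairings are likely in practice.

def make_agent(search_depth, utility):
    """Agent = pick the action whose simulated end state scores best under `utility`."""
    def policy(state, actions, simulate):
        def value(action):
            s = state
            for _ in range(search_depth):  # more depth as a stand-in for more "intelligence"
                s = simulate(s, action)
            return utility(s)
        return max(actions, key=value)
    return policy

goals = {"paperclips": lambda s: s["clips"], "smiles": lambda s: s["smiles"]}
agents = {(depth, name): make_agent(depth, fn)
          for depth in (1, 5, 25) for name, fn in goals.items()}
print(sorted(agents))  # six (intelligence, goal) combinations, none privileged
```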
I’m just making a terminological point. It seems important because the Orthogonality Thesis (in Yudkowsky’s sense) is actually denied by some people, and that’s a blocker to their understanding AI risk.
On your post: I think something’s gone wrong when you treat the world modeling and “the values” as separate agents in conflict. It’s a sort of homunculus argument (https://en.wikipedia.org/wiki/Homunculus_argument) with respect to agency. I think the post raises interesting questions, though.
If, on my first Internet search, I had found Yudkowsky defining the “Orthogonality Thesis”, then I probably would have used that definition instead. But I didn’t, so here we are.
Maybe a less homunculusy way to explain what I’m getting at is that an embedded world-optimizer must optimize simultaneously toward two distinct objectives: toward a correct world model and toward an optimized world. This places a constraint on the Orthogonality Thesis, because the world model is embedded in the world itself.
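In rough notation (my own, nothing standard), the combined objective I have in mind looks something like

\[
J(\pi) \;=\; \underbrace{\mathbb{E}\!\left[U(w_\pi)\right]}_{\text{optimized world}} \;-\; \lambda\,\underbrace{\mathbb{E}\!\left[d(M_\pi, w_\pi)\right]}_{\text{world-model error}}, \qquad M_\pi \text{ a part of } w_\pi,
\]

where \(w_\pi\) is the world that results from running the agent, \(M_\pi\) is the agent’s model of that world, \(U\) is the final goal, \(d\) is some measure of model error, and \(\lambda\) trades the two terms off. The last clause is the cyclic reference: improving \(M_\pi\) changes \(w_\pi\), which is exactly what \(M_\pi\) is supposed to be accurate about, so on my view the epistemic objective is not fully separable from the final goal.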
But you can just have the world model as an instrumental subgoal. If you want to do difficult thing Z, then you want to have a better model of the parts of Z, and the things that have causal input to Z, and so on. This motivates having a better world model. You don’t need a separate goal, unless you’re calling all subgoals “separate goals”.
Obviously this doesn’t work as stated, because you have to have some world model to start with, one rich enough to support the inference “if I learn about Z and its parts, then I can do Z better”.
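Here is a toy version of what I mean (made-up numbers, obviously): the only goal in the sketch is Z, and “improve the model” gets chosen only when it raises the expected payoff of Z; the assumed numbers play the role of the pre-existing world model that licenses that inference.

```python
# Toy sketch: a single terminal goal Z and no separate "modeling" goal. The agent picks
# "improve model first" only when doing so raises the expected payoff of Z itself.
# The assumed numbers stand in for the pre-existing world model that supports
# "if I learn about Z and its parts, then I can do Z better".

def p_success(model_accuracy: float) -> float:
    """Assumed chance of achieving Z given how accurate the model of Z is."""
    return 0.2 + 0.7 * model_accuracy

def choose(model_accuracy: float, learning_gain: float = 0.3, learning_cost: float = 0.1) -> str:
    act_now = p_success(model_accuracy)
    learn_first = p_success(min(1.0, model_accuracy + learning_gain)) - learning_cost
    return "improve model first" if learn_first > act_now else "act now"

print(choose(0.2))  # "improve model first": learning pays while the model is poor
print(choose(0.9))  # "act now": once the model is good enough, more modeling doesn't pay
```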