TekhneMakre comments on Self-Reference Breaks the Orthogonality Thesis

TekhneMakre 17 Feb 2023 7:37 UTC
3 points
0
I’m just making a terminological point. The terminological point seems important because the Orthogonality Thesis (in Yudkowsky’s sense) is actually denied by some people, and that’s a blocker for them understanding AI risk.

On your post: I think something’s gone wrong when you’re taking the world modeling and “the values” as separate agents in conflict. It’s a sort of homunculus argument (https://en.wikipedia.org/wiki/Homunculus_argument) w.r.t. agency. I think the post raises interesting questions, though.
- lsusr 17 Feb 2023 7:51 UTC
  4 points
  0
  If, on my first Internet search, I had found Yudkowsky defining the “Orthogonality Thesis”, then I probably would have used that definition instead. But I didn’t, so here we are.
  
  Maybe a less homunculusy way to explain what I’m getting at is that an embedded world-optimizer must optimize simultaneously toward two distinct objectives: toward a correct world model and toward an optimized world. This applies a constraint to the Orthogonality Thesis, because the world model is embedded in the world itself.
  - TekhneMakre 17 Feb 2023 19:09 UTC
    5 points
    0
    But you can just have the world model as an instrumental subgoal. If you want to do some difficult thing Z, then you want to have a better model of the parts of Z, and the things that have causal input to Z, and so on. This motivates having a better world model. You don’t need a separate goal, unless you’re calling all subgoals “separate goals”.
    
    Obviously this doesn’t work as stated, because you have to have a world model to start with, one that can support the implication that “if I learn about Z and its parts, then I can do Z better”.