1. There is a large mind design space. Do we have any actual reason for thinking so? Sure, one can argue that everything has a large design space, but in practice there is often a single underlying mechanism for how things work.
I don’t see how this relates to the Orthogonality Thesis. For a given value or goal, there may be many different cognitive mechanisms for figuring out how to accomplish it, or there may be few, or there may be only one. Different cognitive mechanisms (if they exist) might lead to the same or different conclusions about how to accomplish a particular goal.
For some goals, such as rearranging all the atoms in the universe into a particular pattern, there may be only one effective strategy, so whether different cognitive mechanisms can find it is mainly a question of how effective those mechanisms are. The Orthogonality Thesis says, in part, that figuring out how to do something is independent of wanting to do it, and that the space of possible goals and values is large. If I were smarter, I could probably figure out how to tile the universe with tiny squiggles, but I don’t want to do that, so I wouldn’t.
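To make the claimed separation concrete, here is a toy sketch in which the same search machinery is pointed at arbitrary goals. Everything in it (the action names, the utility functions) is a made-up illustration, not anything from the thesis itself:

```python
from itertools import product

def plan(utility, actions, horizon=3):
    """Brute-force 'figuring out how': search all action sequences and
    return the one the given utility function scores highest. The search
    code is identical no matter which goal is plugged in."""
    return max(product(actions, repeat=horizon), key=utility)

# Two very different "wants" run through the same competence:
squiggle_utility = lambda seq: seq.count("make_squiggle")  # hypothetical goal
helpful_utility = lambda seq: seq.count("help_human")      # hypothetical goal

actions = ["make_squiggle", "help_human", "idle"]
print(plan(squiggle_utility, actions))  # ('make_squiggle', 'make_squiggle', 'make_squiggle')
print(plan(helpful_utility, actions))   # ('help_human', 'help_human', 'help_human')
```

The planner’s competence (here, exhaustive search) and its goal (the utility argument) vary independently, which is the orthogonality being asserted.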
2. Ethics are not an emergent property of intelligence. But again, that’s just an assertion; there’s no reason to believe or disbelieve it. It’s possible that self-reflection (and hence ethics, and the ability to question one’s goals and motivations) is a prerequisite for general cognition. We don’t know whether this is true, because we don’t really understand intelligence yet.
I don’t see what the ability to self-reflect has to do with ethics. It’s probably true that anything superintelligent is capable, in some sense, of self-reflection, but why would that be a problem for the Orthogonality Thesis? Do you believe that an agent which terminally values tiny molecular squiggles would “question its goals and motivations” and conclude that creating squiggles is somehow “unethical”? If so, maybe review the metaethics sequence; you may be confused about what we mean around here when we talk about ethics, morality, and human values.
3. The previous two are assertions that could be true, but reflective stability is definitely not true; it’s paradoxical.
I think reflective stability, as it is usually used on LW, means something narrower than how you’re interpreting it, and is not paradoxical. It usually describes a property of an agent following a particular decision theory. For example, a causal decision theory (CDT) agent is not reflectively stable, because on reflection it will regret not having precommitted in certain situations. Logical decision theory (LDT) agents are more reflectively stable in the sense that they do not need to precommit to anything, and will therefore not regret any missing precommitments when reflecting on their own minds and decision processes, and on how they would behave in hypothetical or future situations.
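To make the regret concrete, here is a minimal Newcomb-style sketch; the payoffs and predictor accuracy are assumptions chosen for illustration, not numbers from the discussion:

```python
# Newcomb's problem, toy expected values. A near-perfect predictor fills
# the opaque box with $1,000,000 only if it predicts one-boxing; the
# transparent box always contains $1,000.
ACCURACY = 0.99  # assumed predictor accuracy

def expected_payoff(one_box: bool) -> float:
    if one_box:
        # Predictor almost certainly foresaw one-boxing and filled the box.
        return ACCURACY * 1_000_000
    # Predictor almost certainly foresaw two-boxing and left it empty.
    return ACCURACY * 1_000 + (1 - ACCURACY) * (1_000_000 + 1_000)

print(expected_payoff(True))   # 990000.0
print(expected_payoff(False))  # 11000.0
```

A CDT agent two-boxes (the boxes are already filled, so taking both causally dominates) and ends up with the lower expectation; afterwards it wishes it had precommitted to one-boxing. An LDT agent simply one-boxes, so it has nothing to regret on reflection.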
I don’t see how this relates to the Orthogonality Thesis.
It relates to it because reflective stability is an explicit component of the Orthogonality Thesis, no? The point is that if there is only one way for general cognition to work, perhaps that way by default involves self-reflection, which brings us to the second point...
Do you believe that an agent which terminally values tiny molecular squiggles would “question its goals and motivations” and conclude that creating squiggles is somehow “unethical”?
Yes, that’s what I’m suggesting. I’m not saying it’s definitely true, but it’s not obviously wrong either. I haven’t read the sequence, but I’ll try to find the time to do so. Basically, though, I question the wording ‘terminally values’: I think that perhaps general intelligence tends to avoid valuing anything terminally (what do we humans value terminally?).
I think reflective stability, as it is usually used on LW, means something narrower than how you’re interpreting it
Possibly, but I’m responding to its definition in the OT post I linked to, in which it’s used to mean that agents will avoid making changes that may affect their dedication to their goals.
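Taken at that definition, the property can be stated operationally. A toy sketch, in which all names and numbers are assumptions for illustration: the agent adopts a self-modification only if the modified policy serves its current goal at least as well.

```python
# Reflective stability as goal preservation, in miniature.

def squiggle_goal(outcome: int) -> int:
    return outcome  # utility = number of squiggles produced

def current_policy(env: int) -> int:
    return env  # produces `env` squiggles

def lazy_modification(env: int) -> int:
    return env // 2  # a proposed change that halves squiggle output

def accept(goal, old_policy, new_policy, environments) -> bool:
    """Adopt a self-modification only if expected goal achievement does
    not drop, i.e. avoid changes that may affect dedication to the goal."""
    old = sum(goal(old_policy(e)) for e in environments)
    new = sum(goal(new_policy(e)) for e in environments)
    return new >= old

print(accept(squiggle_goal, current_policy, lazy_modification, range(10)))  # False
```

The check only compares expected goal achievement; whether a general intelligence would also turn that scrutiny on the goal being preserved is exactly the point under dispute above.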