I think this is deeply confused. In particular, you are conflating search and intelligence. Intelligence can be made by attaching together a search component, a utility function and a world model. The world model is actually an integral, but it can be approximated by search: searching for several good hypotheses instead of integrating over all hypotheses.
In this approximation, the world model is searching for hypotheses that fit the current data.
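(A minimal sketch of what I mean by that decomposition, in Python. The function names, the keep-the-top-3 cutoff and the scoring are illustrative assumptions of mine, not part of any real design.)

```python
from typing import Callable, Iterable, List

def world_model(data, hypotheses: Iterable, p: Callable) -> List:
    """Approximate world model: search for a few hypotheses that fit the data,
    rather than integrating over all of them."""
    scored = sorted(hypotheses, key=lambda h: p(h, data), reverse=True)
    return scored[:3]  # keep a handful of high-probability hypotheses

def choose_action(actions, data, hypotheses, p: Callable, utility: Callable):
    """Search component: pick the action with the best probability-weighted
    utility over the kept hypotheses."""
    kept = world_model(data, hypotheses, p)
    return max(actions, key=lambda a: sum(p(h, data) * utility(a, h) for h in kept))
```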
To deceive the search-function part of the AI, the “world model” must contain a world-model section that actually models the world so it can make good decisions, and an action chooser that compares various nonsensical world models according to how well they make the search function and utility function break. In other words, to get this failure mode you need a fractal AI: an AI built by gluing two smaller AIs together, each of which is in turn made of two smaller AIs, and so on ad infinitum.
Some of this discussion may point to an ad hoc hack evolution used in humans, though most of it sounds so ad hoc that even evolution would balk. None of it is sane AI design. Your “search function” is there to be outwitted by the world model, with the world model inventing insane and contrived imaginary worlds in order to trick the search function into doing what the world model wants. I.e. the search function would want to turn left if it had a sane picture of the world, because it’s a paperclip maximizer and all the paperclips are to the left. The world model wants to turn right for less (or more) sensory stimulation. So the world model gaslights the search function, imagining up a horde of zombies to the left (while internally keeping track of the lack of zombies), thus scaring the search function into going right. At the very least, this design wastes compute imagining zombies.
The world model is actually an integral, but it can be approximated by search: searching for several good hypotheses instead of integrating over all hypotheses.
Can you tell me what you mean by this statement? When you say “integral” I think “mathematical integral (inverse of derivative)”, but I don’t think that’s what you intend to communicate.
Yes, an integral is exactly what I intended to communicate.
Think of hypothesis space: a vast abstract space of all possibilities. Each hypothesis x has a probability P(x) of being true, and a utility Ua(x) of action a if it is true.
To really evaluate an action, you need to calculate ∫P(x)Ua(x)dx, an integral over all hypotheses.
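(As a concrete, if toy, illustration of that integral, here is a numerical sketch. The specific P(x), the two actions and their utilities are made-up assumptions purely for illustration.)

```python
import numpy as np

# Hypothetical 1-D hypothesis space, discretised into a grid.
x = np.linspace(-5, 5, 1001)
dx = x[1] - x[0]

# P(x): posterior probability density over hypotheses (a narrow Gaussian,
# standing in for evidence concentrating the posterior near x = 1).
p = np.exp(-0.5 * ((x - 1.0) / 0.2) ** 2)
p /= p.sum() * dx  # normalise so it integrates to 1

# Ua(x): utility of each candidate action a if hypothesis x is true.
utilities = {
    "turn_left": lambda x: 10.0 - np.abs(x - 1.0),   # good if x is near 1
    "turn_right": lambda x: 10.0 - np.abs(x + 3.0),  # good if x is near -3
}

# Expected utility of action a = ∫ P(x) Ua(x) dx, here as a Riemann sum.
for a, Ua in utilities.items():
    print(a, np.sum(p * Ua(x)) * dx)
```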
If you don’t want to behave with maximum intelligence, just pretty good intelligence, then you can run gradient ascent to find a single point X that approximately maximizes P(x). Then you can calculate Ua(X) to compare actions. More sophisticated methods would sum over several such points.
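(A sketch of that shortcut, continuing the same toy setup. The starting point, step size and iteration count are arbitrary choices of mine.)

```python
# Gradient of log P(x) for the same toy posterior: a Gaussian bump at 1.0.
def grad_log_p(x):
    return -(x - 1.0) / 0.2 ** 2   # derivative of -0.5 * ((x - 1) / 0.2) ** 2

# Gradient ascent on log P(x) to find one high-probability hypothesis X.
X = -2.0                           # arbitrary starting point
for _ in range(500):
    X += 0.01 * grad_log_p(X)      # step uphill on log-probability

# Compare actions using only the single point X instead of the full integral.
def Ua(action, x):
    return 10.0 - abs(x - 1.0) if action == "turn_left" else 10.0 - abs(x + 3.0)

best = max(["turn_left", "turn_right"], key=lambda a: Ua(a, X))
print("X ≈", X, "-> chosen action:", best)
```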
This is partly using the known structure of the problem. If you have good evidence, then the function P(x) is basically 0 almost everywhere. So if Ua(x) changes fairly slowly over the region where P(x) is significantly nonzero, evaluating Ua at any point where P(x) is nonzero gives a good estimate of the integral.
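(A quick numerical check of that claim, again with made-up numbers: a sharply peaked P(x) and a slowly varying Ua(x), comparing the single-point estimate to the full integral.)

```python
import numpy as np

x = np.linspace(-5, 5, 100001)
dx = x[1] - x[0]

# Sharply peaked posterior: "good evidence", so P(x) is ~0 almost everywhere.
p = np.exp(-0.5 * ((x - 1.0) / 0.05) ** 2)
p /= p.sum() * dx

def Ua(x):
    """Utility that varies slowly over the high-probability region."""
    return 10.0 - 0.1 * np.abs(x - 1.0)

full_integral = np.sum(p * Ua(x)) * dx   # ∫ P(x) Ua(x) dx
point_estimate = Ua(1.0)                 # Ua at one high-probability point X
print(full_integral, point_estimate)     # nearly identical
```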