Many accounts of cognition are impossible (e.g., AIXI, VNM rationality, anything utilizing utility functions, and many AIT concepts), since they include the impossible step of considering all possible worlds. I think people normally consider this to be something like a “God’s eye view” of intelligence—ultimately correct, but incomputable—which can be projected down to us bounded creatures via approximation, but I think this is the wrong sort of in-principle-to-real-world bridge. Like, it seems to me that intelligence is fundamentally about ~“finding and exploiting abstractions,” which is something that having limited resources forces you to do. I.e., intelligence comes from the boundedness. Such that the emphasis should imo go the other way: figuring out the core of what this process of “finding and exploiting abstractions” is, and then generalizing outward. This feels related to behaviorism insofar as behaviorist accounts often rely on concepts like “searching over the space of all programs to find the shortest possible one.”
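To make the “searching over the space of all programs to find the shortest possible one” step concrete, here is a minimal sketch of what such a search looks like (the toy instruction set, function names, and step cap are all my own illustrative choices, not anything from AIXI or Solomonoff induction specifically). Even with a deliberately weak, non-Turing-complete program language the search space grows exponentially in program length, and with a real programming language you would also hit the halting problem, which is why the idealized version is incomputable rather than merely expensive.

```python
from itertools import product

# Toy, non-Turing-complete "program" language (illustrative only):
#   '0' -> append bit 0 to the output
#   '1' -> append bit 1 to the output
#   'D' -> append a copy of the output so far (doubling)
def run(program: str, max_steps: int = 10_000) -> str:
    out: list[str] = []
    work = 0
    for op in program:
        if op == '0':
            out.append('0')
        elif op == '1':
            out.append('1')
        elif op == 'D':
            out.extend(out)
        work += len(out)
        if work > max_steps:
            # With a Turing-complete language a cap like this is unavoidable,
            # because we cannot even decide whether a program halts.
            break
    return ''.join(out)

def shortest_program_for(data: str, max_len: int = 12) -> str | None:
    """Brute-force search for the shortest program that outputs `data`.

    There are 3**L candidate programs of length L, so the search blows up
    exponentially even in this toy setting; the idealized version (all
    programs, no length or step cap) is outright incomputable.
    """
    for length in range(1, max_len + 1):
        for prog in product('01D', repeat=length):
            if run(''.join(prog)) == data:
                return ''.join(prog)
    return None

if __name__ == "__main__":
    # '01DD' writes '01' and doubles it twice -> '01010101'
    print(shortest_program_for('01010101'))
```

The nested enumeration in `shortest_program_for` is exactly the step that boundedness rules out, which is why I think the interesting theory starts from the resource limits rather than from the ideal.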
I do think a large source of impossibility results comes from trying to consider all possible worlds, but the core feature of the impossible proposals in our reality is a combination of ignoring computational difficulty entirely and running into problems of embedded agency. The boundary between agent and environment is treated as fundamental in most descriptions of intelligence/agency, à la Cartesian boundaries, but physically universal cellular automata invalidate this abstraction: the boundary is arbitrary and has no meaning at a low level, and our universe is plausibly physically universal.
More here:
https://www.lesswrong.com/posts/dHNKtQ3vTBxTfTPxu/what-is-the-alignment-problem#3GvsEtCaoYGrPjR2M
(Caveat that the utility function framing actually can work, assuming we restrict the function classes significantly enough, and you could argue the GPT series has a utility function of prediction, but I won’t get into that).
The problem with the bridge is that even if the God’s-eye-view theories of intelligence were totally philosophically correct, there is no way to get anything like them in practice, and you cannot approximate them without giving very vacuous bounds. Thus you need a specialized theory of intelligence for specific universes, one that might be inelegant philosophically/mathematically but that can actually work to build and align AGI/ASI, especially if it comes soon.