But though mathematical reasoning can sometimes go astray, when it works at all, it works because even bounded creatures can sometimes manage to obey local relations that add up to a global coherence, with all the pieces of reasoning pointing in the same direction like photons in a laser lasing, even though no internal mechanism enforces that global coherence at every point.
To the extent that the outer optimizer trains you out of paying five apples on Monday for something, trading that something for two oranges on Tuesday, and then trading those two oranges for four apples (a round trip that leaves you one apple poorer), it is training all the little pieces of yourself to be locally coherent in a way that can be seen as an imperfect bounded shadow of a higher unbounded structure. The system is then powerful though imperfect because the power lives in the coherence and the overlap of the pieces, in how the higher perfect structure is being imperfectly shadowed. In this case the higher structure I’m talking about is Utility, and doing your homework with the coherence theorems leads you to appreciate that we know of only one higher structure for this class of problems that has a dozen mathematical spotlights pointing at it saying “look here”, even though people have occasionally looked for alternatives.
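To make the Monday/Tuesday loop concrete, here is a minimal sketch in Python; everything in it (the `widget` standing in for the unnamed good, the `find_money_pumps` helper, the exact numbers) is my own illustration, not anything from the original exchange. It treats an agent as a table of accepted exchange rates and searches for trade cycles whose rate product exceeds 1; any such cycle can be walked over and over to drain the agent, whereas rates that all derive from a single utility assignment make every cycle multiply to exactly 1, which is one concrete sense in which the locally coherent agent is shadowing Utility.

```python
import itertools
import math

def find_money_pumps(rates, tol=1e-9):
    """Find trade cycles that drain the agent described by `rates`.

    rates[(give, get)] is how many units of `get` the agent hands back
    per unit of `give` you hand it.  If the rate product around some
    cycle exceeds 1, walking that cycle repeatedly extracts goods
    without limit; if every rate comes from one utility assignment u,
    via rates[(a, b)] = u[a] / u[b], every cycle multiplies to exactly
    1 and no pump exists.
    """
    goods = sorted({g for pair in rates for g in pair})
    pumps = []
    for n in range(2, len(goods) + 1):
        for cycle in itertools.permutations(goods, n):
            if cycle[0] != min(cycle):
                continue  # each cycle once per orientation, not per rotation
            edges = list(zip(cycle, cycle[1:] + cycle[:1]))
            if all(e in rates for e in edges):
                gain = math.prod(rates[e] for e in edges)
                if gain > 1 + tol:
                    pumps.append((cycle, gain))
    return pumps

# The Monday/Tuesday trader from the text; "widget" is a stand-in for
# the unnamed something it buys.  Per-unit rates it accepts:
agent = {
    ("widget", "apple"): 5.0,   # hand it a widget, receive 5 apples
    ("apple", "orange"): 0.5,   # hand it an apple, receive half an orange
    ("orange", "widget"): 0.5,  # hand it an orange, receive half a widget
}
print(find_money_pumps(agent))
# [(('apple', 'orange', 'widget'), 1.25)]
```

The 1.25 multiplier on the apple→orange→widget cycle is the story's one-apple-per-lap loss expressed multiplicatively; replace the rates with ratios of any single utility assignment and the list comes back empty.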
Because being able to do impressive stuff means you had some degree of coherence. From https://www.alignmentforum.org/posts/7im8at9PmhbT4JHsW/ngo-and-yudkowsky-on-alignment-difficulty :
Having plans that lase is (1) a thing you can generalize on, i.e. get good at because different instances have a lot in common, and (2) a thing that is probably heavily rewarded in general (by the reward thingy, or by internal credit assignment / economies), to the extent that the reward systems have correct credit assignment. So an AI that does impressive stuff probably has a general skill + dynamic of increasing coherence.
I did not understand anything from what you said… How does coherence generate an equivalent of an internal “push” to do something?
Not sure how to clarify. AI capabilities research consists of looking for computer programs that do lots of different stuff. So you’re selecting for computer programs that do lots of different stuff. The claim is that that selection heavily upvotes algorithms that tend towards coherence-in-general.
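To unpack that claim with a toy model (my own, and deliberately crude, not anything from the thread): give each candidate "program" exchange rates that are utility ratios corrupted by lognormal noise, so the noise scale is its degree of incoherence; score it only on how much wealth it keeps when an exploiter runs the best trade cycle against it; then select the top scorers. Names like `make_agent` and `exploiter_gain` are invented for the sketch.

```python
import math
import random

GOODS = ["apple", "orange", "widget"]

def make_agent(incoherence):
    """Exchange rates = utility ratios times lognormal noise.

    rates[(give, get)] is how much `get` the agent returns per unit of
    `give`.  At incoherence == 0 the rates factor through one utility
    assignment u, every trade cycle multiplies to exactly 1, and the
    agent cannot be pumped.
    """
    u = {g: random.uniform(1.0, 10.0) for g in GOODS}
    return {(a, b): (u[a] / u[b]) * math.exp(random.gauss(0.0, incoherence))
            for a in GOODS for b in GOODS if a != b}

def exploiter_gain(rates):
    """Best multiplier a pump gets from one lap of a 3-good cycle.

    Clamped at 1.0: if no cycle is profitable, the exploiter simply
    declines to trade.
    """
    cycles = [("apple", "orange", "widget"), ("apple", "widget", "orange")]
    lap = max(math.prod(rates[e] for e in zip(c, c[1:] + c[:1]))
              for c in cycles)
    return max(1.0, lap)

def fitness(rates, laps=10):
    """Fraction of wealth kept after the exploiter runs `laps` laps."""
    return exploiter_gain(rates) ** -laps

random.seed(0)
pop = [make_agent(incoherence=random.uniform(0.0, 0.5)) for _ in range(200)]
pop.sort(key=fitness, reverse=True)
kept, culled = pop[:50], pop[50:]
avg = lambda agents: sum(exploiter_gain(a) for a in agents) / len(agents)
print(f"avg pump gain -- selected: {avg(kept):.3f}, culled: {avg(culled):.3f}")
# The selected quarter sits near 1.0 (nothing to pump); the culled
# agents don't.  We only selected on wealth kept, and got coherence.
```

Note that the selection criterion never mentions coherence; it only measures resources retained, and near-coherent rate tables are what survive anyway, which is the sense in which selecting for programs that do lots of different stuff upvotes coherence-in-general.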