A compressed take on recent disagreements

An underlying generator of many recent disagreements appears to be where the disagreeing individuals sit on the Yudkowsky-Hanson spectrum; that is, differences in their distributions over the background variable “how does an optimizer’s ability to optimize scale as you apply (meta) optimization to it?”. To simplify the discussion, I’m going to boil this variable down to a single “expected steepness” (rather than a full distribution over optimization-in vs. optimization-out curves).
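
To make that a bit more concrete, here is a minimal toy sketch (my own illustration, not anyone’s actual model) of what an optimization-in vs. optimization-out curve could look like, and how it collapses to a single “steepness” number by taking the local log-log slope of the curve:

```python
# Toy illustration of "optimization-in vs. optimization-out" curves and how
# a whole curve collapses to one "steepness" number (the local log-log slope).
# Purely a sketch for intuition, not a claim about anyone's actual model.

import math


def optimization_out(optimization_in: float, steepness: float) -> float:
    """Hypothetical returns curve: capability out for optimization pressure in."""
    return optimization_in ** steepness


def local_steepness(curve, x: float, eps: float = 1e-6) -> float:
    """Numerically estimate d(log out)/d(log in) at the point x."""
    lo, hi = curve(x * (1 - eps)), curve(x * (1 + eps))
    return (math.log(hi) - math.log(lo)) / (math.log(1 + eps) - math.log(1 - eps))


# A shallow curve vs. a steep one, evaluated at the same point.
shallow = lambda x: optimization_out(x, 0.5)
steep = lambda x: optimization_out(x, 2.0)

print(local_steepness(shallow, 10.0))  # ~0.5: weak returns to optimization
print(local_steepness(steep, 10.0))    # ~2.0: strong returns to optimization
```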

Eliezer believes this relationship to be steeper than do many others in the alignment sphere. For instance, many (~half by my count) of the disagreements listed in Paul’s recent post appear to pretty directly imply that Paul believes this relationship is much less steep than does Eliezer. It seems likely to me that this difference is the primary generator of those disagreements.

Eliezer has previously suggested formalizing this “optimization-in vs. optimization-out” relationship in Intelligence Explosion Microeconomics. Clearly this is not such an easy thing, or he probably would have just done it himself. Nonetheless this may be a pathway towards resolving some of these disagreements.
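
As a gesture at what such a formalization might even look like (again, a toy sketch of my own, not Eliezer’s model), one can treat capability I as being reinvested into further optimization, dI/dt = c·I^k, and watch how the qualitative behavior depends on the steepness exponent k:

```python
# Toy recursive-self-improvement model: capability I is reinvested into
# further optimization, dI/dt = c * I**k, with k playing the role of the
# "steepness" discussed above. A sketch for intuition only.

def simulate(k: float, c: float = 0.1, i0: float = 1.0,
             dt: float = 0.01, steps: int = 1000) -> float:
    """Euler-integrate dI/dt = c * I**k and return the final capability."""
    i = i0
    for _ in range(steps):
        i += dt * c * i ** k
    return i


for k in (0.5, 1.0, 1.5):
    # k < 1: sub-exponential growth (diminishing returns);
    # k = 1: exponential growth;
    # k > 1: superexponential growth, heading toward finite-time blowup.
    print(f"k = {k}: final capability ≈ {simulate(k):.2f}")
```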

So, how do optimizers scale with applied optimization?

I’ll quickly give my two cents on the matter by noting that we live in a world where the best mathematicians are, quite literally, something like 1000 times as productive as the average mathematician. Keep in mind that this gap spans only the roughly +3 to +6 stdev slice of mathematical ability, i.e. only about a quarter of the full range of variation in the modern human population.

This huge difference in observed performance occurs despite the minds in question having about the same amount of hardware (brain size), built out of components running at about the same speed (neurons), and executing the same basic algorithm (genetic differences are tiny compared to the size of the genome). The difference appears to be largely the result of small algorithmic tweaks.

This leads me to conclude that if you take a ~human-level AGI and optimize it a little bit further, you probably end up with something pretty strongly superhuman. It also suggests that building a ~human-level AGI in the first place is hard, because it’s a small target on the capability axis: you’ll likely just blow past it by accident, and even if you manage to hit it, someone else will blow past it soon thereafter.