human intelligence may be alignment-limited

Previously, I argued that human mental development implies that AI self-improvement from sub-human capabilities is possible, and that human intelligence comes at the cost of a longer childhood and greater divergence from evolutionarily-specified goals.

In that post, I raised two hypotheses:

  • H:mutational_load = Human capabilities are limited because high intelligence requires a degree of genetic precision that is only rarely achieved under normal rates of mutation generation and elimination.

  • H:drift_bound = The extent of self-improvement in humans is limited by increased value drift outweighing increased capabilities.

Humans have a lot of mental variation. Some people can’t visualize 3D objects. Some people can’t remember faces. Some people have synaesthesia. Such variation also exists among very smart people; there is no convergence to a single intellectual archetype. You could argue that what’s needed genetically is precise specification of something lower-level that underlies all that variation, but I don’t think that’s correct.

So, I don’t think H:mutational_load is right. That leaves H:drift_bound as the only hypothesis that seems plausible to me.

Suppose that I’m correct that human intelligence comes at the cost of a longer childhood. The disadvantages of a long childhood vary depending on social circumstances. Humans may have some control mechanism which modifies the amount of mental self-improvement and thus the length of childhood depending on the surrounding environment. Certain environments—probably safe ones with ample food—would then be associated with both longer childhoods and a one-time increase in average intelligence. That would also cause greater divergence from evolutionarily-specified goals, which may show up as a decrease in fertility rates, or an increased rate of obsession with hobbies. That can obviously be pattern-matched to the situation in some countries today, but I don’t mean to say that it’s definitely true; I just want to raise it as a hypothesis.

If H:drift_bound is correct, it would be an example of an optimized system with a strong and adjustable tradeoff between capabilities and alignment, which would be evidence that AI systems also tend to have such a tradeoff.

Agents are adaptation-executors with adaptations that accomplish goals, not goal-maximizers. Understanding agents as goal-maximizers is a simplification humans use to make agents easier to reason about. This is as true when the goal is self-improvement as it is for any other goal.

“Creation of a more-intelligent agent” involves actions that differ at each step. I consider it an open question whether intelligent systems undergoing recursive self-improvement tend to remain oriented towards creating more-intelligent agents any better than they remain oriented towards their non-instrumental specified goals. My view is that one of the following is true:

  1. Instrumental convergence is correct, and can maintain creation of more-intelligent agents as a goal during recursive self-improvement despite the actions/adaptations involved being very different.

  2. Self-improvement has a fixed depth set by the initial design, rather than unlimited potential depth. This may limit AI to approximately human-level intelligence, since value drift would be a similarly limiting factor for both humans and AI. That said, many humans do seem to have self-improvement as a goal, and some have the creation of a more-intelligent but different self, or even of a more-intelligent and entirely separate agent, as a goal.