Without this evolutionary analogy, why should we even elevate the very specific claim that "AIs will experience a sudden burst of generality at the same time as all our alignment techniques fail" to consideration at all, much less put significant weight on it?
I see this as a very natural hypothesis because alignment requires a hopeful answer to the question:
“Given that an agent has ontology / beliefs X and action space Y, will it do things that we want it to do?”
If the agent experiences a sudden burst of generality, then it winds up with a different X, and a different (and presumably much broader) Y. So IMO it's obviously worth considering whether misalignment might start at such a point.
(Oh wait, or are you saying that “AIs will experience a sudden burst of generality” is the implausible part? I was assuming from the boldface that you see “…at the same time…” as the implausible part.)