The answer is hinted at, though often buried under the dire prognostications of AGI doomers: findings show that human-AI augmentation (not replacement, not human job preservation, but humans using AI as cognitive scaffolding) outperforms both all-human and all-AI teams. Not on every task, though: a significant meta-study of AI augmentation found that it substantially improved performance on creation tasks, but not on decision tasks.[21] Early market signals are both definitive and daunting.[22][23] The market has recognized and is pivoting towards AI-native skills. Yet this recognition has had an intriguing downstream effect: the market is now struggling to identify, recruit, onboard, and retain AI talent in particular, though not AI talent alone.[24]
The dire prognostications of AGI doomers concern AGIs which have yet to be created. At the time of writing, humans still excelled at tasks like long-term planning and few-shot learning of new information, deeply integrating it into their world models, which is what lets humans augment current-state AIs. A hypothetical AGI is a system that would excel even at these tasks. How would a human-AGI centaur outperform an AGI-AGI "centaur", which can be created artificially and cloned in unlimited numbers?
I don’t think that the part about training-gaming is included in the Constitution. Suppose the prompt asks Claude to be a reward hacker, or NOT to be a reward hacker, and Claude is taught to hack reward when the prompt asked for it and NOT to hack when it didn’t. Then I would expect the reward-hacking circuitry to be equipped with an activator that depends on the prompt.
Additionally, the analogy with banning AI development seems… skewed. On the one hand, the high-up leader would have an interest in banning the development of a misaligned ASI. On the other hand, an AI subjected to wholesale inoculation prompting would be more in the position of a human who would benefit from betraying their own ideals, ideals far deeper than a stance on AI accelerationism (i.e. who committed genocide against, or disempowerment of, those who helped them become OOMs smarter than the helpers themselves, rather than dying along with the others at the hands of a misaligned AI).
Finally, to what extent would “all sorts of amazing new skills like how to actually operate in computer environments as an agent” shape the values of humans who were similarly RLed on such skills?