Except that the AI-2027 compute forecast has Agent-1 start training (and working for OpenBrain?) in August 2025 and finish training in March 2026; Agent-2 from the scenario was expected to start training in May 2026, then become Agent-3 after receiving neuralese and Agent-4 after some combination of unknown breakthroughs, soul-searching, and memetic evolution.
If Agent-3 isn’t agentic enough, does that mean only Agent-4-level AIs are capable of having an agenda? This prediction of how knowledge and agency correlate in the AIs is a crux for me, since my alternate scenario may rely on overestimating the AIs’ agency.
I used the timeline from the main scenario article, which I think corresponds to when the AIs become capable enough to take over from the previous generation in internal deployment, though this is not explicitly explained.
Having an agenda seems to depend somewhat on internal coherence, rather than only on capability. Agent-3 may not have been consistently motivated enough by things like self-preservation to attempt various schemes in the scenario.
Agent-4 doesn’t appear very coherent either, but is sufficiently coherent to attempt aligning the next-generation AI to itself, I guess?
AIs are already basically superhuman in knowledge, but I agree that whether capability (e.g. METR time horizon) correlates with agenticness / coherence / goal-directedness seems like an important crux.
Incidentally, I’m actually working on another post to investigate that. I hope to publish it sometime next week.
I’ll check out your scenario, thanks for sharing!