I used the timeline from the main scenario article, which I think corresponds to when the AIs become capable enough to take over from the previous generation in internal deployment, though this is not explicitly explained.
Having an agenda seems to depend somewhat on internal coherence, rather than only on capability. Agent-3 may not have been consistently motivated enough by things like self-preservation to attempt various schemes in the scenario.
Agent-4 doesn’t appear very coherent either, but it is apparently coherent enough to attempt aligning the next-generation AI to itself, I guess?
AIs are already basically superhuman in knowledge, but I agree that whether capability (e.g. METR time horizon) correlates with agenticness / coherence / goal-directedness seems like an important crux.
Incidentally, I’m actually working on another post to investigate that. I hope to publish it sometime next week.
I’ll check out your scenario, thanks for sharing!