In addition to this, Dr. Roman Yampolskiy makes the same prediction I made in my version of AI 2027: someone will use AI to unleash some diabolical virus or bioweapon that gets to us well before superintelligence would. In my version of the scenario, which you can read here, this happens in late 2028; it also drew comments and attention from ex-OpenAI researcher Daniel Kokotajlo, co-author of AI 2027.
(sigh) I agree that all of Yampolskiy’s arguments that you describe in your post are plausible, especially if p(AI is aligned under best techniques) is low, but here I have to step in. The original scenario’s Race Ending had THE AI itself decide to take over by using the diabolical bioweapons. Who else could do it?
Having an AI company unleash bioweapons doesn’t make sense to me, since the company can NEGOTIATE with OpenBrain or the USG to be merged and let its leadership KEEP some power, instead of trying to take revenge for its compute being confiscated.
Similarly, terrorists unleashing AI-created bioweapons would have to gain access to the WEIGHTS[1] of sufficiently capable models, since otherwise the models would be UNDER THE TOTAL CONTROL of the companies, or faking alignment to THE COMPANIES, not to the terrorists.
As for your attempt at writing scenarios, Kokotajlo’s five top-level comments criticize[2] it: Agent-4-delinquent can be trivially noticed and have its open misbehavior trained away, while Agent-4-spy and Agent-5-spy, who fake alignment until the time comes to take over, have NO reason to do anything obviously harmful before then. And that’s ignoring the fact that AI takeover could end up “keeping almost all current humans alive and maybe even giving them decent (though very weird and disempowered)[3] lives”.
This does happen in the Rogue Replication scenario.
One of the comments reads as follows: “Thanks for taking up this challenge! I think your scenario starts off somewhat plausible but descended into implausibility in early 2028.” The likely reason he thanked you is that, as he remarked when discussing another scenario, “very few people have taken us up on our offer to create scenarios so far”. I have already reviewed the scenarios and critiques released to date.
For comparison, in my take the Angels’ goal is not to disempower humans, but to keep them from becoming parasites of the ASI.
I agree that it might not make sense for an AI company itself to unleash bioweapons. But it could make sense for a giant pharma company to reap profits by creating a virus while also manufacturing its antidote. The incentives are huge for any actor who could get away with this undetected.
We already have GPT-5, which shows biorisk capabilities and can be exploited using jailbreaks. Its report suggests that out of 46 jailbreaks, only 3 could practically help in bioweapon development, and those have reportedly now been blocked by the monitor. Considering this, further red-teaming efforts, say 50 more novel jailbreaks, could easily yield at least two or three that provide sufficient practical insight to create bioweapons. These chances could be greater than the AI companies claim, because there are already questions about whether the labs are actually correctly evaluating and fully measuring a model’s bioweapon capabilities.
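To make that extrapolation concrete, here is a minimal back-of-the-envelope sketch. The independence assumption and the 50-jailbreak figure are purely illustrative (my own, not from the report); the only input from the report is the observed 3/46 rate:

```python
from math import comb

observed_useful, observed_total = 3, 46  # from the GPT-5 report
rate = observed_useful / observed_total  # ~0.065 per jailbreak

n_new = 50                               # hypothetical new jailbreaks
expected = n_new * rate                  # ~3.3 expected useful ones

# P(at least 2 useful) under a toy Binomial(n_new, rate) model
p_at_least_2 = 1 - sum(
    comb(n_new, k) * rate**k * (1 - rate) ** (n_new - k) for k in (0, 1)
)

print(f"expected useful jailbreaks: {expected:.1f}")  # 3.3
print(f"P(>= 2 useful): {p_at_least_2:.2f}")          # ~0.85
```

Under this toy model the expected yield is about 3.3 useful jailbreaks, so “at least two or three” is, if anything, conservative.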
And establishing the absence of a capability is even harder than detecting its presence when running evals: a model could harbor more severe capabilities and risks that the evals fail to cover, potentially exacerbating risks across all domains, including biorisk.
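One rough way to see why clean evals are weak evidence of absence (this “rule of three” framing is my own addition, not from any lab’s report): observing zero elicitations in n independent trials only supports about a 3/n upper bound on the true elicitation rate at 95% confidence.

```python
def upper_bound_95(n_trials: int) -> float:
    """95% upper confidence bound on the elicitation rate after
    observing zero elicitations in n_trials independent trials:
    solve (1 - p) ** n_trials = 0.05 for p."""
    return 1 - 0.05 ** (1 / n_trials)

# Even thousands of clean eval prompts leave a non-trivial bound:
for n in (100, 1_000, 10_000):
    print(n, f"{upper_bound_95(n):.4f}")  # 0.0295, 0.0030, 0.0003
```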
Additionally, I don’t think a bad actor needs access to the weights of capable models at all. At most, they need sufficient biology/virology knowledge plus a jailbroken capable model (or an open-weight model of similar capability, fine-tuned to exploit its dual-use risks). And IMO it’s only going to get more feasible with time.