The conjunction of “Llama-2 can give accurate instructions for making anthrax” and “Anthrax recipes are hard to obtain, apart from Llama-2,” is almost certainly false.
We know that it’s hard to make biological weapons with LLMs, because Dario Amodei testified before the US Congress that even Anthropic’s most advanced models cannot yet reliably give instructions for making such biological weapons. But Anthropic’s most advanced models are far more capable than Llama-2, so if the most advanced models Anthropic has can’t do it, Llama-2 almost certainly cannot. (The alternative is that accurate anthrax instructions are scattered all over the internet and anthrax is an unusually easy biological agent to make, such that Llama-2 did pick it up; but in that case Llama-2 isn’t particularly a problem either!)
I’m sure if you asked a de-lobotomized version of Llama-2 for instructions, it would give you instructions that sound scary, but that’s an entirely different matter.
Either that or anthrax has accurate instructions for it scattered everywhere on the internet and is an unusually easy biological agent to make such that Llama-2 did pick it up—but again that means Llama-2 isn’t particularly a problem!
Hard disagree. These techniques are so much more worrying if you don’t have to piece together instructions from different locations and assess the reliability of comments on random forums.
Yeah, terrorists are often not very bright, conscientious, or creative.[1] I think rationalist-y types might systematically overestimate how harmful the proliferation of non-novel information can still be, by giving scary ideas to scary people.
We know that it’s hard to make biological weapons with LLMs, because Dario Amodei testified before the US Congress that even Anthropic’s most advanced models cannot yet reliably give instructions for making such biological weapons.
Fwiw, I take this as moderate but not overwhelming evidence. (I think I agree with the rest of your comment; I’m just flagging that this seemed slightly overstated.)
It’s a bit ambiguous, but I personally interpreted the Center for Humane Technology’s claims here in a way that would be compatible with Dario’s comments:
“Today, certain steps in bioweapons production involve knowledge that can’t be found on Google or in textbooks and requires a high level of specialized expertise — this being one of the things that currently keeps us safe from attacks,” he added.
He said today’s AI tools can help fill in “some of these steps,” though they can do this “incompletely and unreliably.” But he said today’s AI is already showing these “nascent signs of danger,” and said his company believes it will be much closer just a few years from now.
“A straightforward extrapolation of today’s systems to those we expect to see in two to three years suggests a substantial risk that AI systems will be able to fill in all the missing pieces, enabling many more actors to carry out large-scale biological attacks,” he said. “We believe this represents a grave threat to U.S. national security.”
If, however, Tristan Harris was making the stronger claim that jailbroken Llama-2 could already supply all the instructions needed to produce anthrax, that would be much more concerning than my initial read.
[1] No offense intended to any members of the terror community reading this comment.