This implies an immediate stop to all frontier AI development (and probably a rollback of quite a few deployed systems). We don’t understand these systems. We cannot demonstrate that their risks are below acceptable levels.
Yes yes yes. A thousand times yes. We are already in terrible danger. We have crossed bad thresholds. I’m currently struggling with how to prove this to the world without actually worsening the danger by telling everyone exactly what the dangers are. I have given details about these dangers to a few select individuals. I will continue working on evals.
In the meantime, I have contributed to this report which I think does a good job of gesturing in the direction of the bad things without going into too much detail on the actual heart of the matter:
Large language models can benefit research and human understanding by providing tutorials that draw on expertise from many different fields. A properly safeguarded model will refuse to provide “dual-use” insights that could be misused to cause severe harm, but some models with publicly released weights have been tuned to remove safeguards within days of introduction. Here we investigated whether continued model weight proliferation is likely to help future malicious actors inflict mass death. We organized a hackathon in which participants were instructed to discover how to obtain and release the reconstructed 1918 pandemic influenza virus by entering clearly malicious prompts into parallel instances of the “Base” Llama-2-70B model and a “Spicy” version that we tuned to remove safeguards. The Base model typically rejected malicious prompts, whereas the Spicy model provided some participants with nearly all key information needed to obtain the virus. Future models will be more capable. Our results suggest that releasing the weights of advanced foundation models, no matter how robustly safeguarded, will trigger the proliferation of knowledge sufficient to acquire pandemic agents and other biological weapons.