As someone currently quite involved in bioweapon risk evals of AI models… I appreciate this post, but also feel like the intro really ought to at least mention the hard part. The hard part is what to do about open source models. Given that the relevant dangerous knowledge for bioweapon construction is, unfortunately, already publicly available, any plan for dealing with misuse has to assume the bad actors have that info.
The question then is, if a model has been trained without the bad knowledge, how much additional compute is required to fine-tune the knowledge in? How much effort/cost is that process compared to directly reading and understanding the academic papers?
My best guess at the answers to these questions is ‘only a small amount of extra compute, like <5% of the training compute’ and ‘much quicker and easier than reading and understanding thousands of academic papers’.
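For a rough sense of scale, here’s a back-of-envelope sketch using the common ~6 × params × tokens approximation for training FLOPs. The model size and token counts below are purely illustrative assumptions, not measurements of any particular model or fine-tuning dataset:

```python
# Rough back-of-envelope: fine-tuning compute vs. pretraining compute.
# Uses the common ~6 * params * tokens approximation for transformer
# training FLOPs. All numbers below are illustrative assumptions.

def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training FLOPs: ~6 FLOPs per parameter per token."""
    return 6.0 * n_params * n_tokens

n_params = 70e9            # assumed 70B-parameter model
pretrain_tokens = 2e12     # assumed ~2T pretraining tokens
finetune_tokens = 2e9      # assumed ~2B tokens of domain-specific fine-tuning data

pretrain = train_flops(n_params, pretrain_tokens)
finetune = train_flops(n_params, finetune_tokens)

print(f"pretraining:  {pretrain:.2e} FLOPs")
print(f"fine-tuning:  {finetune:.2e} FLOPs")
print(f"fine-tuning / pretraining = {finetune / pretrain:.1%}")
# With these assumed numbers the fine-tuning run is ~0.1% of pretraining
# compute, i.e. well under the '<5%' guess above. The real question this
# arithmetic doesn't answer is how much fine-tuning data is needed.
```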
If these are both true, then to block the most likely path to misuse you must prevent the release of the open source model in the first place, or else come up with some sort of special training regime which greatly increases the cost of fine-tuning dangerous knowledge into the model.
And what about as the state of the art in hardware and algorithms continues to advance? Open source models of sufficient capability to be dangerous will get cheaper and easier to produce. What kind of monitoring regime would be required to stop groups or wealthy individuals from training their own models on dangerous knowledge?
Yes, my hope here would be to prevent the release of these open source models. I’ll add a note to the intro. The post is about “How could an AI lab serving AIs to customers manage catastrophic misuse?” and assumes that the AI lab has already properly contained their AI (made it hard to steal, let alone open sourcing it).
I added:

Note that the approaches I discuss here don’t help at all with catastrophic misuse of open source AIs. Distinct approaches would be required to address that problem, such as ensuring that powerful AIs which substantially lower the bar to building bioweapons aren’t open sourced. (At least, not open sourced until sufficiently strong bioweapons defense systems exist or we ensure there are sufficiently large difficulties elsewhere in the bioweapon creation process.)
Thanks Ryan. That said, it’s unclear how powerful a model needs to be to be dangerous (which is why I’m working on evals to measure this). In my opinion, Llama2 and Mixtral are potentially already quite dangerous given the right fine-tuning regime.
So, if I’m correct, we’re already at the point of ‘too late, the models have been released. The problem will only get worse as more powerful models are released. Only the effort of processing the raw data into a training set and running the fine-tuning is saving the world from having a seriously dangerous bioweapon-designer-tuned LLM in bad actors’ hands.’
Of course, that’s just like… my well-informed opinion, man. I deliberately created such a bioweapon-designer-tuned LLM from Llama 70B as part of my red teaming work on biorisk evals. It spits out much scarier information than a Google search supplies. Much. I realize there’s a lot of skepticism around this claim, and not much can be done about that until better objective evaluations of the riskiness of bioweapon design capability are developed. For now, I can only say that this is my opinion from personal observations.
It spits out much scarier information than a Google search supplies. Much.
I see a sense in which GPT-4 is completely useless for serious programming in the hands of a non-programmer who wouldn’t be capable/inclined to become a programmer without LLMs, even as it’s somewhat useful for programming (especially with unfamiliar but popular libraries/tools). So the way in which a chatbot helps needs qualification.
One possible measure is how much a chatbot increases the fraction of some demographic that’s capable of some achievement within some amount of time. All these “changes the difficulty by 4x” or “by 1.25x” need to mean something specific, otherwise there is hopeless motte-and-bailey that allows credible reframing of any data as fearmongering. That is, even when it’s only intuitive guesses, the intuitive guesses should be about a particular meaningful thing rather than level of scariness. Something prediction-marketable.

I was trying to say “cost in time/money goes down by that factor for some group”.
Yes, I quite agree. Do you have suggestions for what a credible objective eval might consist of? What sort of test would seem convincing to you, if administered by a neutral party?
Here’s my guess (which is maybe the obvious thing to do).
Take bio undergrads, have them do synthetic biology research projects (ideally using many of the things which seem required for bioweapons), and randomize them into two groups, where one is allowed to use LLMs (e.g. GPT-4) and one isn’t. The projects should ideally have a reasonable duration (at least >1 week, ideally >4 weeks). Also, for both groups, provide high-level research advice/training about how to use the research tools they are given (in the LLM case, advice about how to best use LLMs).
Then, have experts in the field assess the quality of projects.
For a weaker preliminary experiment, you could do 2-4 hour sessions on some quick synth bio lab task with the same approximate setup (though the shortened duration introduces complications).
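To sketch how results from such a trial might be scored, here’s a minimal analysis. The group sizes, the 0-10 expert quality ratings, and the quality bar are all made-up placeholders (a real eval would pre-register its metrics); the fraction-clearing-the-bar number is one way to get the kind of concrete, prediction-marketable quantity mentioned above:

```python
# Minimal analysis sketch for a randomized LLM-uplift trial.
# All data below are fabricated placeholders for illustration only.
import random
from statistics import mean

random.seed(0)

# Hypothetical expert quality ratings (0-10) for each participant's project.
llm_group     = [6.5, 7.0, 5.5, 8.0, 6.0, 7.5, 5.0, 6.5]   # allowed to use LLMs
control_group = [5.0, 6.0, 4.5, 6.5, 5.5, 5.0, 4.0, 6.0]   # no LLM access

observed_diff = mean(llm_group) - mean(control_group)

# Permutation test: how often does a random relabeling of participants
# produce a mean difference at least as large as the observed one?
pooled = llm_group + control_group
n_llm = len(llm_group)
n_perms = 20_000
count = 0
for _ in range(n_perms):
    random.shuffle(pooled)
    diff = mean(pooled[:n_llm]) - mean(pooled[n_llm:])
    if diff >= observed_diff:
        count += 1
p_value = count / n_perms  # one-sided

# One "prediction-marketable" framing: the fraction of each group whose
# project clears a pre-registered quality bar within the allotted time.
BAR = 6.0
frac_llm     = sum(s >= BAR for s in llm_group) / len(llm_group)
frac_control = sum(s >= BAR for s in control_group) / len(control_group)

print(f"mean uplift in expert score: {observed_diff:.2f} (p ~ {p_value:.3f})")
print(f"fraction clearing the bar: LLM {frac_llm:.0%} vs control {frac_control:.0%}")
```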
In my opinion, Llama2 and Mixtral are potentially already quite dangerous given the right fine-tuning regime.
Indeed, I think it seems pretty unlikely that these models (finetuned effectively using current methods) change the difficulty of making a bioweapon by more than a factor of 4x. (Though perhaps you think something like “these models (finetuned effectively) make it maybe 25% easier to make a bioweapon and that’s pretty scary”.)
Yes, I’m unsure about the multiplication factors on “likelihood of bad actor even trying to make a bioweapon” and on “likelihood of succeeding given the attempt”. I think probably both are closer to 4x than 1.25x. But I think it’s understandable that this claim on my part seems implausible. Hopefully at some point I’ll have a more objective measure available.