This overestimates the impact of large models on external safety research. My impression is that the AI safety community has barely used the DeepSeek R1 and V3 open-source weights at all. I checked again and still see little evidence of V3/R1 weights in safety research. People use R1 Distill 8B and QwQ 32B, but the decision to open source the most capable small model is different from the decision to open source the frontier. So what matters is when 8B or 32B models can assist with bioterrorism, which happens a bit later, and we get most of the benefits of open source until then. It’s also cheaper to filter virology, or even all biology data, out of a small model’s pretraining data, because doing so wouldn’t cause customers to switch providers (customers prefer large models anyway) and small models are more often narrowly focused on math or coding.
I expect the benefits of open large models for safety research to increase over time as open-source tooling improves.
To be clear, I agree it’s quite uncertain. I put substantial probability on further open-source models basically not mattering and some probability on them being very important.
I think people do use Llama 70B, and I predict people will increasingly use somewhat bigger models.
Also, it’s worth noting that 70B models can seemingly be pretty close to the current frontier!