I used to think that announcing AGI milestones would cause rivals to accelerate and race harder; now I think the rivals will be racing pretty much as hard as they can regardless. And in particular, I expect that the CCP will find out what’s happening anyway, regardless of whether the American public is kept in the dark. Continuing the analogy to the Manhattan Project: They succeeded in keeping it secret from Congress, but failed at keeping it secret from the USSR.
What do you think about the concern of a US company speeding up other US companies / speeding up open source models? (I don’t expect US companies to spy on each other as well as the USSR did during the Manhattan Project.) Do you expect competition between US companies to not be relevant?
I expect that the transparency measures you suggest (releasing the most relevant internal metrics + model access + letting employees talk about what internal deployments look like) would leak a large number of bits that speed up other actors a large amount (by pointing at the right research directions + helping motivate and secure very large investments + enabling distillation). Maybe a crux here is how big the speedup is?
“We’ll give at least ten thousand external researchers (e.g. academics) API access to all models that we are still using internally, heavily monitored of course, for the purpose of red teaming and alignment research”
How do you expect to deal with misuse worries? Do you just eat the risk? I think the fact that AI labs are not sharing helpful-only models with academics is not a very reassuring precedent here.
Maybe a crux here is how big the speedup is?
What you describe are good reasons why companies are unlikely to want to release this information unilaterally, but from a safety perspective, we should instead consider how imposing such a policy alters the overall landscape.
From this perspective, the main question seems to me to be whether it is plausible that US AI companies would spend more on safety in worlds where other US AI companies are further behind, i.e. whether a closer race between US companies reduces the amount spent on safety. And how this compares to the chance of this information being helpful in other ways (e.g., getting broader groups than just the AI companies involved).
It also seems quite likely to me that, in practice, people in the industry and investors basically know what is happening, but it is harder to trigger a broader response because, without more credible sources, this can all just be dismissed as hype.
How do you expect to deal with misuse worries? Do you just eat the risk?
The proposal is to use monitoring measures, similar to e.g. constitutional classifiers.
Also, don’t we reduce misuse risk a bunch by only deploying to 10k external researchers?
(I’m skeptical of any API misuse concerns at this scale except for bio and maybe advancing capabilities at competitors, but this is a stretch given the limited number of tokens IMO.)
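To make the monitoring idea concrete, here is a minimal sketch of what classifier-gated access could look like. The `flag_misuse` scorer, thresholds, and keyword stub are hypothetical placeholders standing in for a real safeguard model (constitutional-classifier-style); nothing here describes any lab’s actual system.

```python
from dataclasses import dataclass

# Hypothetical misuse classifier: a real deployment would use a separate
# safeguard model (a constitutional-classifier-style system), not the
# keyword stub used here purely for illustration.
def flag_misuse(text: str) -> float:
    """Return a misuse score in [0, 1]; higher means more suspicious."""
    suspicious_terms = ("synthesize pathogen", "enhance transmissibility")
    return 1.0 if any(t in text.lower() for t in suspicious_terms) else 0.0

@dataclass
class GatedResponse:
    output: str | None   # None when the request is blocked
    blocked: bool
    escalated: bool      # queued for human review

# Illustrative thresholds, not tuned values from any real system.
BLOCK_THRESHOLD = 0.9
ESCALATE_THRESHOLD = 0.5

def gated_query(prompt: str, model_call) -> GatedResponse:
    """Score the prompt and the completion, then block or escalate
    rather than silently returning risky outputs."""
    if flag_misuse(prompt) >= BLOCK_THRESHOLD:
        return GatedResponse(output=None, blocked=True, escalated=True)

    completion = model_call(prompt)
    score = max(flag_misuse(prompt), flag_misuse(completion))
    if score >= BLOCK_THRESHOLD:
        return GatedResponse(output=None, blocked=True, escalated=True)
    return GatedResponse(
        output=completion,
        blocked=False,
        escalated=score >= ESCALATE_THRESHOLD,
    )
```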
the main question seems to me to be whether it is plausible that US AI companies would spend more on safety
Other considerations:
Maybe more sharing between US companies leads to faster overall progress of the field
though maybe it slows things down by reducing investment, since investing in algorithmic secrets becomes less valuable? That’s a bit of a 4D-chess consideration; I don’t know how much to trust this sort of reasoning.
Maybe you want information to flow from the most reckless companies to the less reckless ones, but not the other way around, such that you would prefer that the companies you expect to spend the most on safety not share information. Spending more on info-sec is maybe correlated with spending more on safety in general, so you might be disfavoring the less reckless actors by asking for transparency.
(I am also unsure how the public would weigh in: I think there is a 2%-20% chance that public pressure is net negative in terms of safety spending, because of PR, legal, and AI-economy questions. I think it’s hard to tell in advance.)
I don’t think these are super strong considerations, and I am sympathetic to the point that safety spending would probably increase if there were more transparency.
The proposal is to use monitoring measures, similar to e.g. constitutional classifiers. Also, don’t we reduce misuse risk a bunch by only deploying to 10k external researchers?
My bad, I had failed to see that the annoying thing with helpful-only model sharing is that you can’t check whether activity is malicious or not. I agree you can do great monitoring, especially if you also keep the number of tokens per researcher small-ish and have humans audit ~0.1% of transcripts (with AI assistance).
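As a rough illustration of that regime, here is a minimal sketch combining a per-researcher token budget with ~0.1% random sampling of transcripts for human audit. The budget size, helper names, and sampling logic are assumptions made for the example, not a description of any existing program.

```python
import random

# Illustrative numbers matching the assumptions above, not any real
# deployment: a fixed per-researcher token budget and a ~0.1% human-audit
# sampling rate over stored transcripts.
TOKEN_BUDGET_PER_RESEARCHER = 5_000_000
HUMAN_AUDIT_RATE = 0.001

class ResearcherQuota:
    """Tracks a single researcher's remaining token budget."""

    def __init__(self, budget: int = TOKEN_BUDGET_PER_RESEARCHER):
        self.remaining = budget

    def charge(self, tokens_used: int) -> bool:
        """Deduct usage; return False once the budget would be exceeded."""
        if tokens_used > self.remaining:
            return False
        self.remaining -= tokens_used
        return True

def select_for_human_audit(transcript_ids: list[str],
                           rate: float = HUMAN_AUDIT_RATE,
                           seed: int = 0) -> list[str]:
    """Uniformly sample roughly `rate` of transcripts for human review;
    AI-assisted triage could pre-rank or summarize the rest."""
    rng = random.Random(seed)
    return [tid for tid in transcript_ids if rng.random() < rate]
```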