The main question, it seems to me, is whether it is plausible that US AI companies would spend more on safety if there were more transparency.
Other considerations:
Maybe more sharing between US companies leads to faster progress of the field overall
though maybe it slows progress down by reducing investment, since algorithmic secrets become less valuable to invest in? That's a bit of a 4D-chess consideration, and I don't know how much to trust this sort of reasoning.
Maybe you want information to flow from the most reckless companies to the less reckless ones, but not the other way around, so you would prefer that the companies you expect to spend the most on safety not share information. Spending more on info-sec is maybe correlated with spending more on safety in general, so asking for transparency might disfavor the less reckless actors.
(I am also unsure how the public will weigh in. I think there is a 2%-20% chance that public pressure is net negative for safety spending, because of PR, legal, and AI-economy questions; it's hard to tell in advance.)
I don’t think these are super strong considerations, and I am sympathetic to the point that safety spending would probably increase if there were more transparency.
The proposal is to use monitoring measures, similar to e.g. constitutional classifiers. Also, don’t we reduce misuse risk a bunch by only deploying to 10k external researchers?
My bad, I had failed to see that what’s annoying with helpful-only model sharing is that you can’t check whether activity is malicious or not. I agree you can do great monitoring, especially if you can also keep a small-ish number of tokens per researcher and have humans audit ~0.1% of transcripts (with AI assistance).
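To make that monitoring setup concrete, here is a minimal sketch of what such a pipeline could look like, assuming a per-researcher token cap, an automated misuse monitor on every transcript (in the spirit of constitutional classifiers), and a ~0.1% random sample sent for human audit. All names and thresholds are illustrative assumptions, not any lab's actual system.

```python
import random

# Minimal sketch (illustrative only) of the monitoring setup discussed above:
# an automated misuse monitor on every transcript, a per-researcher token cap,
# and a ~0.1% random sample of transcripts sent for human audit. All names and
# thresholds here are assumptions, not any lab's actual system.

TOKEN_CAP_PER_RESEARCHER = 5_000_000   # assumed "small-ish" per-researcher budget
HUMAN_AUDIT_RATE = 0.001               # ~0.1% of transcripts audited by humans

tokens_used: dict[str, int] = {}       # researcher id -> tokens consumed so far


def score_misuse(transcript: str) -> float:
    """Stand-in for an automated monitor (e.g. a constitutional-classifier-style
    model); returns a risk score in [0, 1]. Dummy value so the sketch runs."""
    return 0.0


def handle_transcript(researcher_id: str, transcript: str, n_tokens: int) -> str:
    # Enforce the per-researcher token budget.
    used = tokens_used.get(researcher_id, 0)
    if used + n_tokens > TOKEN_CAP_PER_RESEARCHER:
        return "rejected: token cap exceeded"
    tokens_used[researcher_id] = used + n_tokens

    # Every transcript gets scored by the automated monitor.
    if score_misuse(transcript) > 0.9:
        return "escalated: flagged by automated monitor"

    # Independently, a small random fraction goes to (possibly AI-assisted) human audit.
    if random.random() < HUMAN_AUDIT_RATE:
        return "queued for human audit"
    return "allowed"


print(handle_transcript("researcher-42", "example transcript", n_tokens=1200))
```

The point is just that token caps and sparse human auditing compose naturally with an automated monitor; the hard part is what the monitor actually catches and what the escalation path looks like.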