[Writing off based on the quick take, haven’t looked into the linked thing.]
Google didn’t necessarily even break a commitment? The commitment mentioned in the article is to “publicly report model or system capabilities.” That doesn’t say it has to be done at the time of public deployment.
I think this statement lends itself to being by default interpreted as “inform the public about model/system capabilities in time” (because if they don’t do it “in time”, then what’s the point?), and the most “natural” “in-time” time is the time of deployment?
I announce that I’m committing to X. I can expect that most people will understand this to mean “I commit to Y” where X→Y is a natural (~unconscious?) inference for a human to make. And then I don’t do Y and defend myself by saying, “The only thing I committed to was X, why all the fuss about me not Y-ing?”.
Other companies are doing far worse in this dimension. At worst Google is 3rd-best in publishing eval results. Meta and xAI are far worse.
There might be a contextualizer-y justification for picking on Google more because they are more ahead than Meta and xAI, AI-wise.
I agree. If Google wanted to join the commitments but not necessarily publish eval results by the time of external deployment, it should have clarified “we’ll publish within 2 months after external deployment” or “we’ll do evals on our most powerful model at least every 4 months rather than doing one round of evals per model” or something.
[Writing off based on the quick take, haven’t looked into the linked thing.]
I think this statement lends itself to being by default interpreted as “inform the public about model/system capabilities in time” (because if they don’t do it “in time”, then what’s the point?), and the most “natural” “in-time” time is the time of deployment?
I announce that I’m committing to X. I can expect that most people will understand this to mean “I commit to Y” where X→Y is a natural (~unconscious?) inference for a human to make. And then I don’t do Y and defend myself by saying, “The only thing I committed to was X, why all the fuss about me not Y-ing?”.
There might be a contextualizer-y justification for picking on Google more because they are more ahead than Meta and xAI, AI-wise.
I agree. If Google wanted to join the commitments but not necessarily publish eval results by the time of external deployment, it should have clarified “we’ll publish within 2 months after external deployment” or “we’ll do evals on our most powerful model at least every 4 months rather than doing one round of evals per model” or something.