This analysis is confounded by the fact that GDM has a lot more non-Gemini stuff (e.g. the science work) than the other labs. None of the labs publish most of their LLM capabilities work, but publishing science stuff is fine, so DeepMind having more other stuff means comparatively more non-safety work gets published.
I generally think you can’t really answer this question with the data sources you’re using, because IMO the key question is what fraction of the frontier LLM-oriented work is on safety, but little of that work is published.
By “data sources you’re using”, do you mean I have to be more careful in what kinds of public data to use, and to filter by topic to remove non-LLM research, or do you mean the only way to meaningfully answer this would need access to non-public data? I’ve not worked at a lab, but I would have thought we’d at least see the successful results, with a 6-month delay or so.
do you mean the only way to meaningfully answer this would need access to non-public data
That, unfortunately. Frontier labs rarely share research that helps improve the capabilities of frontier models (this will vary between labs, of course, and many are still good about publishing commercially useful safety work).