Every purposive sample is opinionated, which is why we’re asking for community feedback. Do you have other figures or institutions in mind that we should include to balance our selection?
We seriously considered incorporating citation analysis, but couldn’t find a way to execute it for this project that made sense. Overlaying the directionality of citations with co-authorships would be fascinating, and a citation-based corpus would capture a different and much larger population, but it’s tricky. In a fast-moving field of posts and preprints it would be hard to find a convincing formula for time-weighting the citations papers receive, and the valence of a citation is much more ambiguous than publicly acknowledged co-authorship. We ultimately chose co-authorship because it’s easier to interpret, and because we are focused on collaboration structure rather than influence.
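To make the time-weighting problem concrete, here is a minimal sketch of one candidate formula, exponential decay with a half-life. The function, the paper ages, and the citation counts are all hypothetical, not anything from our project; the point is that which paper "wins" depends entirely on the arbitrary half-life parameter.

```python
from math import exp, log

def decayed_citation_score(citation_ages_days, half_life_days):
    """Sum of citations, each down-weighted by how long ago it arrived.

    citation_ages_days: age of each incoming citation, in days.
    half_life_days: an arbitrary choice -- the crux of the problem.
    """
    decay = log(2) / half_life_days
    return sum(exp(-decay * age) for age in citation_ages_days)

# Two hypothetical papers: an older, heavily cited paper
# vs. a recent preprint with a burst of early citations.
older = [400] * 50   # 50 citations, all ~13 months old
recent = [20] * 15   # 15 citations in the last three weeks

for half_life in (90, 365):
    s_old = decayed_citation_score(older, half_life)
    s_new = decayed_citation_score(recent, half_life)
    # The ranking of the two papers flips between these two half-lives.
    print(half_life, round(s_old, 1), round(s_new, 1))
```

With a 90-day half-life the recent preprint outranks the established paper; with a one-year half-life the ordering reverses. Any threshold for inclusion in a corpus would inherit that sensitivity, which is part of why we didn’t find a formula we could defend.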
Your observations on the historical or ideological alliances between labs and their preferred universities are something we should definitely dig into more. Implementing Owain’s suggestion of comparing against a quantitatively defined corpus could also speak to some of this as it would probably include more capabilities-adjacent work.
We do want to include a comparison case, and were originally thinking along the lines of a different area of computer science, or a different emerging technology with significant ties to industry, like blockchain mechanism design. The advantage of using capabilities as the comparison is that it’s analytically and narratively richer, and could contribute to the story the paper tells. The disadvantage, as I see it, is that it would serve as less of an external validation, e.g. it would be less able to answer the question of to what extent the ‘interstitiality’ of AI safety is special or typical.