Great post, but there is one part I’d like to push back on:
Iterators are also easier to identify, both by their resumes and demonstrated skills. If you compare two CVs of postdocs that have spent the same amount of time in academia, and one of them has substantially more papers (or GitHub commits) to their name than the other (controlling for quality), you’ve found the better Iterator. Similarly, if you compare two CodeSignal tests with the same score but different completion times, the one completed more quickly belongs to the stronger Iterator.
This seems like a bit of an over-claim. I would endorse a weaker claim, something like "in the presence of a high volume of applicants, CodeSignal tests, GitHub commits, and paper count statistically provide some signal," but in practice, research and software development work often doesn't have a clean correspondence between these measures and someone's performance. In addition, all three of these measures are quite easy to game (i.e., to Goodhart).
For example, in research alone, not every paper represents the same-sized project; two high-quality papers can differ by an order of magnitude in the amount of work required to produce them. Nor does every research bet pay off: some projects never result in papers at all, and research management often plays a large role in which directions get pursued (and whether unproductive ones get dropped). There are also many researchers who have made a career out of getting their names on as many papers as possible; there is an entire science to doing this that is largely independent of one's actual research ability.
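To make the "some signal, but noisy" point concrete, here's a toy simulation with entirely made-up parameters of my own (not from the post or any real data), where per-paper effort varies lognormally by roughly an order of magnitude across projects:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # hypothetical pool of researchers

# Latent "research ability" (the thing we actually care about), standardized.
ability = rng.normal(0, 1, n)

# Total productive output: ability plus luck (failed bets, management decisions, etc.).
output = ability + rng.normal(0, 1, n)

# Effort per paper varies widely across projects (lognormal spread).
effort_per_paper = rng.lognormal(mean=0, sigma=1, size=n)

# Observed paper count: output divided by how "expensive" each paper happened to be.
papers = np.maximum(output, 0) / effort_per_paper

print("corr(ability, papers):", np.corrcoef(ability, papers)[0, 1])
# Comes out around 0.3 with these made-up parameters: positive, so there is
# real signal, but nowhere near a clean ranking of candidates.
```

Under these (arbitrary) assumptions, paper count still correlates with ability, but not nearly strongly enough to confidently rank two otherwise-similar candidates against each other.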
In the case of CodeSignal evaluations, the signal is likewise relatively low-dimensional and primarily conveys one thing: enough experience with a fairly small set of patterns to complete the assessment very quickly. I've taken enough of these tests, and seen enough reviews of CodeSignal from senior engineers, to know that they capture only a small, specific part of what it takes to be a good engineer, and that they overemphasize speed. Speed is not the main thing you want from an actual senior engineer; you also want quality, maintainability, and readability, which are often at odds with speed, and a senior engineer's first instinct is generally not to jump in and start spitting out lines of code like their life depends on it. Then there's the issue of how hackable/gameable the assessments are; senior engineer Yanir Seroussi has a good blog post on CodeSignal specifically: https://yanirseroussi.com/2023/05/26/how-hackable-are-automated-coding-assessments/
I'm definitely not arguing that these metrics are useless, however. They do provide some signal (especially when the volume of applicants is high), but I'd suggest we treat them as imperfect proxies that we're forced to use because we lack the manpower for comprehensive candidate evaluations, rather than as measures of some kind of ground truth.
Yeah, I basically agree with this nuance. MATS really doesn’t want to overanchor on CodeSignal tests or publication count in scholar selection.