habryka comments on MATS Winter 2023-24 Retrospective

habryka 15 May 2024 19:56 UTC
17 points
13
> In Winter 2023-24, our most empirical research dominated cohort, mentors rated the median scholar’s value alignment at ⁸⁄₁₀ and 85% of scholars were rated ⁶⁄₁₀ or above, where ⁵⁄₁₀ was “Motivated in part, but would potentially switch focus entirely if it became too personally inconvenient.”
Wait, aren’t many of those mentors themselves working at scaling labs or working very closely with them? So this doesn’t feel like a very comforting response to the concern of “I am worried these people want to work at scaling labs because it’s a high-prestige and career-advancing thing to do”, if the people whose judgements you are using to evaluate have themselves chosen the exact path that I am concerned about.
- Ryan Kidd 15 May 2024 20:23 UTC
  8 points
  1
  Of the scholars ranked ⁵⁄₁₀ and lower on value alignment, 63% worked with a mentor at a scaling lab, compared with 27% of the scholars ranked ⁶⁄₁₀ and higher. The average scaling lab mentor rated their scholars’ value alignment at 7.3/10 and rated 78% of their scholars at ⁶⁄₁₀ and higher, compared to 8.0/10 and 90% for the average non-scaling lab mentor. This indicates that our scaling lab mentors were more discerning of value alignment on average than non-scaling lab mentors, or had a higher base rate of low-value alignment scholars (probably both).
  I also want to push back a bit against an implicit framing of the average scaling lab safety researcher we support as being relatively unconcerned about value alignment or the positive impact of their research; this seems manifestly false from my conversations with mentors, their scholars, and the broader community.
  - habryka 15 May 2024 20:48 UTC
    13 points
    12
    > implicit framing of the average scaling lab safety researcher we support as being relatively unconcerned about value alignment or the positive impact of their research
    Huh, not sure where you are picking this up. I am of course very concerned about whether researchers at scaling labs are capable of evaluating their positive impact with respect to their choice to work at a scaling lab (their job does, after all, depend on them not believing that it is harmful), but of course they are not unconcerned about their positive impact.
  - habryka 15 May 2024 20:52 UTC
    9 points
    5
    > This indicates that our scaling lab mentors were more discerning of value alignment on average than non-scaling lab mentors, or had a higher base rate of low-value alignment scholars (probably both).
    The second hypothesis here seems much more likely (and my guess is your mentors would agree). My guess is after properly controlling for that you would find a mild to moderate negative correlation here.
    But also, more importantly, the set of scholars from which MATS is drawing is heavily skewed towards the kind of person who would work at scaling labs (especially since funding has been heavily skewed towards the kind of research that can occur at scaling labs).