ryan_greenblatt comments on How might we safely pass the buck to AI?

ryan_greenblatt 20 Feb 2025 7:46 UTC
LW: 2 AF: 2
To be clear, I think there are important additional considerations, not covered in that section, related to the fact that we don’t just care about capabilities. That said, the section is not far from what I would say if you renamed it to “behavioral tests” and had it include both capabilities and alignment (that is, alignment other than the stuff that messes with behavioral tests).
- joshc 20 Feb 2025 22:33 UTC
  LW: 2 AF: 1
  Yeah, that’s fair. Currently I merge “behavioral tests” into the alignment argument, but that’s a bit clunky and I probably should have just made the carving:
  1. looks good in behavioral tests
  2. is still going to generalize to the deferred task
  
  But my guess is we agree on the object level here and there’s a terminology mismatch. Obviously the models have to actually behave in a manner that is at least as safe as human experts, in addition to displaying comparable capabilities on all safety-related dimensions.