Rohin Shah comments on The title is reasonable

Rohin Shah 22 Sep 2025 20:57 UTC
3 points
0
Yeah, that’s fair for agendas that want to directly study the circumstances that lead to scheming. Though when thinking about those agendas, I do find myself more optimistic because they likely do not have to deal with long time horizons, whereas capabilities work likely will have to engage with that.
Note that many alignment agendas don’t need to actually study potential schemers. Amplified oversight can make substantial progress without studying actual schemers (but probably will face the long-horizon problem). Interpretability can make lots of foundational progress without schemers, which I would expect to mostly generalize to schemers. Control can make progress with models prompted or trained to be malicious.
- Buck 23 Sep 2025 18:27 UTC
  4 points
  0
  > Amplified oversight can make substantial progress without studying actual schemers
  (Though note that it’s unclear whether this progress will mitigate scheming risk.)