Bogdan Ionut Cirstea comments on Corrigibility or DWIM is an attractive primary goal for AGI

Bogdan Ionut Cirstea 25 Nov 2023 23:01 UTC
5 points
0
Agree, and I’ve had similar/related thoughts on how DWIM seems like a pretty natural target for LLM alignment: https://www.lesswrong.com/posts/wr2SxQuRvcXeDBbNZ/bogdan-ionut-cirstea-s-shortform?commentId=65czxJGyBuhqhBRex https://www.lesswrong.com/posts/wr2SxQuRvcXeDBbNZ/bogdan-ionut-cirstea-s-shortform?commentId=GRjfMwLDFgw6qLnDv
- Seth Herd 26 Nov 2023 0:08 UTC
  5 points
  1
  Thanks! This seems pretty obvious, from this perspective, right? But there’s a lot of concern that outer alignment being hard makes the alignment problem much harder. It seems like you can easily just punt on outer alignment, so I think it’s very likely that’s what people will do.