Thanks! This seems pretty obvious, from this perspective, right? But there’s a lot of concern that outer alignment being hard makes the alignment problem much harder. It seems like you can easily just punt on outer alignment, so I think it’s very likely that’s what people will do.
Agreed, and I’ve had related thoughts on how DWIM (do what I mean) seems like a pretty natural target for LLM alignment: https://www.lesswrong.com/posts/wr2SxQuRvcXeDBbNZ/bogdan-ionut-cirstea-s-shortform?commentId=65czxJGyBuhqhBRex https://www.lesswrong.com/posts/wr2SxQuRvcXeDBbNZ/bogdan-ionut-cirstea-s-shortform?commentId=GRjfMwLDFgw6qLnDv