Thanks for the post. It covers an important debate: whether mechanistic interpretability is worth pursuing as a path towards safer AI. The post is logical and makes several good points but I find it’s style too formal for LessWrong and it could be rewritten to be more readable.
Thanks for the post. It covers an important debate: whether mechanistic interpretability is worth pursuing as a path towards safer AI. The post is logical and makes several good points but I find it’s style too formal for LessWrong and it could be rewritten to be more readable.