Is there a consensus on the idea of "training an AI to help with alignment"? What are the reasons this would or wouldn't be productive?

John Wentworth categorizes this as a Bad Idea, but elsewhere (I can't remember where; it may have been in in-person conversations) I've heard it discussed as potentially useful.