I’m confused. This seems like the central example of work I’m talking about. Where is it in the RFP? (Note I am imagining that debate is itself a training process, but that seems to be what you’re talking about as well.)
My bad, I was a bit sloppy here. The debate-for-control stuff is in the RFP, but debate against subtle reward hacks that don’t show up in feedback-quality evals is not.
I think we agree that there are some flavors of debate work that are exciting and not present in the RFP.