Maybe, but the schemers could optimize their research for looking-good-to-us, and it might be hard to distinguish this from actually good work.
If you ask each variant to review the research of the other variants, then the schemers need to optimise their research for looking good to each variant. But the optimistic assumption is that at least one variant is an equally capable non-schemer.
"then the schemers need to optimise their research for looking good to each variant"
Not necessarily. The nice AIs also need to be able to win a debate against the schemers, as judged by humans. It’s not enough for the variants to be able to recognize poor research if they can’t show their work in a way that the schemers cannot refute.