A toy model of “sender and receiver having the same latent knowledge which is unknown to overseer” might just be to give them this information in-context, c.f. Apollo scheming evals, or to finetune it in, c.f. OOCR
A toy model of “sender and receiver having the same latent knowledge which is unknown to overseer” might just be to give them this information in-context, c.f. Apollo scheming evals, or to finetune it in, c.f. OOCR