Daniel Tan comments on Daniel Tan’s Shortform

Daniel Tan 31 Jan 2025 19:37 UTC
1 point
0
A toy model of “sender and receiver having the same latent knowledge which is unknown to overseer” might just be to give them this information in-context, c.f. Apollo scheming evals, or to finetune it in, c.f. OOCR