Rohin Shah comments on Research directions Open Phil wants to fund in technical AI safety

Rohin Shah 8 Feb 2025 22:11 UTC
I don’t know of any existing work in this category, sorry. But, for example, one project would be “combine MONA and your favorite amplified oversight technique to oversee a hard multi-step task without ground-truth rewards”, which could in theory work better than either technique alone.