Author here. We were heavily inspired by multiple things, including Demski and Garrabrant, the 1990′s work of Kalai and Lehrer, empirical work in our group inspired by neuroscience pointing towards systems that predict their own actions, and the earlier work on reflective oracles by Leike . We were not aware of @Cole Wyeth et al.’s excellent 2025 paper which puts the reflective oracle work on firmer theoretical footing, as our work was (largely but not entirely) done before this paper appeared.
Author here. We were heavily inspired by multiple things, including Demski and Garrabrant, the 1990′s work of Kalai and Lehrer, empirical work in our group inspired by neuroscience pointing towards systems that predict their own actions, and the earlier work on reflective oracles by Leike . We were not aware of @Cole Wyeth et al.’s excellent 2025 paper which puts the reflective oracle work on firmer theoretical footing, as our work was (largely but not entirely) done before this paper appeared.