As a baseline, developers could train agents to imitate the truth-seeking process of the most reasonable humans on Earth. For example, they could sample the brightest intellects from every ideological walk, and train agents to predict their actions.
I’m very excited about strategies that involve lots of imitation learning on lots of particular humans. I’m not sure whether imitated human researchers would generalize to doing lots of novel research, but this seems great for examining the research outputs of slightly-more-alien agents very quickly.
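At its simplest, "training agents to predict human actions" is behavioral cloning: fit a policy to demonstrator state–action pairs by minimizing cross-entropy. The sketch below is purely illustrative, not a proposal from the text; the synthetic "demonstrator" and the linear softmax policy are my assumptions to keep it self-contained.

```python
import numpy as np

# Illustrative behavioral-cloning sketch (assumed setup, not the author's method):
# fit a linear softmax policy to predict a demonstrator's discrete actions
# from state features, using synthetic demonstrations.
rng = np.random.default_rng(0)
n_states, n_actions, dim = 500, 3, 4

# Synthetic "human" demonstrations: actions follow a hidden linear rule.
true_W = rng.normal(size=(dim, n_actions))
states = rng.normal(size=(n_states, dim))
actions = np.argmax(states @ true_W, axis=1)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Gradient descent on cross-entropy: the standard imitation objective.
W = np.zeros((dim, n_actions))
for _ in range(300):
    probs = softmax(states @ W)
    onehot = np.eye(n_actions)[actions]
    grad = states.T @ (probs - onehot) / n_states
    W -= 1.0 * grad

accuracy = np.mean(np.argmax(states @ W, axis=1) == actions)
```

Matching the demonstrator on-distribution is the easy part; the open question flagged above is whether such a policy generalizes to genuinely novel research states.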