There’s not going to be one right answer.
The outcome pump. This cashes out “wants” in a non-anthropomorphic way: a system “wants” an outcome just insofar as it reliably selects for futures where that outcome holds. John Wentworth has some good work applying this framing in less obvious ways. (A toy sketch appears after this list.)
Model-based RL. Potentially brain-inspired. This is what I try to think about most of the time.
Model-free RL. I think a lot of inner alignment arguments, and also some “shard theory”-style arguments, implicitly rely on a background model-free RL picture. (The second sketch after this list contrasts the model-free and model-based frames.)
Predictive models. Large language models are important, and people often interpret them as a prototype for future AI.
Anthropomorphism. Usually not a valid frame, but people reach for it anyway.
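
As a toy illustration of the outcome-pump framing (my own sketch, not from any canonical source): the pump “wants” an outcome only in the sense that it filters over possible futures and keeps the ones where the outcome holds. Rejection sampling captures that selection-without-desire idea. Everything here, including the random-walk “world,” is a made-up example:

```python
import random

def outcome_pump(simulate_future, want, n_samples=10_000):
    """Sample random futures and keep only those where `want` holds.

    The 'wanting' is purely selectional: there are no inner desires,
    just a filter applied to sampled futures.
    """
    return [f for f in (simulate_future() for _ in range(n_samples)) if want(f)]

# Toy world: a "future" is the final position of a 100-step random walk.
def simulate_future():
    return sum(random.choice([-1, 1]) for _ in range(100))

# The pump's "want": end up at position 10 or higher.
futures = outcome_pump(simulate_future, lambda x: x >= 10)
print(f"{len(futures)} of 10000 sampled futures satisfy the condition")
```

The point of the exercise: nothing in this code has goals in any psychological sense, yet the output distribution is systematically steered toward the condition, which is the non-anthropomorphic cash-value of “wants.”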
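
And a minimal sketch contrasting the model-based and model-free frames on a made-up five-state chain MDP (all names, rewards, and hyperparameters are illustrative assumptions, not anything from the sources above). The model-based agent decides by explicitly rolling a transition model forward; the model-free agent decides from values cached during training, with no model access at decision time:

```python
import random

# Toy five-state chain MDP: states 0..4, actions -1/+1,
# reward 1.0 whenever a transition lands on the goal state.
N_STATES, GOAL = 5, 4

def step(state, action):
    next_state = max(0, min(N_STATES - 1, state + action))
    return next_state, (1.0 if next_state == GOAL else 0.0)

# Model-based frame: choose actions by explicitly rolling the transition
# model forward a few steps and taking the best-looking branch.
def model_based_action(state, depth=4):
    def lookahead(s, d):
        if d == 0:
            return 0.0
        return max(r + lookahead(s2, d - 1)
                   for s2, r in (step(s, a) for a in (-1, 1)))
    def score(a):
        s2, r = step(state, a)
        return r + lookahead(s2, depth - 1)
    return max((-1, 1), key=score)

# Model-free frame: tabular Q-learning. No lookahead at decision time;
# the agent acts from cached values. Exploration here is uniformly
# random, which is fine because Q-learning is off-policy.
def train_q_table(episodes=500, alpha=0.5, gamma=0.9):
    q = {(s, a): 0.0 for s in range(N_STATES) for a in (-1, 1)}
    for _ in range(episodes):
        s = 0
        for _ in range(20):
            a = random.choice((-1, 1))
            s2, r = step(s, a)
            q[(s, a)] += alpha * (r + gamma * max(q[(s2, b)] for b in (-1, 1)) - q[(s, a)])
            s = s2
    return q

q = train_q_table()
print("model-based agent at state 2 picks:", model_based_action(2))
print("model-free agent at state 2 picks:", max((-1, 1), key=lambda a: q[(2, a)]))
```

Both agents end up picking the same action here, but for structurally different reasons, and that structural difference is what the two frames disagree about when applied to questions like inner alignment.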