Hey, welcome! I am very skeptical that we can get good models of humans just by finding simple ones according to some simple metric. Instead, the metric we use to evaluate which models are good is itself a complicated piece of human preferences.
Hopefully we can still get the AI to do most of the hard work for us by learning it, but it does mean that the solution is un-"natural" in the way natural abstractions are natural, and would require us to have a better understanding of how to bootstrap value learning from a kernel of labeled data or human feedback.
Anyhow, I still found this post interesting enough to upvote. Stick around!