The main takeaway (translated to standard technical language) is that it would be useful to have some structured representation of the relationship between terminal values and instrumental values (at many recursive “layers” of instrumentality), analogous to how Bayes nets represent the structure of a probability distribution. That would potentially be more useful than a “flat” representation in terms of preferences/utility, much like a Bayes net is more useful than a “flat” probability distribution.
That’s an interesting and novel-to-me idea. That said, the paper offers [little] technical development of the idea.
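To make the analogy a bit more concrete, here is a minimal sketch of what a layered representation could look like. It is not from the paper: the value names, the weights, and the linear propagation rule are all assumptions made up for illustration. Terminal values carry intrinsic worth, and each instrumental value gets its worth from the values it serves, recursively.

```python
from functools import lru_cache

# Hypothetical terminal values with assumed intrinsic worth.
TERMINAL = {"health": 10.0, "friendship": 8.0}

# Hypothetical instrumental structure: each key serves the listed values,
# with an assumed weight for how strongly it contributes to each one.
# The graph must be acyclic, as in a Bayes net.
SERVES = {
    "exercise": {"health": 0.5},
    "money": {"health": 0.2, "friendship": 0.1},
    "job": {"money": 0.9},  # a second "layer": job -> money -> health/friendship
}


@lru_cache(maxsize=None)
def value(node: str) -> float:
    """Intrinsic worth plus the weighted worth of everything the node serves."""
    intrinsic = TERMINAL.get(node, 0.0)
    derived = sum(
        weight * value(served)
        for served, weight in SERVES.get(node, {}).items()
    )
    return intrinsic + derived


if __name__ == "__main__":
    for name in ("health", "exercise", "money", "job"):
        print(name, round(value(name), 3))
```

The "flat" alternative would just store one number per item. The graph keeps the reasons around, so changing a single terminal value or a single link updates everything downstream of it.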
The LessWrong Review runs every year to select the posts that have most stood the test of time. This post is not yet eligible for review, but will be at the end of 2024. The top fifty or so posts are featured prominently on the site throughout the year.
Hopefully, the review is better than karma at judging enduring value. If we have accurate prediction markets on the review results, maybe we can have better incentives on LessWrong today. Will this post make the top fifty?
I believe Yoav Shoham has done a bit of work on this, attempting to create a formalism & graphical structure similar to Bayes nets for reasoning about terminal/instrumental value. See these two papers:
https://arxiv.org/abs/1302.1568
https://arxiv.org/abs/1301.6714
My experience with this contest was worth it. It forced me to read more about how complex the alignment problem is. Congratulations to the winners!
Congratulations to the winners! I was not aware of this competition. When will the next such competition take place?