Putting my money where my mouth is: I just uploaded a significantly revised version of my Alignment Problem position paper, in which I attempt to describe the AGI alignment problem as rigorously as possible. The current version relegates "policy learns to care about reward directly" to a footnote; I can imagine updating that based on the outcome of this discussion, though.
For someone who has read v1 of this paper, what would you recommend as the best way to "update" to v3? Is a full reread the best approach?
[Edit March 11, 2023: Having now read the new version in full, my recommendation to anyone else with the same question is a full reread.]