It seems to me that there has been enough unanswered criticism of the implications of coherence theorems for making predictions about AGI that it would be quite misleading to include this post in the 2019 review.
In an earlier review, johnswentworth argues:
> I think instrumental convergence provides a strong argument that...we can use trade-offs with those resources in order to work out implied preferences over everything else, at least for the sorts of “agents” we actually care about (i.e. agents which have significant impact on the world).
I think this is a reasonable point, but also a very different type of argument from Eliezer’s argument, since it relies on things like economic incentives. Instead, when Eliezer critiques Paul’s concept of corrigibility, he says things like “deference is an unusually anti-natural shape for cognition”. How do coherence theorems translate to such specific claims about the “shape of cognition”; and why is grounding these theorems in “resources” a justifiable choice in this context? These are the types of follow-up arguments which seem necessary at this point in order for further promotion of this post to be productive rather than harmful.
> It seems to me that there has been enough unanswered criticism of the implications of coherence theorems for making predictions about AGI that it would be quite misleading to include this post in the 2019 review.
If the post is the best articulation of a line of reasoning that has been influential in people’s thinking about alignment, then even if there are strong arguments against it, I don’t see why that means the post is not significant, at least from a historical perspective.
By analogy, I think Searle’s Chinese Room argument is wrong and misleading, but I wouldn’t argue that it shouldn’t be included in a list of important works on philosophy of mind.
Would you (assuming you disagreed with it)? If not, what’s the difference here?
(Put another way, I wouldn’t think of the review as a collection of “correct” posts, but rather as a collection of posts that were important contributions to our thinking. To me this certainly qualifies as that.)
Your argument is plausible. On the other hand, this review is for 2019, not 2017 (when this post was written) nor 2013 (when this series of ideas was originally laid out). So it seems like it should reflect our current-ish thinking.
I note that the page for the review doesn’t have anything about voting criteria. This seems like something of an oversight?
> How do coherence theorems translate to such specific claims about the “shape of cognition”; and why is grounding these theorems in “resources” a justifiable choice in this context?
It occurs to me that one plausible answer here is that cognition requires computational resources, and therefore effective cognition will generically involve trading off these resources in a way that does not reliably lose them.
But my more relevant response is that in that section I don’t see Eliezer saying that coherence theorems are the justification for his claim about the anti-naturalness of deference.
> I don’t see Eliezer saying that coherence theorems are the justification for his claim about the anti-naturalness of deference.
If coherence theorems are consistent with deference being “natural”, then I’m not sure what argument Eliezer is trying to make in this post; couldn’t they equally be consistent with other deontological cognition being natural, and therefore likely to arise in AGIs?
> effective cognition will generically involve trading off these resources in a way that does not reliably lose them
In principle, maybe. In practice, if we’d been trying to predict how monkeys would evolve, what does this claim imply about human-monkey differences?
Context is important. If you publish something without comment or counterpoint, you’re hinting that it’s to be taken as true.