Epistemic status: Speculation
“Everyone” is misinterpreting the implications of the original “no-free-lunch theorems”. Stuart Armstrong is misinterpreting the implications of his no-free-lunch theorems for value learning.
The original no-free-lunch theorems show that if you use a terrible prior over your hypotheses, then your beliefs will not converge: learning is impossible. In practice, this is not important, because we always make the assumption that learning is possible. We call these priors “simplicity priors”, but the important bit about them is not the simplicity; it is that they “work” (that they converge).

Now, Stuart says that using a simplicity prior is not enough to make value learning converge, and thus we must use additional assumptions (taken from human knowledge). Maybe I am reading into it, but the implied problem is telling which assumptions might just be a product of our cultural upbringing or our evolutionary tuning, and it would be hard to say what to judge those values by.

But this is wrong! I think the value assumptions shouldn’t have a different status from the “simplicity prior” ones. If his theorem also applied to humans, then humans would never have been able to learn values. What we must include in our prior is not some knowledge that humans acquired about their environment, but the kind of prior that humans already have before they get any inputs!
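The point about “terrible” versus “working” priors can be made concrete with a toy sketch (everything here is illustrative and not from any particular formalization): under a uniform prior over *all* boolean functions, observed data tells you nothing about unseen inputs, while a prior concentrated on a small hypothesis class converges after a few observations.

```python
import itertools

# Hypotheses are boolean functions on 3-bit inputs, represented as
# truth tables (dicts from input tuple to 0/1).
inputs = list(itertools.product([0, 1], repeat=3))  # 8 possible inputs

# "Terrible" prior: uniform over ALL 2^8 = 256 boolean functions.
all_hyps = [dict(zip(inputs, bits))
            for bits in itertools.product([0, 1], repeat=len(inputs))]

# A small "simple" class: functions that return one fixed coordinate.
simple_hyps = [{x: x[i] for x in inputs} for i in range(3)]

def posterior_predictive(hyps, data, query):
    """P(f(query) = 1) after discarding hypotheses inconsistent with data."""
    consistent = [h for h in hyps if all(h[x] == y for x, y in data)]
    return sum(h[query] for h in consistent) / len(consistent)

# Observations generated by "return the first coordinate".
data = [((0, 0, 0), 0), ((1, 1, 0), 1), ((1, 0, 1), 1)]
query = (1, 0, 0)  # an input we have not observed

print(posterior_predictive(all_hyps, data, query))     # 0.5: no learning at all
print(posterior_predictive(simple_hyps, data, query))  # 1.0: converged
```

Under the uniform prior, the surviving hypotheses split exactly 50/50 on every unseen input, no matter how much data arrives; the restricted prior pins the answer down from three examples. Note the restricted class here is picked for the demo rather than for any principled notion of simplicity, which is exactly the post’s point: what matters is that the prior makes convergence possible, not what we call it.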