We do not know very well how the human mind does anything at all. But it cannot be doubted that the human mind comes to have preferences it did not initially have.
I believe Eliezer is trying to create “fully recursive self-modifying agents that retain stable preferences while rewriting their source code”. Like Sebastian says, getting the “stable preferences” bit right is presumably necessary for Friendly AI, as Eliezer sees it.
(This clause “as Eliezer sees it” isn’t meant to indicate dissent, but merely my total incompetence to judge whether this condition is strictly necessary for Friendly AI.)