Hm, upon more thought I actually kind of endorse this as a demo. I think we should be able to run an alignment scheme on C. elegans and get out a universe full of well-fed worms, and that’s a decent sign that we didn’t screw up, even though it doesn’t engage with several key problems that arise in humans because we’re more complicated, have preferences about our preferences, etc. No weird worm-stimulation should be needed. But we do have to accept that we’re not getting some notion of values independent of an act of interpretation.