“This problem has a solution (and one that can be realistically implemented)” is another important crux, I think. As I wrote here: “For one thing, we don’t actually know for sure that this technical problem is solvable at all, until we solve it. And if it’s not in fact solvable, then we should not be working on this research program at all. If it’s not solvable, the only possible result of this research program would be “a recipe for summoning demons”, so to speak. And if you’re scientifically curious about what a demon-summoning recipe would look like, then please go find something else to be scientifically curious about instead.”
I have other retorts too, but I’m not sure it’s productive for me to argue against a position that you don’t endorse yourself, but rather are imagining someone else holds. We can see if anyone shows up here who actually endorses something like that.
Anyway, if Silver were to reply “oops, yeah, the reward function plan that I described doesn’t work, in the future I’ll say it’s an unsolved problem”, then that would be a big step in the right direction. It wouldn’t be remotely sufficient, but it would be a big step in the right direction, and worth celebrating.