Steven Byrnes comments on “The Era of Experience” has an unsolved technical alignment problem

Steven Byrnes 30 Apr 2025 15:18 UTC
2 points
(1) Yeah, AI self-modification is an important special case of irreversible actions, where I think we both agree that (mis)generalization from the reward history is very important. (2) Yeah, I think we both agree that it’s hopeless to come up with a reward function for judging AI behavior as good vs. bad that we can rely on all the way to ASI.