VojtaKovarik comments on The self-unalignment problem

VojtaKovarik 19 Apr 2023 16:19 UTC
This post seems related to an exaggerated version of what I believe: humans are so far from "agents trying to maximize utility" that to understand how to align AI to humans, we should first understand what it means to align AI to finite-state machines. (This doesn't mean understanding the latter is sufficient, just that it is a prerequisite.)
As I said, going all the way to finite-state machines seems exaggerated, even as a starting point. However, it does seem to me that starting somewhere on that end of the agent<>rock spectrum is the better way to go about understanding human values :-). (At least given what we already know.)