Am I correct in reading this post as heavily related to embedded agency? I may have misunderstood the general attitudes, but I thought of “future states” as “future to now,” not “future to my action.” It seems like you couldn’t possibly build a thing that works on the latter, unless you intend for it to set everything in motion and then terminate. In the embedded agency sequence, they point out that embedded agents don’t have well-defined i/o channels. One consequence is that “action” is not a well-defined term, and is often not atomic.
It also sounds like you’re trying to suggest that we should be judging trajectories, not states? I just want to note that this is, as far as I can tell, the plan: https://www.lesswrong.com/posts/K4aGvLnHvYgX9pZHS/the-fun-theory-sequence
From the synopsis of High Challenge:
I’m not sure I interpret corrigibility as exactly the same as “preferring that the humans remain in control” (I see you suggest this yourself in Objection 1; I wrote this before rereading that, but I’m going to leave it as is). And if you programmed that preference into a non-corrigible AI, it would still seize the future into states where the humans have to remain in control. Better than doom, but not ideal if we can avoid it with actual corrigibility.
But I think I miscommunicated, because aside from the above, I agree with everything else in those two paragraphs.
I maintain that this doesn’t feel like it solves much. Much of the discussion in the Yudkowsky conversations was that there’s a concern about how to point powerful systems in any direction at all. Your response to Objection 1 admits that you don’t claim to solve that, but that’s most of the problem. If we do solve the problem of how to point a system at some abstract concept, why would we choose “the humans remain in control” rather than “pursue humanity’s CEV”? Do you expect “the humans remain in control” (or the combination of concepts you propose as an alternative) to be easier to define? Enough easier that it’s worth pursuing even if we might choose the wrong combination of concepts? Or do you expect something else?