Asymptotically Benign AGI
We present an algorithm, then show (given four assumptions) that in the limit it is human-level intelligent and benign.
Will MacAskill has commented that in the seminar room he is a consequentialist, but for decision-making he takes seriously the lack of philosophical consensus. I believe that what is here is correct, but in the absence of feedback from the Alignment Forum, I don’t yet feel comfortable posting it somewhere (like arXiv) where it can get cited and enter the academic record. We have submitted it to IJCAI, but we can edit or withdraw it before it is published.
I will distribute at least min($365, number of comments * $15) in prizes by April 1st (via Venmo if possible, or else Amazon gift cards, or a donation on their behalf if they prefer) to the authors of the comments here, according to the comments’ quality. If one commenter finds an error, and another commenter tinkers with the setup or the assumptions to correct it, then I expect both comments will receive a similar prize (provided the comments are prize-worthy and neither commenter is me). If others would like to donate to the prize pool, I’ll provide a comment that you can reply to.
To organize the conversation, I’ll start some comment threads below:
Concerns with Assumption 1
Concerns with Assumption 2
Concerns with Assumption 3
Concerns with Assumption 4
Concerns with “the box”
Adding to the prize pool