Asymptotically Benign AGI

We present an algorithm, then show (given four assumptions) that in the limit, it is human-level intelligent and benign.

Will MacAskill has commented that in the seminar room, he is a consequentialist, but for decision-making, he takes seriously the lack of a philosophical consensus. I believe that what is here is correct, but in the absence of feedback from the Alignment Forum, I don’t yet feel comfortable posting it to a place (like arXiv) where it can get cited and enter the academic record. We have submitted it to IJCAI, but we can edit or withdraw it before it is printed.

I will distribute at least min($365, number of comments * $15) in prizes by April 1st (via Venmo if possible, or else Amazon gift cards, or a donation on their behalf if they prefer) to the authors of the comments here, according to the comments’ quality. If one commenter finds an error, and another commenter tinkers with the setup or the assumptions to correct it, then I expect both comments will receive a similar prize (provided those comments are prize-worthy, and neither person is me). If others would like to donate to the prize pool, I’ll provide a comment that you can reply to.
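For concreteness, here is a minimal sketch of the prize-pool formula above; the function name and defaults are my own and purely illustrative:

```python
def total_prize_pool(num_comments: int, per_comment: int = 15, cap: int = 365) -> int:
    """Total prize money in dollars: min($365, number of comments * $15)."""
    return min(cap, num_comments * per_comment)

# e.g., 10 comments -> a $150 pool; 30 comments -> capped at $365
print(total_prize_pool(10))  # 150
print(total_prize_pool(30))  # 365
```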

To organize the conversation, I’ll start some comment threads below:

  • Positive feedback

  • General Concerns/Confusions

  • Minor Concerns

  • Concerns with Assumption 1

  • Concerns with Assumption 2

  • Concerns with Assumption 3

  • Concerns with Assumption 4

  • Concerns with “the box”

  • Adding to the prize pool