Asymptotically Unambitious AGI

Edit 30/5/19: An updated version is on arXiv. I now feel comfortable with it being cited. The key changes:

  • The Title. I suspect the agent is unambitious for its entire lifetime, but the title says “asymptotically” because that’s what I’ve shown formally. Likewise, I suspect the agent is benign for its entire lifetime, but the title says “unambitious” because that’s what I’ve shown formally. (See the section “Concerns with Task-Completion” for an informal argument going from unambitious → benign.)

  • The Useless Computation Assumption. I’ve made it a slightly stronger assumption. The original version is technically correct, but the setting is tricky if the weak version of the assumption is true while the strong version isn’t. The stronger assumption also simplifies the argument.

  • The Prior. Rather than having to do with the description length of the Turing machine simulating the environment, it has to do with the number of states in the Turing machine. This was in response to Paul’s point that the finite-time behavior of the original version is really weird. This also makes the Natural Prior Assumption (now called the No Grue Assumption) a bit easier to assess.
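
Schematically, the change swaps a description-length-based weight for a state-count-based one. This is only an illustrative sketch, not the exact prior in the paper (the weight functions below are placeholders):

$$w_{\text{old}}(T) \propto 2^{-\ell(T)} \quad\longrightarrow\quad w_{\text{new}}(T) \propto 2^{-|Q_T|}$$

where $\ell(T)$ is the description length of the Turing machine $T$ simulating the environment, and $|Q_T|$ is its number of states.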

Original Post:

We present an algorithm, then show (given four assumptions) that, in the limit, it is human-level intelligent and benign.

Will MacAskill has commented that in the seminar room he is a consequentialist, but for decision-making he takes seriously the lack of a philosophical consensus. I believe that what is here is correct, but in the absence of feedback from the Alignment Forum, I don’t yet feel comfortable posting it to a place (like arXiv) where it can get cited and enter the academic record. We have submitted it to IJCAI, but we can edit or withdraw it before it is printed.

I will distribute at least min($365, number of comments * $15) in prizes by April 1st (via Venmo if possible, or else Amazon gift cards, or a donation on their behalf if they prefer) to the authors of the comments here, according to the comments’ quality. If one commenter finds an error, and another commenter tinkers with the setup or the assumptions in order to correct it, then I expect both comments will receive a similar prize (provided both comments are prize-worthy, and neither commenter is me). If others would like to donate to the prize pool, I’ll provide a comment that you can reply to.

To organize the conversation, I’ll start some comment threads below:

  • Positive feedback

  • General Concerns/Confusions

  • Minor Concerns

  • Concerns with Assumption 1

  • Concerns with Assumption 2

  • Concerns with Assumption 3

  • Concerns with Assumption 4

  • Concerns with “the box”

  • Adding to the prize pool