On the purposes of decision theory research

Following the examples of Rob Bensinger and Rohin Shah, this post will try to clarify the aims of part of my research interests, and disclaim some possible misunderstandings about it. (I’m obviously only speaking for myself and not for anyone else doing decision theory research.)

I think decision theory research is useful for:

  1. Gaining information about the nature of rationality (e.g., is “realism about rationality” true?) and the nature of philosophy (e.g., is it possible to make real progress in decision theory, and if so, what cognitive processes are we using to do that?), and helping to solve the problems of normativity, meta-ethics, and metaphilosophy.

  2. Better understanding potential AI safety failure modes that are due to flawed decision procedures implemented in or by AI.

  3. Making progress on various intellectual puzzles that seem important and directly related to decision theory, such as free will, anthropic reasoning, logical uncertainty, and Rob’s examples of counterfactuals, updatelessness, and coordination, among others.

  4. Firming up the foundations of human rationality.

To me, decision theory research is not meant to:

  5. Provide a correct or normative decision theory that will be used as a specification or approximation target for programming or training a potentially superintelligent AI.

  6. Help create “safety arguments” that aim to show that a proposed or already existing AI is free from decision-theoretic flaws.

To help explain 5 and 6, here’s what I wrote in a previous comment (slightly edited):

One meta level above what even UDT tries to be is decision theory (as a philosophical subject), and one level above that is metaphilosophy. My current thinking is that it seems bad (potentially dangerous or regrettable) to put any significant (i.e., superhuman) amount of computation into anything except doing philosophy.

To put it another way, any decision theory that we come up with might have some kind of flaw that other agents can exploit, or just a flaw in general, such as in how well it cooperates or negotiates with or exploits other agents (which might include how quickly/cleverly it can make the necessary commitments). Wouldn’t it be better to put computation into trying to find and fix such flaws (in other words, coming up with better decision theories) than into any particular object-level decision theory, at least until the superhuman philosophical computation itself decides to start doing the latter?

Comparing my current post to Rob’s post on the same general topic, my mentions of 1, 2, and 4 above seem to be new, and he didn’t seem to share (or chose not to emphasize) my concern that decision theory research (as done by humans in the foreseeable future) can’t solve decision theory definitively enough to obviate the need to make sure that any potentially superintelligent AI can find and fix decision-theoretic flaws in itself.