Open question: are minimal circuits daemon-free?

Note: weird stuff, very in­for­mal.

Sup­pose I search for an al­gorithm that has made good pre­dic­tions in the past, and use that al­gorithm to make pre­dic­tions in the fu­ture.

I may get a “dae­mon,” a con­se­quen­tial­ist who hap­pens to be mo­ti­vated to make good pre­dic­tions (per­haps be­cause it has re­al­ized that only good pre­dic­tors sur­vive). Un­der differ­ent con­di­tions, the dae­mon may no longer be mo­ti­vated to pre­dict well, and may in­stead make “pre­dic­tions” that help it achieve its goals at my ex­pense.

I don’t know whether this is a real prob­lem or not. But from a the­o­ret­i­cal per­spec­tive, not know­ing is already con­cern­ing—I’m try­ing to find a strong ar­gu­ment that we’ve solved al­ign­ment, not just some­thing that seems to work in prac­tice.

I am pretty con­vinced that dae­mons are a real prob­lem for Solomonoff in­duc­tion. In­tu­itively, the prob­lem is caused by “too much com­pute.” I sus­pect that dae­mons are also a prob­lem for some more re­al­is­tic learn­ing pro­ce­dures (like hu­man evolu­tion), though in a differ­ent shape. I think that this prob­lem can prob­a­bly be patched, but that’s one of the ma­jor open ques­tions for the fea­si­bil­ity of pro­saic AGI al­ign­ment.

I sus­pect that dae­mons aren’t a prob­lem if we ex­clu­sively se­lect for com­pu­ta­tional effi­ciency. That is, I sus­pect that the fastest way to solve any par­tic­u­lar prob­lem doesn’t in­volve dae­mons.

I don’t think this ques­tion has much in­trin­sic im­por­tance, be­cause al­most all re­al­is­tic learn­ing pro­ce­dures in­volve a strong sim­plic­ity prior (e.g. weight shar­ing in neu­ral net­works).

But I do think this ques­tion has deep similar­i­ties to more im­por­tant prob­lems, and that an­swer­ing this ques­tion will in­volve de­vel­op­ing use­ful con­cep­tual ma­chin­ery. Be­cause we have an un­usu­ally strong in­tu­itive han­dle on the prob­lem, I think it’s a good thing to think about.

Prob­lem state­ment and intuition

Can the small­est boolean cir­cuit that solves a prob­lem be a dae­mon? For ex­am­ple, can the small­est cir­cuit that pre­dicts my be­hav­ior (at some level of ac­cu­racy) be a dae­mon?

In­tu­itively, if we have a dae­mon that is in­stru­men­tally or in­ci­den­tally mo­ti­vated to solve my prob­lem, then there is some smaller cir­cuit that solves the prob­lem equally well but skips the in­stru­men­tal rea­son­ing. If my dae­mon is do­ing some com­plex rea­son­ing to an­swer the ques­tion “Should I pre­dict well?” we could just skip straight to the an­swer “yes.” This both makes the cir­cuit smaller, and pre­vents the cir­cuit from ever de­cid­ing not to pre­dict well.

A differ­ent per­spec­tive on a similar in­tu­ition: the dae­mon is do­ing some ac­tual cog­ni­tive work to solve the prob­lem. Since that com­pu­ta­tion is be­ing done by the dae­mon, it is em­bed­ded as a smaller cir­cuit. Jes­sica ex­plores this in­tu­ition a bit here. Here we are con­sid­er­ing an easy ver­sion of the prob­lem, since by tak­ing the small­est cir­cuit we are effec­tively quan­tify­ing over all pos­si­ble ways of ex­tract­ing log­i­cal in­for­ma­tion from the dae­mon.

In­stead of show­ing that min­i­mal cir­cuits can’t be dae­mons, we might end up con­clud­ing that they can be. That would be even more in­ter­est­ing.

Another pos­si­ble out­come is giv­ing a strong ar­gu­ment that cap­tures our in­tu­itions/​con­cerns about dae­mons, and which clearly doesn’t ap­ply to the min­i­mal cir­cuit that solves a prob­lem. In this case we couldn’t prove any­thing pos­i­tive about the min­i­mal cir­cuit, but we would have “screened off” the pos­si­ble cause for con­cern.

Difficulties

The first and most se­ri­ous difficulty is un­der­stand­ing what we are talk­ing about.

I don’t ex­pect to get to­tal clar­ity on con­cepts like “dae­mon” or “op­ti­miza­tion” or “generic prob­lem,” but we need to have a bet­ter grip than we do right now. I ex­pect that we’ll de­velop bet­ter con­cepts in the course of solv­ing the prob­lem, rather than as a pre­con­di­tion for solv­ing the prob­lem (in gen­eral I think “define things so that you can prove the the­o­rem” is of­ten the right strat­egy).

A sec­ond difficulty is that the differ­ent parts of the com­pu­ta­tion can be tan­gled up in an ex­tremely com­plex way. In an ex­treme case, the dae­mon may be cryp­to­graph­i­cally obfus­cated.

We want to show that given any dae­mon, there is a smaller cir­cuit that solves the prob­lem. The most nat­u­ral ap­proach is show­ing how to con­struct a smaller cir­cuit, given a dae­mon. But if the dae­mon is obfus­cated, there is no effi­cient pro­ce­dure which takes the dae­mon cir­cuit as in­put and pro­duces a smaller cir­cuit that still solves the prob­lem.

So we can’t find any effi­cient con­struc­tive ar­gu­ment. That rules out most of the ob­vi­ous strate­gies.