Exorcizing the Speed Prior?

What follows is an argument against finding optimization daemons with too much probability mass in the speed prior. The argument is very fake, but perhaps it could inspire real math down the line. The intuition isn’t so different from the one in “Are minimal circuits daemon-free?” -- mostly, the speed prior seems like a harder thing to argue is daemon-free.

The speed prior is very similar to a K-complexity prior, except where a K-complexity prior assigns probability proportional to $2^{-\ell(p)}$ to a program $p$ of length $\ell(p)$, the speed prior adds an extra penalty for execution time $t(p)$: it assigns probability proportional to $2^{-(\ell(p) + \log_2 t(p))} = 2^{-\ell(p)}/t(p)$. Note that the penalty for time is logarithmic compared to the penalty for length; thinking twice as long is penalized the same as being one bit longer.
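As a concrete toy version of that weighting (my own unnormalized sketch; the function names and the decision to ignore normalization are not from the original), note how doubling the runtime costs exactly as much as one extra bit of length:

```python
import math

def k_weight(length_bits: float) -> float:
    """K-complexity-style weight: proportional to 2^-length(p)."""
    return 2.0 ** (-length_bits)

def speed_weight(length_bits: float, runtime_steps: float) -> float:
    """Speed-prior-style weight: 2^-(length(p) + log2(time(p))) = 2^-length(p) / time(p)."""
    return 2.0 ** (-(length_bits + math.log2(runtime_steps)))

# Thinking twice as long is penalized the same as being one bit longer:
assert math.isclose(speed_weight(10, 2 * 64), speed_weight(11, 64))
```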

Optimization daemons arise when we are searching a rich space to try and solve a problem which we don’t otherwise know how to solve. This makes it seem like the worst-case scenario for daemons would be if intelligence is really fundamentally about search -- then “try to be intelligent and daemon-free” is a non-starter.

(There are more nuanced possibilities, like “intelligence is about search, but not necessarily search of ‘rich spaces’ which contain daemons.” We’ll set those aside for this argument.)

So, let’s assume that we’re in this worst-case scenario. I’m going to make a contrarian argument that this is actually a good thing.

To be more precise: the concern with optimization daemons is that by optimizing hard over a rich search space to achieve a difficult task, you might accidentally pick an intelligent agent out of the search space, one whose goals are somewhat different from yours. I’m going to assume that the only way an agent can be intelligent is either by performing an expensive brute-force search over possibilities, or by having some baked-in knowledge which allows a smarter search, saving it time exponential in the amount of baked-in knowledge it has.

The assumption makes a little bit of sense, because linearly many bits can locate something in an exponentially large search space. If I knew more complexity theory, maybe I could say why the assumption is really bad. But, let’s proceed with the assumption.
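To spell out the counting intuition (my own back-of-the-envelope, not part of the original argument): $n$ bits of advice can single out one element of a space of $2^n$ candidates, and under the speed prior the time spent brute-forcing that same space is penalized by exactly $n$ bits as well:

$$ \underbrace{n \ \text{bits of advice}}_{\text{description cost: } n \text{ bits}} \quad\longleftrightarrow\quad \underbrace{\text{brute-force search over } 2^{n} \text{ candidates}}_{\text{time cost: } 2^{n} \text{ steps, i.e. a } \log_2\text{-penalty of } n \text{ bits}} $$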

The argument is: an optimization daemon which solves the problem via intelligent search can’t have higher probability mass than the daemon-free solution. Brute-force search earns you a time penalty which is exactly the same as the description length of the thing you would find out by searching, so there’s no reason to search rather than just have the answer. On the other hand, if the daemon is doing some kind of smart search, my (possibly crazy) complexity-theoretic assumption says that the description complexity of the smart search trades off exactly against what it saves you in search time.
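Here is a toy numerical check of that first claim under the weight $2^{-(\ell(p) + \log_2 t(p))}$; the specific numbers (a 30-bit answer, 100 bits of shared scaffolding, one step per candidate checked) are made up for illustration:

```python
import math

# Toy comparison under the speed-prior weight 2^-(length + log2(time)).
n = 30                # bits needed to hard-code the solution (hypothetical)
scaffold_bits = 100   # program overhead shared by both programs (hypothetical)

# Program A: hard-codes the n-bit answer and outputs it in ~1 step.
weight_hardcoded = 2.0 ** (-(scaffold_bits + n + math.log2(1)))

# Program B: stores no answer, brute-force searches all 2^n candidates in ~2^n steps.
weight_bruteforce = 2.0 ** (-(scaffold_bits + math.log2(2 ** n)))

# The log2 time penalty for searching exactly cancels the n bits saved by
# not storing the answer, so searching buys the daemon nothing.
assert math.isclose(weight_hardcoded, weight_bruteforce)
```

On these made-up numbers, the searching program saves 30 bits of description but pays the same 30 bits back as a time penalty.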

The idea is that we ourselves are doing intelligent search at the “top level” to solve the problem, but a solution exists which doesn’t itself use intelligent search -- namely, the thing an intelligent agent would be finding. An intelligent agent has probability mass at most equal to that of the non-intelligent solution.

(“Optimization daemons are at most equally likely” may be a rather sad conclusion, but what are you going to do?)

Now, obviously, many interesting poly-time algorithms exist which can optimize in interesting cases and which may look somewhat agentic. The (very fake!) assumption I’m using here is that none of these can be intelligent in a concerning sense. In this view, humans can behave very intelligently in large part because we don’t think via brute-force search -- we use “smart search” which has been given a large number of bits of compressed advice in the genetic code. Essentially, we get the advantage of the processing power of evolutionary history.

Another reason to doubt my assumptions is that, intuitively, it seems like there are a lot of problems which actually require intelligence to solve. If you are trying to train a massive neural network in a very complex setting, it seems plausible that the setting can be complex enough to make you learn a general consequentialist reasoner (rather than a big compressed policy which does well).

Perhaps you doubt the premises of my argument but buy the connection between premises and conclusion. If something like my argument makes any sense, then “intelligence only comes from search” isn’t the worst scenario for daemons. The worst-case scenario is one where there are types of intelligence which don’t just look like search, and we don’t know what all of them are; in that case, optimization daemons can be smarter than our search algorithm in a significant sense, creating pressure for the search to find those intelligent agents in order to solve problems more efficiently.