# Exorcizing the Speed Prior?

What follows is an argument against finding optimization daemons with too much probability mass in the speed prior. The argument is very fake, but perhaps it could inspire real math down the line. The intuition isn’t so different from the one in “Are minimal circuits daemon-free?” -- mostly, the speed prior seems like a harder thing to argue is daemon-free.

The speed prior is very similar to a K-complexity prior, except where a K-complexity prior assigns probability proportional to $2^{-\ell(p)}$ to a program $p$ of length $\ell(p)$, the speed prior adds an extra penalty for execution time $t(p)$: probability proportional to $2^{-(\ell(p) + \log_2 t(p))}$. Note that the penalty for time is logarithmic compared to the penalty for length; thinking twice as long is penalized the same as being one bit longer.
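The two weightings can be compared numerically. This is only a toy illustration; the function names and all the numbers are mine, not the post’s:

```python
import math

def k_prior_weight(length_bits):
    # K-complexity prior: weight proportional to 2^-length.
    return 2.0 ** -length_bits

def speed_prior_weight(length_bits, runtime_steps):
    # Speed prior: extra factor of 1/t, i.e. 2^-(length + log2(t)).
    return 2.0 ** -(length_bits + math.log2(runtime_steps))

# Thinking twice as long costs the same as being one bit longer:
assert math.isclose(speed_prior_weight(100, 2 * 10**6),
                    speed_prior_weight(101, 10**6))
```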

Optimization daemons arise when we are searching a rich space to try and solve a problem which we don’t otherwise know how to solve. This makes it seem like the worst-case scenario for daemons would be if intelligence is really fundamentally about search—then “try to be intelligent and daemon-free” is a non-starter.

(There are more nuanced possibilities, like “intelligence is about search, but not necessarily search of ‘rich spaces’ which contain daemons.” We’ll set those aside for this argument.)

So, let’s assume that we’re in this worst-case scenario. I’m going to make a contrarian argument that this is actually a good thing.

To be more precise: the concern with optimization daemons is that by optimizing hard over a rich search space to achieve a difficult task, you might accidentally pick an intelligent agent out of the search space, who has goals which are somewhat different from yours. I’m going to assume that the only way an agent can be intelligent is either by performing an expensive brute-force search over possibilities, or by having some baked-in knowledge which allows a smarter search, saving it time exponential in how much baked-in knowledge it has.

The assumption makes a little bit of sense, because linearly many bits can locate something in an exponentially large search space. If I knew more complexity theory, maybe I could say why the assumption is really bad. But, let’s proceed with the assumption.

The argument is: an optimization daemon which solves the problem via intelligent search can’t have higher probability mass than the daemon-free solution. Brute-force search gets you a time penalty which is exactly the same as the description length of the thing you would find out by searching, so there’s no reason to search rather than just have the answer. On the other hand, if the daemon is doing some kind of smart search, my (possibly crazy) complexity-theoretic assumption says that the description complexity of the smart search trades off exactly with what it saves you in search time.
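The bookkeeping behind that exact tradeoff can be sketched in a few lines. All the lengths here are made-up toy numbers, and I assume (as the post does) that the hard-coded program needs the same fixed overhead as the searcher plus the answer bits:

```python
import math

def speed_prior_bits(length_bits, runtime_steps):
    # Total penalty in bits under the speed prior: length + log2(time).
    return length_bits + math.log2(runtime_steps)

SEARCHER_LEN = 50  # bits to specify a brute-force searcher (made up)
ANSWER_LEN = 30    # bits to specify the answer directly (made up)

# The brute-force searcher enumerates all 2^30 candidate answers...
brute_force = speed_prior_bits(SEARCHER_LEN, 2 ** ANSWER_LEN)
# ...while the daemon-free program hard-codes the answer and runs fast.
hard_coded = speed_prior_bits(SEARCHER_LEN + ANSWER_LEN, 1)

assert brute_force == hard_coded  # 80 bits of penalty either way
```

The time penalty of the search (30 bits, from 2^30 steps) exactly matches the description-length penalty of baking in the 30-bit answer, which is the post’s claim.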

The idea is that we ourselves are doing intelligent search at the “top level” to solve the problem, but, a solution exists which doesn’t itself use intelligent search—namely, the thing an intelligent agent would be finding. An intelligent agent has probability mass at most equal to the non-intelligent solution.

(“Optimization daemons are at most equally likely” may be a rather sad conclusion, but what are you going to do?)

Now, obviously, many interesting poly-time algorithms exist which can optimize in interesting cases and which may look somewhat agentic. The (very fake!) assumption I’m using here is that none of these can be intelligent in a concerning sense. In this view, humans can behave very intelligently in large part because we don’t think via brute-force search—we use “smart search” which has been given a large number of bits of compressed advice in the genetic code. Essentially, we get the advantage of the processing power of evolutionary history.

Another reason to doubt my assumptions is that, intuitively, it seems like there are a lot of problems which actually require intelligence to solve. If you are trying to train a massive neural network in a very complex setting, it seems plausible that the setting can be complex enough to make you learn a general consequentialist reasoner (rather than a big compressed policy which does well).

Perhaps you doubt the premises of my argument but buy the connection between premises and conclusion. If something similar to my argument does make any sense, this means that “intelligence only comes from search” isn’t the worst scenario for daemons. The worst-case scenario is the case where there are types of intelligence which don’t just look like search, but we don’t know what all of them are; this means that the optimization daemons can be smarter than the search algorithm in a significant sense, creating pressure for our search to find those intelligent agents in order to solve problems more efficiently.

• > The worst-case scenario is the case where there are types of intelligence which don’t just look like search, but we don’t know what all of them are; this means that the optimization daemons can be smarter than the search algorithm in a significant sense, creating pressure for our search to find those intelligent agents in order to solve problems more efficiently.

This is exactly what seems most likely to me (see this post).

I think the assumption that all intelligence is search is implausible. For one, current AI systems use methods other than brute-force search (e.g. Bayesian inference, gradient descent, MCTS, logic, Q-learning). For another, it seems that there are useful generic cognitive algorithms (e.g. multi-level models, mathematical reasoning, various Bayesian models, various frequentist methods such as confidence intervals) that generalize pretty well across different domains, and are not entirely made of search.

• Current AI does stochastic search, but it is still search: essentially the PP complexity class instead of NP/P, with a fair amount of domain-specific heuristics.

• Not all AI is search, and when it is, it’s usually not brute-force search.

• I understand your search-based intelligence as a simple brute-force search over all the possibilities, except that the agent already knows a bit-string from the start of the solution, so it only has to consider possibilities starting with those bits. This algorithm, once given its utility function, can’t make gains by trading off time and space.
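The model this comment describes might be sketched as follows. The `is_solution` predicate and the advice string are placeholders of my own, and this is only an illustration of the prefix-advice idea, where each advice bit halves the search space:

```python
from itertools import product

def advised_search(is_solution, n_bits, advice=""):
    # Brute-force search over n-bit strings, restricted to candidates
    # that start with the advice prefix: k bits of advice shrink the
    # search space from 2^n to 2^(n-k), saving a factor of 2^k in time.
    suffix_len = n_bits - len(advice)
    for suffix in product("01", repeat=suffix_len):
        candidate = advice + "".join(suffix)
        if is_solution(candidate):
            return candidate
    return None  # no n-bit string with that prefix satisfies the predicate

# Toy problem: recover a specific 12-bit string, given its first 8 bits
# as advice, so only 2^4 candidates are examined instead of 2^12.
target = "101100111000"
found = advised_search(lambda s: s == target, 12, advice=target[:8])
assert found == target
```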

The hypothesis that humans have a large enough amount of advice built into their genetic code is almost certainly wrong. The human genome contains about 4 billion bits. Most of these bits are used to describe low-level biology, not brain function. It’s also unclear how evolution got access to advice bits for spacecraft design. But even ignoring those points, imagine a 20GB pile of data produced by humans. While not perfectly optimized, the data will almost certainly be more than 50% optimized. (If you set every other letter at random, there is literally no way to fill in the remaining letters to make a good book.) If you doubt this, use a bigger pile of data, and a lower threshold: take a 1TB pile of text; if you set 99% of a book’s characters at random, there is no way to fill in the remaining 1% to make sense. This shows that humans can output more than 10GB of optimization pressure. Being generous and counting each neuron firing, the runtime of humanity is around 10^36. This would put a hard limit of 4 billion + log_2(10^36) bits of optimization pressure that humanity could exert.

The hypothesis that humans’ smart search is based on our genetic code containing advice in this sense is clearly false.
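The arithmetic in this comment is easy to check; a quick sketch (the genome and runtime figures are the comment’s own, not verified by me):

```python
import math

GENOME_BITS = 4e9      # the comment's figure for the human genome
RUNTIME_STEPS = 1e36   # generous count of humanity's neuron firings

time_bits = math.log2(RUNTIME_STEPS)  # roughly 120 bits
limit = GENOME_BITS + time_bits

# The runtime term is negligible next to the genome term, so the bound
# is essentially just the 4-billion-bit genome figure.
assert 119 < time_bits < 120
assert limit < 4.1e9
```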

• You could think of the ‘advice’ given by evolution being in the form of a short program, e.g. for a neural-net-like learning algorithm. In this case, a relatively short string of advice could result in a lot of apparent optimization.

(For the book example: imagine a species that outputs books of 20GB containing only the letter ‘a’. This is very unlikely to be produced by random choice, yet it can be specified with only a few bits of ‘advice’.)

• Consider a system of linear equations over $\mathbb{F}_2$. Brute-force search takes time exponential in the dimension. Gaussian elimination takes time polynomial in the dimension, and its description length is $O(1)$. So your hypothesis clearly doesn’t work here. Now, it sounds more plausible if you assume your search problem is NP-hard. The question is whether this is a good model of intelligence. If this is a good model, then it means that any intelligent agent will have enormous description complexity, and there is no better way of constructing one than doing brute-force search. This probably implies a very long AGI timeline. However, I think it’s more likely that this is a bad model.
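For concreteness, Gaussian elimination over GF(2) really is a short polynomial-time program. A minimal sketch (standard textbook algorithm, assuming the field in the comment is GF(2)):

```python
def solve_gf2(rows, b):
    """Solve A x = b over GF(2); rows[i] is a list of 0/1 coefficients.
    Returns one solution as a list of bits, or None if inconsistent."""
    n = len(rows[0])
    aug = [row[:] + [bi] for row, bi in zip(rows, b)]  # augmented matrix
    pivots = []
    r = 0
    for c in range(n):
        # Find a row with a 1 in column c, at or below row r.
        pivot = next((i for i in range(r, len(aug)) if aug[i][c]), None)
        if pivot is None:
            continue
        aug[r], aug[pivot] = aug[pivot], aug[r]
        # XOR the pivot row into every other row with a 1 in column c.
        for i in range(len(aug)):
            if i != r and aug[i][c]:
                aug[i] = [x ^ y for x, y in zip(aug[i], aug[r])]
        pivots.append(c)
        r += 1
    # A row of the form 0 = 1 means the system is inconsistent.
    if any(row[-1] and not any(row[:-1]) for row in aug):
        return None
    x = [0] * n
    for i, c in enumerate(pivots):
        x[c] = aug[i][-1]
    return x

# 3 equations, 3 unknowns over GF(2):
A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
b = [1, 1, 0]
x = solve_gf2(A, b)
assert all(sum(ai * xi for ai, xi in zip(row, x)) % 2 == bi
           for row, bi in zip(A, b))
```

The whole solver is a constant-size program (its description length does not grow with the dimension), while its running time is polynomial in the dimension, which is the comment’s point.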