# Automatic programming, an example

Say that we have the following observational data:

| Planet  | Aphelion (000 km) | Perihelion (000 km) | Orbit time (days) |
|---------|-------------------|---------------------|-------------------|
| Mercury | 69,816            | 46,001              | 88                |
| Venus   | 108,942           | 107,476             | 225               |
| Earth   | 152,098           | 147,098             | 365               |
| Mars    | 249,209           | 206,669             | 687               |
| Jupiter | 816,520           | 740,573             | 4,332             |
| Saturn  | 1,513,325         | 1,353,572           | 10,760            |
| Uranus  | 3,004,419         | 2,748,938           | 30,799            |
| Neptune | 4,553,946         | 4,452,940           | 60,190            |
| Pluto   | 7,311,000         | 4,437,000           | 90,613            |

These are the maximal and minimal distances between each planet and the Sun (both in thousands of kilometres), and the number of (Earth) days for one revolution around the Sun. The table is purely empirical data; no algorithm binding the three quantities is given. The rules of celestial mechanics that relate them go by the name of Kepler's laws. Can those rules be (re)invented by a computer program, and how?

The following program code will be put into a simulator:

```
// declarations of the integer type variables
$DECLAREINT bad perihelion aphelion orbit guess dif temp zero temp1
// table with the known data in a simulator-friendly format
$INVAR perihelion(46001) aphelion(69816) orbit(88)
$INVAR perihelion(107476) aphelion(108942) orbit(225)
$INVAR perihelion(147098) aphelion(152098) orbit(365)
$INVAR perihelion(206669) aphelion(249209) orbit(687)
$INVAR perihelion(740573) aphelion(816520) orbit(4332)
$INVAR perihelion(1353572) aphelion(1513325) orbit(10760)
$INVAR perihelion(2748938) aphelion(3004419) orbit(30799)
$INVAR perihelion(4452940) aphelion(4553946) orbit(60190)
$INVAR perihelion(4437000) aphelion(7311000) orbit(90613)
// variables orbit and bad can't be touched by the simulator,
// to avoid a degeneration to a triviality
$RESVAR orbit bad
// do NOT use if clauses or while clauses, do not set direct numbers ...
$RESCOM if while val_operation inc_dec
// bad is the variable by which the whole program will be judged;
// a big value of bad is bad. By this criterion programs will be wiped out
// from their virtual existence. A kind of anti-fitness
$PENVAL bad
// do show the following variables when simulating
$SHOWVAR bad,orbit,guess,dif
// penalize any command with 0 (nothing) and every line by 1 point
$WEIGHTS commands=0 lines=1
// minimize the whole program to 20 lines or less
$MINIMIZE lines 20
$BES
// the arena, where algorithms will be
// created and the fittest only will survive
$EES
// testing area, where the simulator has no write access;
// here bad (the penalized variable) is calculated:
// the bigger the difference between the known orbit and the variable guess,
// the worse the evolved algorithm
dif=orbit-guess;
dif=abs(dif);
bad=dif;
temp=dif;
temp*=10000;
temp1=temp/orbit;
temp=temp1*temp1;
bad=bad+temp;
// end of the testing area
```

After several hours, the following C code had evolved inside the $BES-$EES segment.

```
aphelion=perihelion+aphelion;
aphelion=aphelion+aphelion;
aphelion=aphelion+aphelion;
guess=12;
aphelion=aphelion>>guess;
temp=aphelion/guess;
aphelion=aphelion-temp;
dif=sqrt(aphelion);
aphelion=guess|aphelion;
aphelion=aphelion*dif;
aphelion=guess^aphelion;
guess=aphelion/guess;
```

What does the simulator do? It bombards the arena segment with random C commands. Usually it then just notices a syntax error and reverts everything to the last working version. If everything is syntactically good, the simulator interprets the program and checks whether the mutated version causes any run-time error, like division by zero, a memory leak, and so on. In the case of such an error it returns to the last good version. Otherwise it checks whether the variable called “bad” is at least as small as it has ever been before. If it is, a new version has just been created, and it is stored.

The evolutionary pressure works toward ever better code, which guesses the orbit time of the nine planets increasingly well. In this case the “orbit” variable has been put under the $RESVAR clause, and the “guess” variable has been tested against the “orbit” variable. Had there been no “$RESVAR orbit” statement, a simple “guess=orbit;” would have evolved quickly. Had there been no “$RESVAR bad” statement, a simple “bad=-1000000;” could have derailed the process.

Many thousands of algorithms are born and die every second on a standard Windows PC inside this simulator. A million or a billion generations later, the digital evolution is still running, even if an excellent solution has already been found.

And how good an approximation of the Kepler (Newton) celestial mechanics of the Solar system do we have here?

This is how good it is for the nine planets on which the code evolved:

| Planet  | Error % |
|---------|---------|
| Mercury | 0.00    |
| Venus   | 0.44    |
| Earth   | 0.27    |
| Mars    | 0.29    |
| Jupiter | 0.16    |
| Saturn  | 0.65    |
| Uranus  | 0.10    |
| Neptune | 0.79    |
| Pluto   | 1.08    |

And this is how good it is for the control group of a comet and six asteroids:

| Asteroid/Comet | Error % |
|----------------|---------|
| Halley         | 1.05    |
| Hebe           | 1.37    |
| Astraea        | 1.99    |
| Juno           | 3.19    |
| Pallas         | 1.66    |
| Vesta          | 2.49    |
| Ceres          | 2.02    |

It could be even much better after another billion generations, and maybe with even more $INVAR examples. Generally, you can pick any three columns from any integer-type table you want, and see this way how they are related algorithmically. It can be more than three columns, too.

The name of the simulator (evoluator) is Critticall and it is available at http://www.critticall.com

• The generated code is bizarre. I refactored it as well as I could, and it still doesn't make much sense:

```
aphelion = (aphelion + perihelion) >> 10;
aphelion = aphelion - (aphelion / 12);
guess = ( ( (aphelion | 12) * (int)sqrt(aphelion) ) ^ 12 ) / 12;
```

“To get the orbit time in days from the aphelion and perihelion in thousands of km, first sum them and divide by 1024. Then from that, subtract one twelfth. Then, to the value, apply a bitwise OR with 0x0C, multiply by the square root, and bitwise-XOR with 0x0C again. Finally, divide by 12, and that will give you the number of days.”

• The three problems with the code are that the variable names are all lies; that there's a bunch of redundant rescaling which isn't consolidated, because it's done in integer math when it should be floating point; and that there are a couple of bits of overfitting (bitwise operators) that don't belong. If you convert to SSA and wipe out the misleading names, you get:

```
a1 = perihelion+aphelion    Real
a2 = a1+a1                  Rescaling
a3 = a2+a2                  Rescaling
g1 = 12                     Rescaling
a4 = a3>>g1                 Rescaling
t1 = a4/g1                  Rescaling
a5 = a4-t1                  Rescaling
d1 = sqrt(a5)               Real
a6 = g1|a5                  Overfit
a7 = a6*d1                  Real
a8 = g1^a7                  Overfit
guess = a8/g1               Rescaling
```

If you replace the overfitting with pass-throughs (a6=a5, a8=a7), then pretend it's floating point so that you can consolidate all the rescaling into a single multiplicative constant, you get

```
guess = k * (perihelion+aphelion) * sqrt(perihelion+aphelion)
```

Which is Kepler's third law.

• This reminds me of the discussion from last week of the code that a self-modifying AI might produce; I said then that I thought people were not thinking Weirdly enough. This is indeed an example of Weirdness. Obviously no human would come up with such a thing. Yet it works, more or less, although the hardcoded numbers make me suspect that its range of applicability may not be great. Testing it on some numbers typical of, say, solar orbits around the galactic core might produce some surprising results.

It would also be interesting to compare this for speed, as well as accuracy, with a Kepler's-law implementation.

• divide by 1024

Actually by 4096. And it is a rescaling, as jimrandomh points out.

• Am I crazy? A right shift by 10 is equivalent to a division by 2^10. 2^10 is 1024.

• No. The posted code has a bit shift right by 12 places. The already-optimized code by wmorgan has a bit shift by only 10 bits.

The metacommand $RESCOM if while val_operation inc_dec caused this. Having two constants (10 and 12) would be undesirable because of this “val_operation”, and therefore only the constant 12 was used.

• This is the generated code segment:

```
aphelion=aphelion+aphelion;
aphelion=aphelion+aphelion;
guess=12;
aphelion=aphelion>>guess;
```

Those four lines together amount to a shift 10 bits to the right, i.e., division by 1024.

I think you understand what's going on in the code. The point of my refactoring was to make something human-readable: something that I could describe in English. And the English for those four lines of code is “divide by 1024.” That's what those four lines do.


• We can modify the above code to:

```
$DECLAREINT aphelion perihelion dif guess temp
$RINVAR aphelion(1000,100000) perihelion(1000,100000)
$RETVAR guess

if (aphelion<perihelion) { temp=perihelion; perihelion=aphelion; aphelion=temp; }

temp=0;

$BES
aphelion=perihelion+aphelion;
aphelion=aphelion+aphelion;
aphelion=aphelion+aphelion;
guess=12;
aphelion=aphelion>>guess;
temp=aphelion/guess;
aphelion=aphelion-temp;
dif=sqrt(aphelion);
//aphelion=guess|aphelion;
aphelion=aphelion*dif;
//aphelion=guess^aphelion;
guess=aphelion/guess;
$EES
```

As you can see, there is no $RESCOM metacommand, and the two “overfit” OR and XOR lines have also been commented out, i.e. neutralized. Random aphelions and perihelions are between 1 million and 100 million km now. If the random aphelion is smaller than the perihelion, the two are swapped first. The intermediate variable “temp” is then reset to 0 before the “Kepler segment” begins. If it weren't, the simulator would simply use it!

The simulator comes out with this in the $BES-$EES section:

```
aphelion=perihelion+aphelion;
aphelion>>=10;
guess=12;
temp=aphelion/guess;
aphelion=aphelion-temp;
perihelion=sqrt(aphelion);
perihelion=perihelion*aphelion;
guess=perihelion/guess;
```

Fewer lines, but now two constants (10 and 12) (approximately) scale days and megameters here, where the Sun is so massive. Before, it was only the constant 12, which shifted, divided, and was also a XOR argument to fit some more.

Both codes here, inside the $BES-$EES segments, are exactly equivalent regarding the output, and they are a distant ancestor-descendant pair. Many million generations apart, but not very different.

• The extra two places of bit shifting cancel with the two previous self-additions.

• I know, and I agree with this.

Then you have two different constants (10 and 12): one for the shifting and another for the division. There's nothing wrong with that, but the simulator was prevented from having more constants than absolutely necessary. So everything was done with the “12”, and that is what I was discussing.

• The XOR with 12 won't do much after dividing by 12. For small radii, OR with 12 (in units of about 10^6 km) will have an effect. These two constants are probably just overfitting. Indeed, it nails Mercury, the smallest and thus most vulnerable to these effects.* Rounding** the square root is also probably overfitting or just noise. It will have a larger effect, but smooth across planets, so it is probably canceled out by the choice of other numbers. Ignoring those three effects, it is a constant times the 3/2 power of the average of the axes. The deviation from Kepler's law is that it should ignore perihelion.*** But for un-eccentric orbits, there's no difference. Since the training data isn't eccentric, this failure is unsurprising. That is, the code is unsurprising; that the code is so accurate is surprising. That it correctly calculates the orbital period of Halley's comet, rather than underestimating it by a factor of 2^(3/2), is implausible.***

* The control group is too homogeneous. If it contained something close in, overfitting for Mercury would have been penalized in the final evaluation.

** Are you sure it's rounding? [Edit: Yes: bitwise operations are strongly suggestive.]

*** These statements are wrong because I confused aphelion with the semi-major axis. So removing the bitwise operations yields exactly Kepler's law. If you switch from ints to doubles it becomes more accurate. But wmorgan's version has a constant error: it is divide by 4096, not 1024. This should make the rounding errors pretty bad for Mercury. Maybe the bitwise operations are there to fix this, if they aren't noise. My C compiler does not reproduce the claimed error percentages, so I'm not going to pursue this.

• This is a fairly typical example of genetic programming, albeit written in a slightly “outsider” way.

No criticism, but readers should be given the standard name of the technique.

EDIT: in particular, it's GP symbolic regression. On the other hand, it's atypical in that there's no crossover.

• Very interesting demonstration. Thanks for sharing this; it was fun to read through! I think I have a pretty good idea of how it works.

As a professional programmer:

That code it generated... is really, really shitty. It's unreadable, and for that reason a human cannot look at the generated code and figure out “what's going on,” i.e. Kepler's laws. Insofar as it works, it's much more reminiscent of 0x5f3759df, but that algorithm was optimizing for speed, not correctness or elegance.

I'm not surprised that the algorithm does worse on the control group, for the same reason that I'd question the assumption that it will do better over future generations. It could easily be over-fitting, partially because there is no selection pressure for an elegant solution. Empirically, elegant code does better in novel contexts.

• “Evolution is dumb, but it works”: I believe that showing this was the (implicit) goal, and it was amply demonstrated.

• Also related: mirror.fem-net.de/CCC/28C3/mp4-h264-HQ/28c3-4764-en-automatic_algorithm_invention_with_a_gpu_h264.mp4.torrent (another outsider-perspective explanation of genetic programming).

• You need to escape the underscores.

• Also related: another outsider-perspective explanation of genetic programming.

• Usually it then just notices a syntax error and repairs everything to the last working version

That sounds like a terrible waste. Why not use a tree-based model of C that excludes as many invalid programs as practical?

• That sounds like a terrible waste.

Technically speaking, this is a very swift part, since every syntax error is detected before the code is rewritten, and the mutations are just abandoned.

• Most random mutations are fatal, so it fits perfectly.

• That depends on whether your purpose is to produce an analogue of evolution, which we already have plenty of, or a new method for making working code.

• That is the more common approach. Grammatical evolution is one method.

• Eureqa is a much better tool for this. It came up with the solution `Orbit = 0.01239*Aphelion` in a few seconds. Some other solutions it found: http://i.imgur.com/tZwBedp.png?1

• Genetic programming may work better for creating a strong AI than for solving that kind of problem. Intelligence did evolve, heh, heh.

If you are trying to evolve a program that solves something of this kind, you end up with an entirely wrong algorithm, the constants in which are going to evolve to make the best outcome there can be. The solution just gets stuck in a local maximum, like the human eye that has the retina backwards. It is more productive to try to iterate over possible solutions starting from the simplest (to implement Occam's razor), using some separate process (hill climbing from several random points?) to tune the constants to their best.

• Given the overall lack of eccentricity among the planets, I suspect this would converge faster if you started over, switching the training and test groups.

• This idea is based on a whole range of confusions and misunderstandings. Program code does not have the redundancy or flexibility of genetics; as you know, with syntax errors it shatters if a single character is wrong, and for this reason it's a mistake to use it as the carrier for the genetic paradigm. Another mistake is extrapolating from a finite data source: you can't expect to get a correct program this way; the code cannot contain any algorithmic insight into the data for any reason other than pure chance. As you'll know, the software crisis is still ongoing, and Critticall is just another example of people not understanding the true nature of programs and trying to get something for nothing.

• Program code does not have the redundancy or flexibility of genetics, as you know with syntax errors it shatters if a single character is wrong

And a single mutation in DNA in the wrong place can be disastrous, too.

• This is as good an opportunity as any to link this presentation and these cool papers about evolving patches for real-world C programs in minutes. In light of an actual working example of a computer program that fixes other programs' bugs, broad-brush claims that programs can't have algorithmic insight, won't lead to correct programs, and lack “the redundancy or flexibility of genetics” won't really wash.

• This research (it is a single piece of research written up in 4 different ways) simply concerns taking a rough piece of wood (a program that is almost correct) and sanding down the edges (fixing a small number of test cases). I didn't say “programs can't have algorithmic insight”; I said randomly generating programs by “evolutionary” means will not produce insight by any means other than coincidence. The research you linked doesn't contradict that, because all it concerns is smoothing down rough edges. Degeneracy is one of the fundamental features of a genetic code that is required for the theory of evolution to apply, so I don't know why you say that “doesn't wash”; it's a fact.

• Degeneracy is an important feature of DNA-based evolution (biological evolution as-it-is), but it's not fundamental to evolution as-it-could-be.

• Program code does not have the redundancy or flexibility of genetics

A well-written program does not have it, but many real-life programs do. In my programming environments I usually have warnings turned very sensitive; for example, they complain about declared but never used variables/functions, and when I read other people's code, I see this often.

with syntax errors it shatters if a single character is wrong

Could we see a program with a syntax error as a fatal mutation? Fortunately, these fatal mutations are easy to detect. (Alternatively, we could mutate the parse trees.)

Another mistake is extrapolating from a finite data source: you can't expect to get a correct program this way, the code cannot contain any algorithmic insight into the data for any reason other than pure chance.

Still, I have seen humans do this and take their paychecks. Yes, I would expect such an evolved program to be full of bugs; but real-life applications also contain bugs, and that does not prevent them from being useful. I admit I would hate to read the machine-generated code, or even worse to fix the bugs or add new functionality. But this is like saying that reading DNA is, ahem, not very user-friendly. Sure, it is not; but it works.

Two more things to consider. First, unit tests: the better the unit tests you make, the higher the chance that a random program which passes them is correct. If test-driven development can reduce random human errors, it can also reduce computer-generated errors, though of course the computer will make many more errors. Second, we are OK with randomized algorithms having some chance of error, as long as the chance is very, very small.

So the prior probability of a computer generating a correct algorithm is small, but non-zero. By making unit tests we increase the probability that the result will be correct. If we can increase this probability from epsilon to 1 - epsilon, we win.

• Randomized algorithms are completely different from random program code: randomized algorithms actually fit a precise specification, while randomly generated programs just (try to) fit a range of test cases (hopefully I don't need to explain why a range of test cases is not an adequate specification). It's a misconception that unit tests help decrease the probability of mistakes; a unit test cannot ever tell you that your program is free of errors. They're the completely wrong paradigm for making correct programs.

(I get “You are trying to submit too fast. Try again in 9 minutes.” every time I post a comment.)

• Randomized algorithms are completely different from random program code

The distinction between data and code is mostly useful, but in some situations we intentionally cross the boundary. If you have a text editor, the separation is valid: the program is code, and the text documents are data. But if you have a compiler, the compiler program is code, and the compiled program is data, though on a different level it is also code. If you have a dynamically loaded library, while you load it, it is data, but then you start treating it as code by running it. An interpreter, which is code, emulates the program, which on one level is data and simultaneously on another level is code. A decompiler or debugger, which is code, takes another code and treats it like data. A bootloader loads data and then starts it as code. Etc.

This means that treating data as code, or code as data, is not necessarily a mistake or a confusion of levels, and sometimes it is unavoidable. How else could you, for example, start a program which originally exists only as a sequence of bytes in a file on a disk? You treat it as data when loading the file into memory, and then you treat it as code.

A unit test cannot ever tell you that your program is free of errors.

Some maths here: let X be a problem we want to solve, and let's suppose that the problem is algorithmically solvable. Further, let P be the set of all programs, PX the set of programs that correctly solve problem X, U a unit test, and PU the set of programs that pass the unit test.

We will call U a correct unit test for problem X if each program from PX belongs to PU (a correct program always passes the unit test) and some programs in P-PX don't belong to PU (some incorrect programs don't pass the unit test). In other words, PX is a subset of PU, which is a strict subset of P.

A random code generator will provide a random distribution R of programs. Now the question is whether R ∩ PX is non-empty; in other words, whether the generator is capable of generating a correct solution, assuming it is incredibly lucky. This depends on the generator and the problem. For example, if the generator only creates random programs up to 100 lines of code, but the shortest possible code that solves the problem has 150 lines, then the generator cannot create a correct program. But if the generator is capable of creating a program of any length, or more precisely capable of creating any program, then, since the problem is algorithmically solvable, the generator will generate a correct program with non-zero probability. Just imagine that you wrote the correct program: well, with some non-zero probability, the output of the random generator will be exactly the same text. The only problem is that p(x in PX | x in R) is very small. Sometimes so small that even if we could reliably automatically test whether a program belongs to PX, we just wouldn't have enough time to wait until the first such program is generated. But let's assume that it is a simple enough program, so that if we generate a few million programs, we can expect some of them to belong to PX.

Yeah, we have a couple of assumptions here. Again, they are: (1) a correct program exists, and (2) some correct program has Kolmogorov complexity small enough (i.e., has a short code, if written optimally) that we can realistically wait until a random generator generates it. Actually, also (3) some correct program with small Kolmogorov complexity runs fast enough that it will pass the unit test within a given time limit.

Assuming all this, we have a set of generated random programs R, some of them belonging to PX, and we are trying to find them. If we replace the set R with R ∩ PU, we have a smaller set, and R ∩ PX is a subset of R ∩ PU, so we did not lose any solution. In probabilistic terms, p(x in R ∩ PX | x in R ∩ PU) is greater than p(x in PX | x in R); in human speech: if we choose a random program that passes the unit test, we have a better chance than if we choose any random program.

The question is how much the test U reduces the set of all programs. If only 1% of programs pass the unit test, then p(x in R ∩ PX | x in R ∩ PU) = 100×p(x in R ∩ PX | x in R), so by doing the unit test we increased our luck 100 times… though the result may still be pretty close to zero. We can filter the results by another test, but now we need that only a few of the programs that passed the previous test pass the next one. Which may be difficult, and also pretty difficult to estimate in advance. We can only measure it afterwards: how many programs that passed the first test were filtered out by the second one. The idea is: if a generated program has a chance of 1:10^10 to be correct, and we can make a sequence of 5 unit tests where each one filters out 99% of the programs that passed the previous tests, we have a decent chance at the end.

Of course, to make all this work, we need a decent generator, which will give programs reasonable prior probabilities; therefore it would be better to generate parse trees than source texts, etc. We could even invent a new language that would allow us to write shorter programs. For example, it could have an instruction “forAllItemsInSet(SET) { X | CODE }” made of 4 symbols, instead of “for (X = 1; X < SET.size; X++) { CODE }” with 11 symbols, etc.