# Tetraspace Grouping’s Shortform

• In the Parable of Predict-O-Matic, a subnetwork of the titular Predict-O-Matic becomes a mesa-optimiser and begins steering the future towards its own goals, independently of the rest of Predict-O-Matic. It does so in a way that sabotages the other subnetworks.

I am reminded of one specification problem that a run of Eurisko faced:

During one run, Lenat noticed that the number in the Worth slot of one newly discovered heuristic kept rising, indicating that Eurisko had made a particularly valuable find. As it turned out the heuristic performed no useful function. It simply examined the pool of new concepts, located those with the highest Worth values, and inserted its name in their My Creator slots.

One thing I wondered is whether this could happen in humans, and if not, why it doesn’t. A simplified description of memory that I learned in a flash game is that “neural connections” are “strengthened” whenever they are “used”, which sounds sort of like gradients in RL if you don’t think about it too hard. Maybe the analogue of this would be some memory that “wants” you to remember it repeatedly at the expense of other memories. Trauma?

Prelude

The prelude of Life 3.0 begins with a short story called the Tale of the Omega Team. It’s a wish-fulfilment pseudo-isekai about a bunch of effective altruist tech people working for not-Google, called the Omegas, who make an AGI and then use it to take over the world.

But a cybersecurity specialist on their team talked them out of the game plan [...] risk of Prometheus breaking out and seizing control of its own destiny [...] weren’t sure how its goals would evolve [...] go to great lengths to keep Prometheus confined

For some reason, the Omegas in the story claim that Prometheus (the AI) might be unsafe, and then proceed to do things like have it write software which they then run on computers, let it produce long pieces of animated media, and let it send blueprints of technologies to scientists. There is a cybersecurity expert on the team who just barely stops them from straight-up leaving the whole thing unboxed, and I do not envy her job position.

(Prometheus is safe, it turns out, which I can tell because there are humans alive at the end of the story.)

[...] Omega-controlled [...] controlled by the Omegas [...] the Omegas harnessed Prometheus [...] the Omegas’ [...] the Omegas’ [...]

There’s also another odd thing where it says that the Omegas are using Prometheus as a tool to do things, instead of what’s clearly actually happening, which is that Prometheus is achieving its goals with the Omegas being some lumps of atoms that it’s been pushing around according to its whims, as it has been since they decided to switch it on.

All in all, I like it. It wouldn’t be out of place on r/rational; if wish-fulfilment pseudo-isekai does happen, then AGI sweeping aside the previous social order is how it will happen (a real AGI would come close to some of the capabilities I’ve seen those protagonists have), and fiction about more plausible robopocalypses (or robo-utopias) coming about is always great.

• In Against Against Billionaire Philanthropy, Scott says

The same is true of Google search. I examined the top ten search results for each donation, with broadly similar results: mostly negative for Zuckerberg and Bezos, mostly positive for Gates.

Also, as far as I can tell, Moskovitz’s philanthropy is generally viewed positively, though of course I would be in a bubble with respect to this. Also also, though I say this without really checking, it seems that people are pretty much all against the Sacklers’ donations to art galleries and museums.

Squinting at these data points, I can kind of see a trend: people favour philanthropy that’s buying utilons, and are opposed to philanthropy that’s buying status. They like billionaires funding global development more than they like billionaires funding local causes, and they like them funding art galleries for the rich least of all.

Which is basically what you’d expect if people were well-calibrated and correctly criticising those who need to be taken down a peg.

• and they like them funding art galleries for the rich least of all.

What are these art galleries “for the rich”? Your link mentions the National Gallery, the Tate Gallery, the Smithsonian, the Louvre, the Guggenheim, the Sackler Museum at Harvard, the Metropolitan Museum of Art, and the American Museum of Natural History as recipients of Sackler money. All of them are open to everyone. The first three are free and the others charge in the region of $15–$25 (as do the National Gallery and the Tate Gallery for special exhibitions, but not the bulk of their displays). The hostility to Sackler money has nothing to do with “how dare they be billionaires”, but is because of the (allegedly) unethical practices of the pharmaceutical company that the Sacklers own and owe their fortune to. No-one had any problem with their donations before.

Which is basically what you’d expect if people were well-calibrated and correctly criticising those who need to be taken down a peg.

I see nothing correct in the ethics of the crab bucket.

• The simplicity prior says that you should assign a prior probability of 2^-L to a description of length L. This sort of makes intuitive sense, since it’s what you’d get if you generated the description through a series of coinflips...

… except there are 2^L descriptions of length L, so the total prior probability you’re assigning is sum(2^L * 2^-L) = sum(1) = unnormalisable.

You can kind of recover this by noticing that not all bitstrings correspond to an actual description, and for some encodings their density is low enough that it can be normalised (I think the threshold is that the fraction of length-L strings that are “valid” descriptions has to shrink faster than 1/L)...

...but if that’s the case, you’re being fairly information-inefficient, because you could compress descriptions further; and then why are you judging simplicity using such a bad encoding, and why 2^-L, if it no longer really corresponds to complexity properly? And other questions in this cluster.

I am confused (and maybe too hung up on something idiosyncratic to an intuitive description I heard).

If so: in the algorithmic information literature, they usually fix the unnormalizability stuff by talking about prefix Turing machines, which corresponds to only allowing TM descriptions that form a valid prefix code.

But it is a good point that for steeper discounting rates, you don’t need to do that.
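To make the prefix-code fix concrete, here’s a quick check in Python (the code {0, 10, 110, 111} is my own toy example, not from any particular source): because no codeword is a prefix of another, the Kraft sum of 2^-L over codewords comes out at most 1, so the 2^-L prior over codewords actually normalises.

```python
# A prefix-free code: no codeword is a prefix of any other codeword.
codewords = ["0", "10", "110", "111"]

def is_prefix_free(code):
    return not any(a != b and b.startswith(a) for a in code for b in code)

# Kraft inequality: for a prefix-free code, sum(2^-L) <= 1, so assigning
# probability 2^-L to each codeword gives a (sub)normalised prior.
kraft_sum = sum(2 ** -len(w) for w in codewords)

assert is_prefix_free(codewords)
print(kraft_sum)  # 0.5 + 0.25 + 0.125 + 0.125 = 1.0
```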

• Imagine two prediction markets, both with shares that give you $1 if they pay out and $0 otherwise.

One is predicting some event in the real world (and pays out if this event occurs within some timeframe) and has shares currently priced at $X. The other is predicting the behaviour of the first prediction market. Specifically, it pays out if the price of the first prediction market exceeds an upper threshold $T before it goes below a lower threshold $R.

Is there anything that can be said in general about the price of the second prediction market? For example, it feels intuitively like if T >> X, but R is only a little bit smaller than X, then assigning a high price to shares of the second prediction market violates conservation of evidence—is this true, and can it be quantified?
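One thing that can be said, under assumptions I’m adding myself (no fees or interest, the first market’s price moves in small steps rather than jumping, and it’s a martingale, i.e. it already incorporates available information): the stopped price is a bounded martingale, so by the optional stopping theorem X = p·T + (1 − p)·R, which pins the second market’s fair price to p = (X − R)/(T − R). If T is much bigger than X and R is only a little below X, this is small, so a high price on the second market would indeed be a claim that the first market’s current price is wrong. A quick Monte Carlo sketch with a toy random-walk martingale:

```python
import random

def hits_T_before_R(X, T, R, step=0.01):
    # Toy martingale: a symmetric +/-step random walk started at X.
    # Returns True if the price touches T before it touches R.
    pos = round((X - R) / step)   # current position, in steps above R
    top = round((T - R) / step)   # position of T, in steps above R
    while 0 < pos < top:
        pos += 1 if random.random() < 0.5 else -1
    return pos == top

X, T, R = 0.30, 0.90, 0.25
trials = 20_000
est = sum(hits_T_before_R(X, T, R) for _ in range(trials)) / trials
print(est)                # Monte Carlo estimate: ~0.077
print((X - R) / (T - R))  # optional stopping prediction: 0.0769...
```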

• Over the past few days I’ve been reading about reinforcement learning, because I understood how to make a neural network, say, recognise handwritten digits, but I wasn’t sure how that could be turned into getting a computer to play Atari games. So, here’s what I’ve learned so far. Spinning Up’s Intro to RL probably explains this better.

(Brief summary, explained properly below: The agent is a neural network which runs in an environment and receives a reward. Each parameter in the neural network is increased in proportion to how much it increases the probability of making the agent do what it just did, and how good the outcome of what the agent just did was.)

Reinforcement learners play inside a game involving an agent and an environment. On turn $t$, the environment hands the agent an observation $o_t$, and the agent hands the environment an action $a_t$. For an agent acting in realtime, there can be sixty turns a second; this is fine.

The environment has a transition function $T$, which takes an observation-action pair $(o_t, a_t)$ and responds with a probability distribution over observations $o_{t+1}$ on the next timestep; the agent has a policy that takes an observation $o_t$ and responds with a probability distribution over actions $a_t$ to take.

The policy is usually written as $\pi$, and the probability that $\pi$ outputs an action $a$ in response to an observation $o$ is $\pi(a|o)$. In practice, $\pi$ is usually a neural network that takes observations as input and has actions as output (using something like a softmax layer to give a probability distribution); the parameters of this neural network are $\theta$, and the corresponding policy is $\pi_\theta$.
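For concreteness, here’s a minimal sketch of such a policy network in numpy (the sizes, initialisation, and names are arbitrary choices of mine, not from Spinning Up):

```python
import numpy as np

rng = np.random.default_rng(0)

# A policy pi_theta: observation vector in, softmax over actions out.
n_obs, n_hidden, n_actions = 4, 32, 2
theta = {
    "W1": rng.normal(0.0, 0.1, (n_obs, n_hidden)),
    "W2": rng.normal(0.0, 0.1, (n_hidden, n_actions)),
}

def policy(obs, theta):
    hidden = np.tanh(obs @ theta["W1"])
    logits = hidden @ theta["W2"]
    exps = np.exp(logits - logits.max())  # numerically stable softmax
    return exps / exps.sum()              # a probability distribution over actions

probs = policy(np.zeros(n_obs), theta)
action = rng.choice(n_actions, p=probs)   # sample a_t ~ pi_theta(.|o_t)
```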

At the end of the game, the entire trajectory $\tau = (o_0, a_0, o_1, a_1, \ldots)$ is assigned a score, $R(\tau)$, measuring how well the agent has done. The goal is to find the policy that maximises this score.

Since we’re using machine learning to maximise, we should be thinking of gradient ascent, which involves finding the local direction in which to change the parameters $\theta$ in order to increase the expected value of $R(\tau)$ by the greatest amount, and then increasing them slightly in that direction.

In other words, we want to find $\nabla_\theta \, \mathbb{E}_{\tau \sim \pi_\theta}[R(\tau)]$.

Writing the expectation value in terms of a sum over trajectories, this is $\nabla_\theta \, \mathbb{E}_{\tau \sim \pi_\theta}[R(\tau)] = \nabla_\theta \sum_{\tau \in \mathcal{T}} P(\tau|\theta)\, R(\tau)$, where $P(\tau|\theta)$ is the probability of observing the trajectory $\tau$ if the agent follows the policy $\pi_\theta$, and $\mathcal{T}$ is the space of possible trajectories.

The probability of seeing a specific trajectory happen is the product of the probabilities of each individual step on the trajectory happening, and is hence $P(\tau|\theta) = \prod_t T(o_{t+1}|o_t, a_t)\, \pi_\theta(a_t|o_t)$, where $T(o_{t+1}|o_t, a_t)$ is the probability that the environment outputs the observation $o_{t+1}$ in response to the observation-action pair $(o_t, a_t)$. Products are awkward to work with, but products can be turned into sums by taking the logarithm: $\log P(\tau|\theta) = \sum_t \left( \log T(o_{t+1}|o_t, a_t) + \log \pi_\theta(a_t|o_t) \right)$.

The gradient of this is $\nabla_\theta \log P(\tau|\theta) = \sum_t \left( \nabla_\theta \log T(o_{t+1}|o_t, a_t) + \nabla_\theta \log \pi_\theta(a_t|o_t) \right)$. But what the environment does is independent of $\theta$, so that entire term vanishes, and we have $\nabla_\theta \log P(\tau|\theta) = \sum_t \nabla_\theta \log \pi_\theta(a_t|o_t)$. The gradient of the policy is quite easy to find, since our policy is just a neural network, so you can use back-propagation.

Our expression for the expectation value is just in terms of the gradient of the probability, not the gradient of the logarithm of the probability, so we’d like to express one in terms of the other.

Conveniently, the chain rule gives $\nabla_\theta \log P(\tau|\theta) = \nabla_\theta P(\tau|\theta) / P(\tau|\theta)$, so $\nabla_\theta P(\tau|\theta) = P(\tau|\theta)\, \nabla_\theta \log P(\tau|\theta)$. Substituting this back into the original expression for the gradient gives

$\nabla_\theta \, \mathbb{E}_{\tau \sim \pi_\theta}[R(\tau)] = \sum_{\tau \in \mathcal{T}} P(\tau|\theta)\, \nabla_\theta \log P(\tau|\theta)\, R(\tau)$,

and substituting our expression for the gradient of the logarithm of the probability gives

$\nabla_\theta \, \mathbb{E}_{\tau \sim \pi_\theta}[R(\tau)] = \sum_{\tau \in \mathcal{T}} P(\tau|\theta) \left( \sum_t \nabla_\theta \log \pi_\theta(a_t|o_t) \right) R(\tau)$.

Notice that this is the definition of the expectation value of $\left( \sum_t \nabla_\theta \log \pi_\theta(a_t|o_t) \right) R(\tau)$, so writing the sum as an expectation value again we get

$\nabla_\theta \, \mathbb{E}_{\tau \sim \pi_\theta}[R(\tau)] = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \left( \sum_t \nabla_\theta \log \pi_\theta(a_t|o_t) \right) R(\tau) \right]$.

You can then find this expectation value easily by sampling a large number of trajectories (by running the agent in the environment many times), calculating the term inside the brackets, and then averaging over all of the runs.
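Here’s a minimal self-contained sketch of that estimator (this is vanilla REINFORCE; the environment is a toy I made up, where the right move is to match the sign of a ±1 observation, so it stands in for something like Atari):

```python
import numpy as np

rng = np.random.default_rng(0)

H, n_actions = 10, 2   # episode length and number of actions

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_log_pi(obs, action, probs):
    # For a linear-softmax policy, grad_W log pi(a|o) = (onehot(a) - probs) obs^T,
    # which is what back-propagation would give for a bigger network.
    g = -np.outer(probs, obs)
    g[action] += obs
    return g

W = np.zeros((n_actions, 1))   # the parameters theta
learning_rate = 0.1

for _ in range(200):
    grads, returns = [], []
    for _ in range(32):                     # sample 32 trajectories
        grad_sum, R = np.zeros_like(W), 0.0
        for _ in range(H):
            obs = np.array([rng.choice([-1.0, 1.0])])
            probs = softmax(W @ obs)
            action = rng.choice(n_actions, p=probs)
            grad_sum += grad_log_pi(obs, action, probs)
            R += 1.0 if (action == 0) == (obs[0] > 0) else 0.0  # reward: sign matched
        grads.append(grad_sum * R)          # (sum_t grad log pi(a_t|o_t)) * R(tau)
        returns.append(R)
    W += learning_rate * np.mean(grads, axis=0)  # ascend the estimated gradient

print(np.mean(returns))  # average return of the final batch; climbs towards H = 10
```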

Neat!

(More sophisticated RL algorithms apply various transformations to the reward to use information more efficiently, and use various gradient descent tricks to make the gradients acquired converge on the optimal parameters more efficiently.)

• Here are three statements I believe with a probability of about 1/9:

• The two 6-sided dice on my desk, when rolled, will add up to 5.

• An AI system will kill at least 10% of humanity before the year 2100.

• Starvation was a big concern in ancient Rome’s prime (claim borrowed from Elizabeth’s Epistemic Spot Check post).

Except I have some feeling that the “true probability” of the 6-sided-dice question is pretty much bang on exactly 1/9, but that the “true probability” of the Rome and AI xrisk questions could be quite far from 1/9, and to say the probability is precisely 1/9 seems… overconfident?

From a straightforward Bayesian point of view, there is no true probability. It’s just my subjective degree of belief! I’d be willing to make a bet at 8:1 odds on any of these, but not at worse odds, and that’s all there really is to say on the matter. It’s the number I multiply by the utilities of the outcomes to make decisions.

One thing you could do is imagine a set of hypotheses that I have that involve randomness, where I have a probability distribution over which of these hypotheses is the true one; by mapping each hypothesis to the probability it assigns to the outcome, my probability distribution over hypotheses becomes a probability distribution over probabilities. This is sharply peaked around 1/9 for the dice rolls, and spread widely around 1/9 for AI xrisk, as expected, so I can report 50% confidence intervals just fine. Except sensible hypotheses about historical facts probably wouldn’t be random, because either starvation was important or it wasn’t; that’s just a true thing that happens to exist in my past, maybe.
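As a toy version of that picture (the hypothesis sets and weights below are invented purely for illustration):

```python
import numpy as np

# Each hypothesis maps "the probability it assigns to the statement"
# to my credence in that hypothesis being the true one.
dice_hyps  = {4/36: 0.98, 3/36: 0.01, 5/36: 0.01}
xrisk_hyps = {0.01: 0.35, 0.05: 0.25, 1/9: 0.20, 0.30: 0.15, 0.60: 0.05}

def summarise(hyps):
    ps = np.array(list(hyps.keys()))
    ws = np.array(list(hyps.values()))
    mean = (ps * ws).sum()                       # my overall degree of belief
    sd = np.sqrt((ws * (ps - mean) ** 2).sum())  # spread of the "true probability"
    return mean, sd

print(summarise(dice_hyps))   # mean ~0.111, sd ~0.004: sharply around 1/9
print(summarise(xrisk_hyps))  # mean ~0.113, sd ~0.15: spread widely around 1/9
```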

I like jacobjacob’s interpretation of a probability distribution over probabilities as an estimate of what your subjective degree of belief would be if you thought about the problem for longer (e.g. 10 hours). The specific time horizon seems a bit artificial (extreme case: I’m going to chat with an expert historian in 10 hours and 1 minute), but it does work and gives me the kind of results that make sense. The advantage of this is that you can quite straightforwardly test your calibration (there really is a ground truth): write down your 50% confidence interval, then actually do the 10 hours of research, and see how often the degree of belief you end up with lies inside the interval.