# Stuart_Armstrong

Karma: 21,395
• If we set aside infinity, which I don’t know how to deal with, then the SIA answer does not depend on utility bounds—unlike my anthropic decision theory post.

Q1: “How many copies of people (currently) like me are there in each universe?” is well-defined in all finite settings, even huge ones.

Incidentally, when you say there are “not many” copies of me in universes 3 and 4, then you presumably mean “not a high proportion, compared to the vast total of observers”

No, I mean not many, as compared with how many there are in universes 1 and 2. Other observers are not relevant to Q1.

I’ll reiterate my claim that different anthropic probability theories are “correct answers to different questions”: https://www.lesswrong.com/posts/nxRjC93AmsFkfDYQj/anthropic-probabilities-answering-different-questions

• Yep, sorry, I saw −3, −2, −1, etc… and concluded you weren’t doing the 2 jumps; my bad!

Then somehow the work is just postponed to the point where we try to combine partial preferences?

Yes. But unless we have other partial preferences or meta-preferences, the only reasonable way of combining them is just to add them, after weighting.

I like your reciprocal weighting formula. It seems to have good properties.
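As a rough illustration of “just add them, after weighting”, here is a minimal sketch; the worlds, utilities, and weights are invented for illustration, with the two example preferences borrowed from the “increased income”/“more people smiling” examples elsewhere in these comments:

```python
# Minimal sketch: combine partial preferences by weighting and adding.
# The worlds, utilities and weights below are illustrative placeholders.

def combine(partial_utilities, weights):
    """Weighted sum of partial-preference utilities for each world."""
    worlds = set().union(*partial_utilities)
    return {
        w: sum(weight * u.get(w, 0.0)  # a preference silent on w contributes 0
               for u, weight in zip(partial_utilities, weights))
        for w in worlds
    }

# Two partial preferences over three worlds.
income = {"A": 1.0, "B": 0.0, "C": -1.0}   # "increased income is better"
smiles = {"A": -1.0, "B": 1.0, "C": 0.0}   # "more people smiling is better"

print(combine([income, smiles], weights=[0.7, 0.3]))
# {'A': 0.4, 'B': 0.3, 'C': -0.7}  (key order may vary)
```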

• “How many copies of people like me are there in each universe?”

Then as long as your copies know that 3K has been observed, and excluding simulations and such, the answers are “(a lot, a lot, not many, not many)” in the four universes (I’m interpreting “die off before spreading through space” as “die off just before spreading through space”).

This is the SIA answer, since I asked the SIA question.
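For concreteness, a minimal sketch of the SIA calculation over those four universes; the copy counts are invented stand-ins for “a lot” and “not many”, and the prior is assumed uniform:

```python
# Minimal sketch of SIA over the four universes discussed above.
# Copy counts and priors are illustrative placeholders, not real estimates.

priors = [0.25, 0.25, 0.25, 0.25]   # assumed uniform prior over universes
copies = [1e6, 1e6, 10, 10]         # "(a lot, a lot, not many, not many)"

# SIA: weight each universe by the number of copies of "people like me" in it.
weights = [p * n for p, n in zip(priors, copies)]
total = sum(weights)
sia_posterior = [w / total for w in weights]
print(sia_posterior)   # almost all the mass lands on universes 1 and 2
```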

• Each partial preference is meant to represent a single mental model inside the human, with all preferences weighted the same (so there can’t be “extremely weak” preferences, compared with other preferences in the same partial preference). Things like “increased income is better”, “more people smiling is better”, “being embarrassed on stage is worse”.

We can imagine a partial preference with more internal structure, maybe internal weights, but I’d simply see that as two separate partial preferences. So we’d have the utilities you gave for one set of worlds as one partial preference (actually, my formula doubles the numbers you gave), and the remaining utilities as the other partial preference—which has a very low weight by assumption. So the order of the two elements in question is not affected.

EDIT: I’m pretty sure we can generalise my method for different weights of preferences, by changing the formula that sums the squares of utility differences.

• Neat!

Though I should mention that my current version of partial preferences does not assume all cycles are closed—the constrained optimisation can be seen as trying to get “as close as possible” to that, given non-closed cycles.
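A minimal sketch of the “sum the squares of utility differences” idea, including how a least-squares fit gets “as close as possible” when a cycle is not closed. The target gap of 1 between adjacent worlds and the mean-zero normalisation are my assumptions for illustration; the exact formula in the posts may differ.

```python
import numpy as np

# Minimal sketch: assign utilities to worlds from pairwise partial preferences
# (x, y) meaning "y is preferred to x", by least squares on u(y) - u(x) - 1.
# The target gap of 1 and the mean-zero normalisation are illustrative assumptions.

def fit_utilities(worlds, prefs):
    index = {w: i for i, w in enumerate(worlds)}
    A = np.zeros((len(prefs), len(worlds)))
    b = np.ones(len(prefs))
    for row, (x, y) in enumerate(prefs):
        A[row, index[y]] = 1.0   # u(y)
        A[row, index[x]] = -1.0  # -u(x)
    u, *_ = np.linalg.lstsq(A, b, rcond=None)
    return dict(zip(worlds, u - u.mean()))  # normalise to mean zero

# A closed chain A < B < C gives evenly spaced utilities.
print(fit_utilities(["A", "B", "C"], [("A", "B"), ("B", "C")]))

# A non-closed cycle A < B < C < A has no exact solution; least squares gets
# "as close as possible" (here, all three utilities collapse together).
print(fit_utilities(["A", "B", "C"], [("A", "B"), ("B", "C"), ("C", "A")]))
```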

• What’s the set of answers, and how are they assessed?

1. I encourage you to submit other ideas anyway, since your ideas are good.

2. Not sure yet about how all these things relate; will maybe think of that more later.

• Hey there! Thanks for your long comment—but, alas, this model of partial preferences is obsolete :-(

Because of other problems with this, I’ve replaced it with the much more general concept of a preorder. This can express all the things we want to express, but is a lot less intuitive for how humans model things. I may come up with some alternative definition at some point (less general than a preorder, but more general than this post).

Thanks for the comment in any case.

• I find it hard to imagine that you’re actually denying that you or I have things that, colloquially, one would describe as preferences, and exist in an objective sense.

I deny that a generic outside observer would describe us as having any specific set of preferences, in an objective sense.

This doesn’t bother me too much, because it’s sufficient that we have preferences in a subjective sense—that we can use our own empathy modules and self-reflection to define, to some extent, our preferences.

a brain is ultimately many fewer assumptions (to the pre-industrial Norse people)

“Realistic” preferences make ultimately fewer assumptions (to actual humans) than “fully rational” or other preference sets.

The problem is that this is not true for generic agents, or AIs. We have to get the human empathy module into the AI first—not so it can predict us (it can already do that through other means), but so that its decomposition of our preferences is the same as ours.

• Can you make this a bit more general, rather than just for the specific example?

• For low bandwidth, you have to specify the set of answers that are available (and how they would be checked).

• Saying that an agent has a preference/reward R is an interpretation of that agent (similar to the “intentional stance” of seeing it as an agent, rather than a collection of atoms). And the (p,R) and (-p,-R) interpretations are (almost) equally complex.
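Reading p as a planner that turns a reward function into behaviour (my gloss here), a minimal sketch of why behaviour alone can’t separate the two interpretations; a fully rational planner is used only to keep the example short:

```python
# Minimal sketch: a planner p maps a reward function R to behaviour.
# The pair (p, R) with a rational planner and the pair (-p, -R) with an
# anti-rational planner produce exactly the same observed action.

def rational(R):        # p: pick the action with the highest reward
    return max(R, key=R.get)

def anti_rational(R):   # -p: pick the action with the lowest reward
    return min(R, key=R.get)

R = {"work": 2.0, "rest": 1.0, "sabotage": -5.0}
neg_R = {a: -r for a, r in R.items()}

assert rational(R) == anti_rational(neg_R) == "work"
# Same behaviour either way, so behaviour alone can't tell (p, R) from (-p, -R).
```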

• I disagree. I think that if we put a complexity upper bound on human rationality, and assume noisy rationality, then we will get values that are “meaningless” from your perspective.

I’m trying to think of ways we could test this....

• If you assume that human values are simple (low Kolmogorov complexity) and that human behavior is quite good at fulfilling those values, then you can deduce non-trivial values for humans.

And you will deduce them wrong. “Human values are simple” pushes you towards “humans have no preferences”, and if by “human behavior is quite good at fulfilling those values” you mean something like noisy rationality, then it will go very wrong; see for example https://www.lesswrong.com/posts/DuPjCTeW9oRZzi27M/bounded-rationality-abounds-in-models-not-explicitly-defined

And if instead you mean a proper accounting of bounded rationality, of the difference between anchoring bias and taste, of the difference between system 1 and system 2, of the whole collection of human biases… well, then, yes, I might agree with you. But that’s because you’ve already put all the hard work in.
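As a toy illustration of how much the “deduced values” depend on the assumed noisy-rationality model rather than on the behaviour itself (a sketch with an assumed Boltzmann-rational model and invented numbers, not anything from the linked post):

```python
import math

# Toy sketch: infer a utility difference from one observed choice frequency,
# under the assumption P(choose A) = sigmoid(beta * (uA - uB)).
# The behaviour is held fixed; the "deduced values" swing with the assumed beta.

def inferred_gap(p_choose_A, beta):
    return math.log(p_choose_A / (1 - p_choose_A)) / beta

p = 0.8  # the human picks A over B 80% of the time (invented number)
for beta in [0.1, 1.0, 10.0]:
    print(f"assumed beta={beta}: inferred u(A) - u(B) = {inferred_gap(p, beta):.2f}")
```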

• I was thinking of a rather naive form of preference utilitarianism, of the sort “if the human agrees to it or chooses it, then it’s ok”. In particular, you can end up with some forms of depression where the human is miserable, but isn’t willing to change.

I’ll clarify that in the post.

• I’m finding these “this is the correct utility function” claims hard to parse. Humans have a bit of one utility function and a bit of another. But we are underdefined systems; there is no specific value of the mix between them that is “true”. We can only assess the quality of a mix using other aspects of human underdefined preferences.

This seems way too handwavy.

It is. Here’s an attempt at a more formal definition: humans have collections of underdefined and somewhat contradictory preferences (using preferences in a more general sense than preference utilitarianism). These preferences seem to be stronger in the negative sense than in the positive: humans seem to find the loss of a preference much worse than the gain. And the negative is much more salient, and often much more clearly defined, than the positive.

Given that maximising one preference tends to put the values of others at extreme values, human overall preferences seem better captured by a weighted mix of preferences (or a smooth min of preferences) than by any single preference, or small set of preferences. So it is not a good idea to be too close to the extremes (extremes being places where some preferences have no weight put on them).

Now there may be some sense in which these extreme preferences are “correct”, according to some formal system. But this formal system must reject the actual preferences of humans today; so I don’t see why these preferences should be followed at all, even if they are correct.

Ok, so the extremes are out; how about being very close to the extremes? Here is where it gets wishy-washy. We don’t have a full theory of human preferences. But, according to the picture I’ve sketched above, the important thing is that each preference gets some positive traction in our future. So, yes, any particular weighting very close to an extreme might not mean much (and smooth min might be better anyway). But I believe I could say:

• There are many weighted combinations of human preferences that are compatible with the picture I’ve sketched here. Very different outcomes, from the numerical perspective of the different preferences, but all falling within an “acceptability” range.

Still a bit too handwavy. I’ll try and improve it again.
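A minimal sketch of the weighted-mix versus smooth-min contrast; the log-sum-exp form of smooth min below is a standard softening chosen for illustration, not necessarily the intended one, and the outcome numbers are invented:

```python
import math

# Toy sketch: weighted sum vs. a smooth min of several preference utilities.

def weighted_sum(utilities, weights):
    return sum(w * u for w, u in zip(weights, utilities))

def smooth_min(utilities, k=2.0):
    # Standard log-sum-exp softening of min; larger k means closer to a hard min.
    return -math.log(sum(math.exp(-k * u) for u in utilities)) / k

balanced = [1.0, 1.0, 1.0]    # every preference gets some positive traction
extreme = [3.5, 0.0, -0.5]    # one preference maximised, the others sacrificed

for outcome in (balanced, extreme):
    print(round(weighted_sum(outcome, [1/3, 1/3, 1/3]), 3),
          round(smooth_min(outcome), 3))
# The weighted sum rates both outcomes the same (1.0); the smooth min
# clearly prefers the balanced one.
```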

• Thanks, corrected a few typos.

Why must a preorder decompose into disjoint ordered chains?

They don’t have to; I’m saying that sensible partial preferences should do so. I then see how I’d deal with sensible preorders, then generalise to all preorders in the next section.

How do cycles vanish in $X/\sim$? Can you work through the example where the partial preference expressed by the human is $A < B < C < A$?

Note that what you’ve written is impossible, as $A < B$ means $A \leq B$ but not $B \leq A$. A preorder is transitive, so the best you can get is $A \leq B \leq C \leq A$, which gives $A \sim B \sim C$.

Then projecting down (via $\sim$) to $X/\sim$ will project all these down to the same element. That’s why there are no cycles: all cycles go to points.

Then we need to check some math. Define $\leq$ on $X/\sim$ by $[x] \leq [y]$ iff $x \leq y$.

This is well defined (independently of which $x$ and $y$ we use to represent $[x]$ and $[y]$), because if $x' \sim x$, then $x' \leq x$, so, by transitivity, $x' \leq y$. The same argument works for $y$.

We now want to show that $\leq$ is a partial order on $X/\sim$. It’s reflexive, since $x \leq x$ gives $[x] \leq [x]$. It’s transitive, because if $[x] \leq [y]$ and $[y] \leq [z]$, then $x \leq y \leq z$, and the transitivity in $X$ implies $x \leq z$ and hence $[x] \leq [z]$.

That shows it’s a preorder. To show partial order, we need to show there are no cycles. So, if $[x] \leq [y]$ and $[y] \leq [x]$, then $x \leq y$ and $y \leq x$, hence, by definition of $\sim$, $x \sim y$ and so $[x] = [y]$. So it’s a partial order.
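For concreteness, a small computational sketch of the same construction (the set and relation are invented toy data; the notation follows the reconstruction above: preorder $\leq$ on $X$, $x \sim y$ iff $x \leq y$ and $y \leq x$):

```python
from itertools import product

# Small sketch: take a preorder (given as a set of pairs), quotient by the
# symmetric part, and check that the induced relation on the classes is a
# partial order. The relation below contains a cycle a <= b <= c <= a.

X = {"a", "b", "c", "d"}
relation = {("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")}

def close(rel, X):
    """Reflexive-transitive closure, so `rel` really is a preorder."""
    leq = set(rel) | {(x, x) for x in X}
    changed = True
    while changed:
        changed = False
        for (x, y), (y2, z) in product(list(leq), repeat=2):
            if y == y2 and (x, z) not in leq:
                leq.add((x, z))
                changed = True
    return leq

leq = close(relation, X)

# Equivalence classes of ~ (x ~ y iff x <= y and y <= x): the cycle collapses.
classes = {x: frozenset(y for y in X if (x, y) in leq and (y, x) in leq) for x in X}
quotient = set(classes.values())
q_leq = {(classes[x], classes[y]) for (x, y) in leq}

# Antisymmetry on the quotient: no two distinct classes are below each other.
assert all(not ((A, B) in q_leq and (B, A) in q_leq and A != B)
           for A, B in product(quotient, repeat=2))
print([sorted(c) for c in quotient])   # [['a', 'b', 'c'], ['d']] (in some order)
```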