# rk

Karma: 344 (LW), 1 (AF)
• Yes, you’re quite right!

The intuition becomes a little clearer when I take the following alternative derivation:

Let us look at the change in expected value when I increase my capabilities. From the expected value stemming from worlds where I win, we have . For the other actor, their probability of winning decreases at a rate that matches my increase in probability of winning. Also, their probability of deploying a safe AI doesn’t change. So the change in expected value stemming from worlds where they win is .

We should be indifferent to increasing capabilities when these sum to 0, so .

Let’s choose our units so . Then, using the expressions for from your comment, we have .

Dividing through by we get . Collecting like terms we have and thus . Substituting for we have and thus

• It seems like keeping a part ‘outside’ the experience/feeling is a big part for you. Does that sound right? (Similar to the unblending Kaj talks about in his IFS post or clearing a space in Focusing)

> Now of course today’s structure/process is tomorrow’s content

Do you mean here that as you progress, you will introspect on the nature of your previous introspections, rather than more ‘object-level’ thoughts and feelings?

• I think that though one may use the techniques looking for a solution (which I agree makes them solution-oriented in a sense), it’s not right to say that in, say, Focusing, you introspect on solutions rather than causes. So maybe the difference is more the optimism than the area of focus?

• 20 Feb 2019 21:22 UTC
2 points

This points to a lot of what the difference feels like to me! It jibes with my intuition for the situation that prompted this question.

I was mildly anxious about something (I forget what), and stopped myself as I was about to move on to some work (in which I would have lost the anxiety). I thought it might be useful to be with the anxiety a bit and see what it was about the situation that made me anxious. This felt like it would be useful, but then I wondered if I would get bad ruminative effects. It seemed like I wouldn’t, but I wasn’t sure why.

I’m not sure if I should be given pause by the fact you say that rumination is concerned with action; my reading of the Wikipedia page is that being concerned with action is a big missing feature of rumination.

• I came back to this post because I was thinking about Scott’s criticism of subminds, where he complains about “little people who make you drink beer because they like beer”.

I’d already been considering how your robot model is nice for seeing why something submind-y would be going on. However, I was still confused about thinking about these various systems as basically people who have feelings and should be negotiated with, using basically the same techniques I’d use to negotiate with people.

Revisiting, the “Personalized characters” section was pretty useful. It’s nice to see it more as a claim that ‘[sometimes for some people] internal processes may be represented using social machinery’ than ‘internal agents are like fighting people’.

# [Question] When does introspection avoid the pitfalls of rumination?

20 Feb 2019 14:14 UTC
23 points
• I really enjoyed this post, and starting with the plausible robot design was really helpful for me accessing the IFS model. I also enjoyed reflecting on your previous objections as a structure for the second part.

The part with repeated unblending sounds reminiscent of the “Clearing a space” stage of Focusing, in which one acknowledges and sets slightly to the side the problems in one’s life. Importantly, you don’t “go inside” the problems (I take ‘going inside’ to be more-or-less experiencing the affect associated with the problems). This seems pretty similar to stopping various protectors from placing negative affect into consciousness.

I noticed something at the end that it might be useful to reflect on: I pattern-matched the importance of childhood traumas to woo, and it definitely decreased my subjective credence in the IFS model. I’m not sure to what extent I endorse that reaction.

One thing I’d be interested in expansion on: you mention you think that IFS would benefit most people. What do you mean by ‘benefit’ in this case? That it would increase their wellbeing? Their personal efficacy? Or perhaps that it will increase at least one of their wellbeing and personal efficacy, but not necessarily both for any given person?

• I think this is a great summary (EDIT: this should read “I think the summary in the newsletter was great”).

> That said, these models are still very simplistic, and I mainly try to derive qualitative conclusions from them that my intuition agrees with in hindsight.

Yes, I agree. The best indicator I had of making a mathematical mistake was whether my intuition agreed in hindsight.

• Thanks! The info on parasite specificity/history of malaria is really useful.

I wonder if you know of anything specifically about the relative cost-effectiveness of nets for infected people vs uninfected people? No worries if not.

• > Probably the most valuable nets are those deployed on people who already have malaria, to prevent it from spreading to mosquitoes, and thus to more people

Also, can animals harbour malaria pathogens that harm humans? This section of the wiki page on malaria makes me think not, but it’s not explicitly stated.

• > your decision theory maps from decisions to situations

Could you say a little more about what a situation is? One thing I thought is maybe that a situation is a result of a choice? But then it sounds like your decision theory decides whether you should, for example, take an offered piece of chocolate, regardless of whether you like chocolate or not. So I guess that’s not it.

> But the point is that each theory should be capable of standing on its own

Can you say a little more about how ADT doesn’t stand on its own? After all, ADT is just defined as:

> An ADT agent is an agent that would implement a self-confirming linking with any agent that would do the same. It would then maximise its expected utility, conditional on that linking, and using the standard non-anthropic probabilities of the various worlds.

Is the problem that it mentions expected utility, but it should be agnostic over values not expressible as utilities?

• So I think an account of anthropics that says “give me your values/morality and I’ll tell you what to do” is not an account of morality + anthropics, but has actually pulled out morality from an account of anthropics that shouldn’t have had it. (Schematically, rather than `define adt(decisionProblem) = chooseBest(someValues, decisionProblem)`, you now have `define adt(values, decisionProblem) = chooseBest(values, decisionProblem)`.)
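A minimal Python sketch of that schematic refactoring (the names `choose_best`, `adt_builtin`, and the toy values are hypothetical, just to make the before/after concrete):

```python
def choose_best(values, decision_problem):
    """Pick the option that the given values score highest."""
    return max(decision_problem, key=values)

# Before: morality baked into the anthropic account — someValues is
# fixed inside the theory itself.
SOME_VALUES = len  # toy values: prefer the longer-named option

def adt_builtin(decision_problem):
    return choose_best(SOME_VALUES, decision_problem)

# After: morality pulled out as an explicit parameter of the theory.
def adt(values, decision_problem):
    return choose_best(values, decision_problem)
```

The point of the refactoring is only in the signatures: the second form makes explicit that the decision-theoretic machinery is the same no matter which values you pass in.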

Perhaps you think that an account that makes mention of morality ends up being (partly) a theory of morality? And that also we should be able to understand anthropic situations apart from values?

To try and give some intuition for my way of thinking about things, suppose I flip a fair coin and ask agent A if it came up heads. If it guesses heads and is correct, it gets $100. If it guesses tails and is correct, both agents B and C get $100. Agents B and C are not derived from A in any special way and will not be offered similar problems—there is not supposed to be anything anthropic here.

What should agent A do? Well, that depends on A’s values! This is going to be true for a non-anthropic decision theory, so I don’t see why we should expect an anthropic decision theory to be free of this dependency.
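A quick numerical sketch of that dependency (the weight parameter is a hypothetical way of encoding A’s values, just for illustration): guessing heads pays A $100 with probability 1/2, while guessing tails pays B and C $100 each with probability 1/2, so which guess maximises expected value turns entirely on how much A weighs others’ payoffs.

```python
def expected_value(guess, weight_on_others):
    """A's expected payoff; dollars to B or C count at weight_on_others."""
    if guess == "heads":
        return 0.5 * 100                         # A gets $100 half the time
    return 0.5 * 2 * 100 * weight_on_others      # B and C each get $100 half the time

# A purely selfish A (weight 0) prefers heads;
# a fully altruistic A (weight 1) prefers tails.
selfish_choice = max(["heads", "tails"], key=lambda g: expected_value(g, 0.0))
altruistic_choice = max(["heads", "tails"], key=lambda g: expected_value(g, 1.0))
```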

Here’s another guess at something you might think: “anthropics is about probabilities. It’s cute that you can parcel up value-laden decisions and anthropics, but it’s not about decisions.”

Maybe that’s the right take. But even if so, ADT is useful! It says that in several anthropic situations, even if you’ve not sorted your anthropic probabilities out, you can still know what to do.

• It seems to me that ADT separates anthropics and morality. For example, Bayesianism doesn’t tell you what you should do, just how to update your beliefs. Given your beliefs, what you value decides what you should do. Similarly, ADT gives you an anthropic decision procedure. What exactly does it tell you to do? Well, that depends on your morality!

• As I read through, the core model fit well with my intuition. But then I was surprised when I got to the section on religious schisms! I wondered why we should model the adherents of a religion as trying to join the school with the most ‘accurate’ claims about the religion.

On reflection, it appears to me that the model probably holds roughly as well in the religion case as in the local radio intellectual case. Both of those are examples of “hostile” talking up. I wonder if the ways in which those cases diverge from pure information sharing explain the difference between humble and hostile.

In particular, perhaps some audiences are looking to reduce cognitive dissonance between their self-image as unbiased on the one hand and their particular beliefs and preferences on the other. That leaves an opening for someone to sell a reasonableness/unbiasedness self-image to people holding a given set of beliefs and preferences.

Someone making reasonable counterarguments is a threat to what you’ve offered, and in that case your job is to provide refutation and discredit so that that person’s arguments can easily be dismissed (through a mixture of claimed flaws in their arguments and claimed flaws in the person promoting them). This would be a ‘hostile’ talking up.

Also, we should probably expect to find it hard to distinguish between some cases of hostile talking up and overconfident talking down. If we could always distinguish them, hostile talking up would be a clear signal of defeat.

• 23 Nov 2018 17:40 UTC
3 points

When it comes to disclosure policies, if I’m uncertain between the “MIRI view” and the “Paul Christiano view”, should I bite the bullet and back one approach over the other? Or can I aim to support both views, without worrying that they’re defeating each other?

My current understanding is that it’s coherent to support both at once. That is, I can think that possibly intelligence needs lots of fundamental insights, and that safety needs lots of similar insights (this is supposed to be a characterisation of a MIRI-ish view). I can think that work done on figuring out more about intelligence and how to control it should only be shared cautiously, because it may accelerate the creation of AGI.

I can also think that prosaic AGI is possible, and fundamental insights aren’t needed. Then I might think that I could do research that would help align prosaic AGIs but couldn’t possibly align (or contribute to) an agent-based AGI.

Is the above consistent? Also, do people (with better emulators of people) who worry about disclosure think that this makes sense from their point of view?

# Believing others’ priors

22 Nov 2018 20:44 UTC
9 points
• I think you’ve got a lot of the core idea. But it’s not important that we know that the data point has some ranking within a distribution. Let me try and explain the ideas as I understand them.

The unbiased estimator is unbiased in the sense that, for any actual value of the thing being estimated, the expected value of the estimate across the possible data is the true value.

To be concrete, suppose I tell you that I will generate a true value, and then add either +1 or −1 to it with equal probability. An unbiased estimator is just to report back the value you get:

E[estimate(observed)] = estimate(x + 1)/2 + estimate(x − 1)/2

If the estimate function is the identity, we have (x + 1 + x − 1)/2 = x. So it’s unbiased.
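A small simulation of that check, assuming the ±1-noise setup above (function names are mine, just for illustration):

```python
import random

def estimate(observed):
    return observed  # the identity estimator: report back the value you got

def mean_estimate(true_value, trials=200_000):
    """Average the estimate over the equally likely +1/-1 noise."""
    total = 0.0
    for _ in range(trials):
        noise = random.choice([+1, -1])
        total += estimate(true_value + noise)
    return total / trials
```

Averaged over many draws, the estimate comes out at the true value, which is all that unbiasedness claims.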

Now suppose I tell you that I will generate the true value by drawing from a normal distribution with mean 0 and variance 1, and then I tell you 23,000 as the reported value. Via Bayes, you can see that it is more likely that the true value is 22,999 than 23,001. But the unbiased estimator blithely reports 23,000.
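The Bayes step can be sketched directly: given a report, the true value is either report − 1 (noise was +1) or report + 1 (noise was −1), and with the noise directions equally likely we just weight the two candidates by the prior density. (I use a small report like 3 in the example, since the standard-normal density at ±23,000 underflows floats to zero; the asymmetry is the same.)

```python
import math

def posterior(reported, prior_sd=1.0):
    """P(true value | report) for a N(0, prior_sd) prior on the true value.

    The true value must be reported - 1 or reported + 1; the two noise
    directions are a priori equally likely, so the posterior just weights
    the two candidates by the prior density at each.
    """
    density = lambda x: math.exp(-x ** 2 / (2 * prior_sd ** 2))
    w_lo, w_hi = density(reported - 1), density(reported + 1)
    total = w_lo + w_hi
    return {reported - 1: w_lo / total, reported + 1: w_hi / total}
```

For any positive report, the smaller candidate gets more posterior weight, which is exactly the sense in which the unbiased report of the raw value overshoots.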

So, though the asymmetry is doing some work here (the further we move above 0, the more likely that +1 rather than −1 is doing some of the work), it could still be that 23,000 is the smallest of the values I sampled.

• 18 Nov 2018 16:12 UTC
4 points