# BurntVictory

Karma: 45
• The Kitty Genovese Equation

Someone's in trouble. You can hear them from your apartment, but you can't tell if any of your neighbors are already rushing down, or already calling the police. It's time sensitive, and you've got to decide now: is it worth spending those precious minutes, or not?

Let's define our variables:

Cost to victim of nobody helping: $V$

Cost to each bystander of intervening: $c$

Number of bystanders: $N$. (Since $c \ll V$, for $N = 1$ it's always right to intervene.)

Analysis:

Suppose the $N$ bystanders all simultaneously decide whether to intervene or not, each intervening with probability $p$. Then expected world-utility is $U = -(1-p)^N V - Npc$.

Utility is maximized when $\partial U/\partial p = 0$; in other words, when $(1-p)^{N-1} = c/V$. Let $r = c/V$. Then we have the optimal probability of not helping, $q = 1-p = r^{1/(N-1)}$.
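
Spelling out that first-order condition (my reconstruction of the elided algebra, using the expected utility $U = -(1-p)^N V - Npc$ and writing $r$ for the cost ratio $c/V$):

```latex
\frac{\partial U}{\partial p} = NV(1-p)^{N-1} - Nc = 0
\;\Longrightarrow\; (1-p)^{N-1} = \frac{c}{V} = r
\;\Longrightarrow\; q \equiv 1-p = r^{1/(N-1)}
```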

One interesting implication of our solution is that the probability that the victim isn't helped, $q^N$, equals $r^{N/(N-1)}$. Since $r < 1$ and the exponent $N/(N-1)$ falls from $2$ toward $1$, this means P(not helped) starts small at $r^2$ for $N = 2$ and rapidly rises to $r$.

Examples:

Suppose intervening would cost a minute, and the victim would live 2 years longer on average if you intervened. Then $r$ is about one in a million, $10^{-6}$. Once you get to seven bystanders, it's optimal to not intervene 10% of the time ($q = (10^{-6})^{1/6} = 0.1$). $2^{20}$ is about a million, so with 21 bystanders it's optimal for each to take a 50-50 shot at helping.

If $r$ is a mere $10^{-1}$, you get there six times as fast: a 10% chance to not help at $N = 2$, 50% around $N = $ 4-5, and a whopping 75% chance around $N = 9$.
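
A quick numeric check of these examples (using my reconstruction of the elided formulas: $r$ is the ratio of bystander cost to victim cost, and each bystander optimally fails to help with probability $q = r^{1/(N-1)}$):

```python
def shirk_prob(r: float, n: int) -> float:
    """Optimal probability that each bystander does NOT intervene."""
    return r ** (1 / (n - 1))

def unhelped_prob(r: float, n: int) -> float:
    """Probability that nobody helps the victim: q^N = r^(N/(N-1))."""
    return shirk_prob(r, n) ** n

r = 1e-6  # one minute of bystander time vs. ~2 years (~10^6 minutes) of life
print(round(shirk_prob(r, 7), 3))    # 0.1  -> 10% shirking at N = 7
print(round(shirk_prob(r, 21), 2))   # 0.5  -> a coin flip at N = 21
print(round(shirk_prob(0.1, 9), 2))  # 0.75 -> 75% shirking at N = 9 when r = 0.1
```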

Application:

This was inspired by friends' varied willingness to intervene in public disputes, and my own experience worrying about how to respond to potential crises around me. Of course, in real life we have a lot of uncertainty around $N$ and around other people's $c$, and we can often wait and observe if someone goes to help. For situations where decisions are pretty simultaneous, though, it would be interesting to see how well people's responses line up with the curve.

# BurntVictory's Shortform

24 Aug 2019 14:11 UTC
2 points
• The LessWrongy framework I'm familiar with would say that value = expected utility, so it takes potential downsides into account. You're not risk-averse wrt your VNM utility function, but computing that utility function is hard in practice, and EV calculations can benefit from some consideration of the tail-risks.

• Schelling's The Strategy of Conflict seems very relevant here; a major focus is precommitment as a bargaining tool. See here for an old review by cousin_it.

Iterated chicken seems fine to test, just as a spinoff of the IPD that maps to slightly different situations. (I believe that the iterated game of mutually modeling each other's single-shot strategy is different from iterating the game itself, so I don't think Abram's post necessarily implies that iterated chicken is relevant to ASI blackmail solutions.)

Speaking of iterated games, one natural form of blackmail is for the blackmailee to pay an income stream to the blackmailer; that way, at each time-step they're paying their fair price for the good of [not having their secret revealed between time t and time t+1]. Here's a well-cited paper that discusses this idea in the context of nuclear brinksmanship: Schwarz & Sonin 2007.

• It's true the net effect is low to first order, but you're neglecting second-order effects. If premia are important enough, people will feel compelled to Goodhart proxies used for them until those proxies have less meaning.

Given the linked siderea post, maybe this is not very true for insurance in particular. I agree that wasn't a great example.

Slack-wise, uh, choices are bad. Really bad. Keep the sabbath. These are some intuitions I suspect are at play here. I'm not interested in a detailed argument hashing out whether we should believe that these outweigh other factors in practice across whatever range of scenarios, because it seems like it would take a lot of time/effort for me to actually build good models here, and opportunity costs are a thing. I just want to point out that these ideas seem relevant for correctly interpreting Zvi's position.

• The post implies it is bad to be judged. I could have misinterpreted why, but that implication is there. If "judge" just meant "make inferences about", why would it be bad?

As Raemon says, knowing that others are making correct inferences about your behavior means you can't relax and, idk, watch soap operas, because that's an indicator of being less likely to repay your loans, and your premia go up. There's an ethos of slack, decisionmaking-has-costs, strategizing-has-costs that Zvi's explored in his previous posts, and that's part of how I'm interpreting what he's saying here.

>But it also helps in knowing who's exploiting them! Why does it give more advantages to the "bad" side?

>Sure, but doesn't it help me against them too?

You don't want to spend your precious time on blackmailing random jerks, probably. So at best, now some of your income goes toward paying a white-hat blackmailer to fend off the black-hats. (Unclear what the market for that looks like. Also, black-hatters can afford to specialize in unblackmailability; it comes up much more often for them than the average person.) You're right, though, that it's possible to have an equilibrium where deterrence dominates and the black-hatting incentives are low, in which case maybe the white-hat fees are low and now you have a white-hat deterrent. So this isn't strictly bad, though my instinct is that it's bad in most plausible cases.

>Why would you expect the terrorists to be miscalibrated about this before the reduction in privacy, to the point where they think people won't negotiate with them when they actually will, and less privacy predictably changes this opinion?

That's a fair point! A couple of counterpoints: I think risk-aversion of 'terrorists' helps. There's also a point about second-order effects again; the easier it is to blackmail/extort/etc., the more people can afford to specialize in it and reap economies of scale.

>Perhaps the optimal set of norms for these people is "there are no rules, do what you want". If you can improve on that, then that would constitute a norm-set that is more just than normlessness. Capturing true ethical law in the norms most people follow isn't necessary.

Eh, sure. My guess is that Zvi is making a statement about norms as they are likely to exist in human societies with some level of intuitive-similarity to our own. I think the useful question here is like "is it possible to instantiate norms s.t. norm-violations are ~all ethical-violations". (We're still discussing the value of less privacy/more blackmail, right?) No-rule or few-rule communities could work for this, but I expect it to be pretty hard to instantiate them at large scale. So sure, this does mean you could maybe build a small local community where blackmail is easy. That's even kind of just what social groups are, as Zvi notes; places where you can share sensitive info because you won't be judged much, nor attacked as a norm-violator. Having that work at super-Dunbar level seems tough.

• I found this pretty useful—Zvi's definitely reflecting a particular, pretty negative view of society and strategy here. But I disagree with some of your inferences, and I think you're somewhat exaggerating the level of gloom-and-doom implicit in the post.

>Implication: "judge" means to use information against someone. Linguistic norms related to the word "judgment" are thoroughly corrupt enough that it's worth ceding to these, linguistically, and using "judge" to mean (usually unjustly!) using information against people.

No, this isn't bare repetition. I agree with Raemon that "judge" here means something closer to one of its standard usages, "to make inferences about". Though it also fits with the colloquial "deem unworthy for baring [understandable] flaws", which is also a thing that would happen with blackmail and could be bad.

>Implication: more generally available information about what strategies people are using helps "our" enemies more than it helps "us". (This seems false to me, for notions of "us" that I usually use in strategy)

I can imagine a couple things going on here? One, if the world is a place where many more vulnerabilities are known, this incentivizes more people to specialize in exploiting those vulnerabilities. Two, as a flawed human there are probably some stressors against which you can't credibly play the "won't negotiate with terrorists" card.

>Implication: even in the most just possible system of norms, it would be good to sometimes violate those norms and hide the fact that you violated them. (This seems incorrect to me!)

I think the assumption is these are ~baseline humans we're talking about, and most human brains can't hold norms of sufficient sophistication to capture true ethical law, and are also biased in ways that will sometimes strain against reflectively-endorsed ethics (e.g. they're prone to using constrained circles of moral concern rather than universality).

>Implication: the bad guys won; we have rule by gangsters, who aren't concerned with sustainable production, and just take as much stuff as possible in the short term. (This seems on the right track but partially false; the top marginal tax rate isn't 100%)

This part of the post reminded me of (the SSC review of) Seeing Like a State, which makes a similar point; surveying and 'rationalizing' farmland, taking a census, etc. = legibility = taxability. "all of them" does seem like hyperbole here. I guess you can imagine the maximally inconvenient case where motivated people with low cost of time and few compunctions know your resources and full utility function, and can proceed to extract ~all liquid value from you.

• The CHAI reading list is also fairly out of date (last updated April 2017) but has a few more papers, especially if you go to the top and select [3] or [4] so it shows lower-priority ones.

(And in case others haven't seen it, here's the MIRI reading guide for learning agent foundations.)

• Oh wait, yeah, this is just an example of the general principle "when you're optimizing for xy, and you have a limited budget with linear costs on x and y, the optimal allocation is to spend equal amounts on both."

Formally, you can show this via Lagrange-multiplier optimization, using the Lagrangian $\mathcal{L} = xy - \lambda(ax + by - B)$. Setting the partials equal to zero gets you $y = \lambda a$ and $x = \lambda b$ (so $ax = by$), and you recover the linear constraint function $ax + by = B$. So $ax = by = B/2$. (Alternatively, just optimizing $x(B - ax)/b$ works, but I like Lagrange multipliers.)
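
A brute-force sanity check of the equal-spending principle, with made-up numbers ($a = 2$, $b = 3$, budget $B = 12$, so the optimum should spend $B/2 = 6$ on each factor, i.e. $x = 3$, $y = 2$):

```python
def best_x(a: float, b: float, budget: float, steps: int = 100_000) -> float:
    """Grid-search over x to maximize x*y subject to a*x + b*y = budget."""
    xs = [i * (budget / a) / steps for i in range(steps + 1)]
    return max(xs, key=lambda x: x * (budget - a * x) / b)

a, b, B = 2.0, 3.0, 12.0
x = best_x(a, b, B)
y = (B - a * x) / b
print(x, y, a * x, b * y)  # spending splits evenly: a*x = b*y = B/2 = 6
```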

In this case, we want to maximize , which is equivalent to optimizing . Let's define , so we're optimizing .

Our constraint function is defined by the tradeoff between and . , so . , so .

Rearranging gives the constraint function . This is indeed linear, with a total 'budget' of .5 and a p-coefficient of 1. So by the above theorem we should have .

• I think your solution to "reckless rivals" might be wrong? I think you mistakenly put a multiplier of q instead of a p on the left-hand side of the inequality. (The derivation of the general inequality checks out, though, and I like your point about discontinuous effects of capacity investment when you assume that the opponent plays a known pure strategy.)

I'll use slightly different notation from yours, to avoid overloading p and q. (This ends up not mattering because of linearity, but eh.) Let be the initial probabilities for winning and safety|winning. Let be the capacity variable, and without loss of generality let start at and end at . Then and . So , so . And , so .

Therefore, the left-hand side of the inequality, , equals . At the initial point , this simplifies to .

Let's assume . The relative safety of the other project is , which at simplifies to .

Thus we should commit more to capacity when , or , or . This is a little weird, but makes a bit more intuitive sense to me than or mattering.

• Yeah, I worry that competitive pressure could convince people to push for unsafe systems. Military AI seems like an especially risky case. Military goals are harder to specify than "maximize portfolio value", but there are probably reasonable proxies, and as AI gets more capable and more widely used there's a strong incentive to get ahead of the competition.

• Yeah, I think you're right.* So it actually looks the same as the "TFTWF accidentally defects" case.

*assuming we specify TFTWF as "defect against DD, cooperate otherwise". I don't see a reasonable alternate definition. I think you're right that defecting against DC is bad, and if we go to 3-memory, defecting against DDC while cooperating with DCD seems bad too.** Sarah can't be assuming the latter, anyway, because the "TFTWF accidentally defects" case would look different.

**there might be some fairly reasonably-behaved variant that's like "defect if >=2 of 3 past moves were D", but that seems like a) probably bad since I just made it up and b) not what's being discussed here.
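
To make the two specifications concrete, here's a toy sketch (my own construction, not from Sarah's post) of the 2-memory TFTWF above and the hypothetical "2 of the last 3" variant, plus a check that two TFTWF players recover from a one-off accidental defection:

```python
def tftwf_dd(opp_history):
    """TFTWF as specified above: defect only if opponent's last two moves were D, D."""
    return 'D' if opp_history[-2:] == ['D', 'D'] else 'C'

def tftwf_2of3(opp_history):
    """Hypothetical 3-memory variant: defect if >= 2 of opponent's last 3 moves were D."""
    return 'D' if opp_history[-3:].count('D') >= 2 else 'C'

def play(strat_a, strat_b, rounds=6, noise_round=None):
    """Iterate the game; optionally flip A's move to D on one round (accidental defect)."""
    hist_a, hist_b = [], []
    for t in range(rounds):
        a = 'D' if t == noise_round else strat_a(hist_b)
        b = strat_b(hist_a)  # simultaneous moves: both see only rounds < t
        hist_a.append(a)
        hist_b.append(b)
    return ''.join(hist_a), ''.join(hist_b)

# Two TFTWF-DD players shrug off a single accidental defection:
print(play(tftwf_dd, tftwf_dd, noise_round=1))  # → ('CDCCCC', 'CCCCCC')
```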

• I liked the playful writing here.

Maybe I'm being dumb, but I feel like spelling out some of your ideas would have been useful. (Or maybe you're just playing with ~pre-rigor intuitions, and I'm overthinking this.)

I think "float to the top" could plausibly mean:

A. In practice, human nature biases us towards treating these ideas as if they were true.

B. Ideal reasoning implies that these ideas should be treated as if they were true.

C. By postulate, these ideas end up reaching fixation in society. [Which then implies things about what members of society can and can't recognize, e.g. the existence of AIXI-like actors.]

Likewise, what level do you want a NAT to be implemented at? Personal behavior? Structure of group blog sites? Social norms?

• I'll echo the other commenters in saying this was interesting and valuable, but also (perhaps necessarily) left me to cross some significant inferential gaps. The biggest for me were in going from game-descriptions to equilibria. Maybe this is just a thing that can't be made intuitive to people who haven't solved it out? But I think that, e.g., graphs of the kinds of distributions you get in different cases would have helped me, at least.

I also had to think for a bit about what assumptions you were making here:

>A more rigorous or multi-step process could have only done so much. To get better information, they would have had to add a different kind of test. That would risk introducing bad noise.

A very naive model says additional tests → uncorrelated noise → less noise in the average.

More realistically, we can assume that some dimensions of quality are easier to Goodhart than others, and you don't know which are which beforehand. But then, how do you know your initial choice of test isn't Goodhart-y? And even if the Goodhart noise is much larger than the true variation in skill, it seems like you can aggregate scores in a way that would allow you to make use of the information from the different tests without being bamboozled. (Depending on your use-case, you could take the average of a concave function of the scores, or use quantiles, or take the min score, etc.)

In reality, though, you usually have some idea what dimensions are important for the job. Maybe it's something like PCA, with the noise/signal ratio of dimensions decreasing as you go down the list of components. Then that decrease, plus marginal costs of more tests, means that there is some natural stopping point. I guess that makes sense, but it took a bit for me to get there. Is that what you were thinking?
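
As a toy illustration of the aggregation point (all numbers made up by me): if one of three tests can be Goodharted for a large boost, a mean aggregate can be fooled while a min aggregate is not:

```python
# Three test scores per candidate; the "goodharter" inflates the third
# test from 7.7 to 12.7 by gaming it, despite lower true skill.
candidates = {
    "honest":     [8.0, 8.5, 8.2],
    "goodharter": [7.8, 7.9, 12.7],
}

mean_agg = {k: sum(v) / len(v) for k, v in candidates.items()}
min_agg = {k: min(v) for k, v in candidates.items()}

print(mean_agg)  # the mean ranks the goodharter first...
print(min_agg)   # ...while the min still ranks the honest candidate first
```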

• A similar concept is the idea of offense-defense balance in international relations. E.g., large stockpiles of nuclear weapons strongly favor "defense" (well, deterrence) because it's prohibitively costly to develop the capacity to reliably destroy the enemy's second-strike forces. Note the caveats there: at sufficient resource levels, and given constraints imposed by other technologies (e.g. inability to detect nuclear subs).

Allan Dafoe and Ben Garfinkel have a paper out on how techs tend to favor offense at low investment and defense at high investment. (That is, the resource ratio R at which an attacker with resources RD has an X% chance of defeating a defender with resources D tends to decrease with D down to a local minimum, then increase.)

• Well, it's nonequilibrium, so pressure isn't even at each layer of water any more...

When I picture this happening, there's a pulse of high-pressure water below the rock. If you froze the rock's motion while keeping its force on the water below it, I think the pulse would eventually equilibrate out of existence as water flowed to the side? Or if I imagine a fluid with strong drag forces on the rock, but which flows smoothly itself, it again seems plausible that the pressure equilibrates at the bottom.

(More confident in the first para than the second one.)

• Hey, noticed what might be errors in your lesion chart: No lesion, no cancer should give +1m utils in both cases. And your probabilities don't add to 1. Including p(lesion) explicitly doesn't meaningfully change the EV difference, so eh. However, my understanding is that the core of the lesion problem is recognizing that p(lesion) is independent of smoking; EYNS seems to say the same. Might be worth including it to make that clearer?
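
For concreteness, here's the kind of arithmetic I mean, with made-up utilities and probabilities (not the chart's actual numbers): since p(lesion) is independent of the action, it cancels out of the EV difference between smoking and not smoking.

```python
# Smoking lesion, naive EV: the lesion (not smoking) causes cancer,
# and p(lesion) is independent of whether you smoke.
P_LESION = 0.1
U_LIFE, U_CANCER, U_SMOKE = 1_000_000, -1_000_000, 1_000  # hypothetical utils

def naive_ev(smoke: bool) -> float:
    bonus = U_SMOKE if smoke else 0
    return P_LESION * (U_CANCER + bonus) + (1 - P_LESION) * (U_LIFE + bonus)

# The difference is exactly U_SMOKE, whatever value P_LESION takes:
print(naive_ev(True) - naive_ev(False))
```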

(I don't know much about decision theory, so maybe I'm just confused.)

• I think what avturchin is getting at is that when you say "there is a 1/3 chance your memory is false and a 1/3 chance you are the original", you're implicitly conditioning only on "being one of the N total clones", ignoring the extra information "do you remember the last split", which provides a lot of useful information. That is, if each clone fully conditioned on the information available to them, you'd get 0-.5-.5 as subjective probabilities due to your step 2.

If that's not what you're going for, it seems like maybe the probability you're calculating is "probability that, given you're randomly (uniformly) assigned to be one of the N people, you're the original". But then that's obviously 1/N regardless of memory shenanigans.

If you think this is not what you're saying, then I'm confused.

• The idea of reducing hypotheses to bitstrings (i.e., programs to be run on a universal Turing machine) actually helped me a lot in understanding something about science that hindsight had previously cheapened for me. Looking back on the founding of quantum mechanics, it's easy to say "right, they should have abandoned their idea of particles existing as point objects with definite position and adopted the concept and language of probability distributions, rather than assuming a particle really exists and is just 'hidden' by the wavefunction." But the scientists of the day had a programming language in their heads where "particle" was a basic object and probability was something complicated that you had to build up—the optimization process of science had arrived at a local maximum in the landscape of possible languages to describe the world.

I realize this is a pretty simple insight, but I'm glad the article gave me a way to better understand this.