• Re addictiveness: a potential fix could be to add an option to only refresh the recommended archive posts once per day (or some other time period of your choice).

• Thanks a lot for this, Ruby! After skimming, the only thing I can think of adding would be a link to the moderation log, along with a short explanation of what it records. Partly because it’s good that people can look at it, and partly because it’s nice to inform people that their deletions and bans are publicly visible.

• If the Universe is infinite, every positive experience is already instantiated once. This view could then imply that you should only focus on preventing suffering. That depends somewhat on exactly what you mean by “I” and “we”, though, and on whether you think that the boundary between our lightcone and the rest of the Universe has any moral significance.

• I don’t think anyone has claimed that “there’s a large funding gap at cost-per-life-saved numbers close to the current GiveWell estimates”, if “large” means $50B. GiveWell seem to think that their present top charities’ funding gaps are in the tens of millions.

• Main point: I agree that inner alignment is a really hard problem, and that for a non-huge amount of training data, there is likely to be a proxy goal that’s simpler than the real goal. Description length still seems importantly different from e.g. computation time.

If we keep optimising for the simplest learned algorithm, and gradually increase our training data towards all of the data we care about, I expect us to eventually reach a mesa-optimiser optimising for the base objective. (You seem to agree with this, in the last section?) However, if we keep optimising for the fastest learned algorithm, and gradually increase our training data towards all of the data we care about, we won’t ever get a robustly aligned system (until we’ve shown it every single datapoint that we’ll ever care about). We’ll probably just get a look-up table which acts randomly on new input.

This difference makes me think that simplicity could be a useful tool to make a robustly aligned mesa-optimiser. Maybe you disagree because you think that the necessary amount of data is so ludicrously big that we’ll never reach it, even by using adversarial training or other such tricks?

I’d be more willing to drop simplicity if we had good, generic methods to directly optimise for “pure similarity to the base objective”, but I don’t know how to do this without doing hard-coded optimisation or internals-based selection. Maybe you think the task is impossible without some version of the latter?

• Minor point: as you mention, food, pain, mating, etc. are pretty simple to humans, because they get to refer to sensory data, but very complex from the perspective of evolution, which doesn’t. I chose status and cheating precisely because they don’t directly refer to simple sensory data. You need complex models of your social environment in order to even have a concept of status, and I actually think it’s pretty impressive that we have enough of such models hardcoded into us to have preferences over them. Since the original text mentions food and pain as “directly related to our input data”, I thought status hierarchies were noticeably different from them, in this way. Do tell me if you were trying to point at some other distinction (or if you don’t think status requires complex models).

• Since there are more pseudo-aligned mesa-objectives than robustly aligned mesa-objectives, pseudo-alignment provides more degrees of freedom for choosing a particularly simple mesa-objective. Thus, we expect that in most cases there will be several pseudo-aligned mesa-optimizers that are less complex than any robustly aligned mesa-optimizer.

This isn’t obvious to me. If the environment is fairly varied, you will probably need different proxies for the base objective in different situations. As you say, representing all these proxies directly will save on computation time, but I would expect it to have a longer description length, since each proxy needs to be specified independently (together with information on how to make tradeoffs between them). The opposite case, where a complex base objective correlates with the same proxy in a wide range of environments, seems rarer.

Using humans as an analogy, we were specified with proxy goals, and our values are extremely complicated.
You mention the sensory experience of food and pain as relatively simple goals, but we also have far more complex ones, like the wish to be relatively high in a status hierarchy, the wish to not have a mate cheat on us, etc. You’re right that an innate model of genetic fitness also would have been quite complicated, though.

(Rohin mentions that most of these things follow a pattern where one extreme encourages heuristics and one extreme encourages robust mesa-optimizers, while you get pseudo-aligned mesa-optimizers in the middle. At present, simplicity breaks this pattern, since you claim that pseudo-aligned mesa-optimizers are simpler than both heuristics and robustly aligned mesa-optimizers. What I’m saying is that I think that the general pattern might hold here, as well: short description lengths might make it easier to achieve robust alignment.)

Edit: To some extent, it seems like you already agree with this, since “Adversarial training” points out that a sufficiently wide range of environments will have a robustly aligned agent as its simplest mesa-optimizer. Do you assume that there isn’t enough training data to identify , in “Compression of the mesa-optimizer”? It might be good to clarify the difference between those two sections.

• Link to SSC’s explanation of the concept.

I’d say most positions are in between complete conflict theory and complete mistake theory (though they’re not necessarily ‘transitional’, if people tend to stay there once they’ve reached them). It all depends on how much of political disagreement you think is fueled by different interests and how much is fueled by different beliefs.

I also think that the best position lies there, somewhere in between.
It is in fact correct that a fair amount of political conflict happens due to different interests, so a complete mistake theorist would frequently fail to predict why politics works the way it does. (Of course, even if you agree with this, you may think that most people should become more mistake theorist, on the margin.)

• In the first chapter, it’s noted “The story has been corrected to British English up to Ch. 17, and further Britpicking is currently in progress (see the /HPMOR subreddit).” Given your points, it seems like it’s not even thoroughly britpicked up ’til 17. I expect Eliezer to have written that note quite some time ago, so I’m not too hopeful about this still going on at the subreddit, either.

• If this is something that everyone reads, it might be nice to provide links to more technical details of the site. I imagine that someone reading this who then engages with LW might wonder:

  • What makes a curated post a curated post? (this might fit into the site guide on personal vs frontpage posts)
  • Why do comments/posts have more karma than votes?
  • What’s the mapping between users’ karma and voting power?
  • How does editing work? Some things are not immediately obvious, like:
    • How do I use LaTeX?
    • How do I use footnotes?
    • How do I create images?
  • How does moderation work? Who can moderate their own posts?

This kind of knowledge isn’t gathered in one place right now, and is typically difficult to google.

• I’m sceptical that pushing egoism over utilitarianism will make people less prone to punish others. I don’t know any system of utilitarianism that places terminal value on punishing others, and (although there probably exist a few) I don’t know of anyone who identifies as a utilitarian who places terminal value on punishing others.
In fact, I’d guess that the average person identifying as a utilitarian is less likely to punish others (when there is no instrumental value to be had) than the average person identifying as an egoist. After all, the egoist has no reason to tame their barbaric impulses: if they want to punish someone, then it’s correct to punish that person.

I agree that your version of egoism is similar to most rationalists’ versions of utilitarianism (although there are definitely moral realist utilitarians out there). Insofar as we have time to explain our beliefs properly, the name we use for them (hopefully) doesn’t matter much, so we can call it either egoism or utilitarianism. When we don’t have time to explain our beliefs properly, though, the name does matter, because the listener will use their own interpretation of it. Since I think that the average interpretation of utilitarianism is less likely to lead to punishment than the average interpretation of egoism, this doesn’t seem like a good reason to push for egoism. Maybe pushing for moral anti-realism would be a better bet?

• I still have no idea of how the total number of dying people is relevant, but my best reading of your argument is:

  • If GiveWell’s cost-effectiveness estimates were correct, foundations would spend their money on them.
  • Since the foundations have money that they aren’t spending on them, the estimates must be incorrect.

According to this post, OpenPhil intends to spend roughly 10% of their money on “straightforward charity” (rather than their other cause areas). That would be about $1B (though I can’t find the exact numbers right now), which is a lot, but hardly unlimited. Their worries about displacing other donors, coupled with the possibility of learning about better opportunities in the future, seem sufficient to justify partial funding to me.

That leaves the Gates Foundation (at least among the foundations that you mentioned; of course there are a lot more). I don’t have a good model of when really big foundations do and don’t grant money, but I think Carl Shulman makes some interesting points in this old thread.

• In general, I’d very much like a permanent neat-things-to-know-about-LW post or page, which receives edits when there’s a significant update (do tell me if there’s already something like this). For example, I remember trying to find information about the mapping between karma and voting power a few months ago, and it was very difficult. I think I eventually found an announcement post that had the answer, but I can’t know for sure, since there might have been a change since that announcement was made. More recently, I saw that there were footnotes in the sequences, and failed to find any reference whatsoever on how to create footnotes. I didn’t learn how to do this until a month or so later, when the footnotes came to the EA Forum and Aaron wrote a post about it.

• I’m confused about the argument you’re trying to make here (I also disagree with some things, but I want to understand the post properly before engaging with that). The main claims seem to be

There are simply not enough excess deaths for these claims to be plausible.

and, after telling us how many preventable deaths there could be,

Either charities like the Gates Foundation and Good Ventures are hoarding money at the price of millions of preventable deaths, or the low cost-per-life-saved numbers are wildly exaggerated.

But I don’t understand how these claims interconnect. If there were more people dying from preventable diseases, how would that dissolve the dilemma that the second claim poses?

Also, you say that $125 billion is well within the reach of the GF, but their website says that their present endowment is only $50.7 billion. Is this a mistake, or do you mean something else by “within reach”?

• Any reason why you mention timeless decision theory (TDT) specifically? My impression was that functional decision theory (as well as UDT, since they’re basically the same thing) is regarded as a strict improvement over TDT.

• LeechBlock is excellent. I presently use it to block Facebook (except for events and permalinks to specific posts) all the time except for 10 minutes between 10pm and midnight; I have a list of webcomics that I can only view on Saturdays; there is a web-based game that I can play once every Saturday (whereafter the expired time prevents me from playing a second game), etc.

• Yes, these are among the reasons why moral value is not linearly additive. I agree.

I think the SSC post should only be construed as arguing about the value of individual animals’ experiences, and that it intentionally ignores these other sources of values. I agree with the SSC post that it’s useful to consider the value of individual animals’ experiences (what I would call their ‘moral weight’) independently of the aesthetic value and the option value of the species that they belong to. Insofar as you agree that individual animals’ experiences add up linearly, you don’t disagree with the post. Insofar as you think that individual animals’ experiences add up sub-linearly, I think you shouldn’t use species’ extinction as an example, since the aesthetic value and the option value are confounding factors.

Really? You consider it to be equivalently bad for there to be a plague that kills 100,000 humans in a world with a population of 100,000 than in a world with a population of 7,000,000,000?

I consider it equally bad for the individual, dying humans, which is what I meant when I said that I reject scope insensitivity. However, the former plague will presumably eliminate the potential for humanity having a long future, and that will be the most relevant consideration in the scenario. (This will probably make the former scenario far worse, but you could add other details to the scenario that reversed that conclusion.)