Re addictiveness: a potential fix could be to add an option to only refresh the recommended archive posts once per day (or some other time period of your choice).
Thanks a lot for this Ruby! After skimming, the only thing I can think of adding would be a link to the moderation log, along with a short explanation of what it records. Partly because it’s good that people can look at it, and partly because it’s nice to inform people that their deletions and bans are publicly visible.
LW footnotes can be created like EA forum footnotes.
If the Universe is infinite, every positive experience is already instantiated once. This view could then imply that you should only focus on preventing suffering. That depends somewhat on exactly what you mean by “I” and “we”, though, and on whether you think that the boundary between our lightcone and the rest of the Universe has any moral significance.
What do you think about the argument that the Universe might well be infinite, and if so, your view means that nothing we do matters, since every brainstate is already instantiated somewhere? (Taken from Bostrom’s paper on the subject.)
I don’t think anyone has claimed that “there’s a large funding gap at cost-per-life-saved numbers close to the current GiveWell estimates”, if “large” means $50B. GiveWell seem to think that their present top charities’ funding gaps are in the tens of millions.
I agree that inner alignment is a really hard problem, and that for a non-huge amount of training data, there is likely to be a proxy goal that’s simpler than the real goal. Description length still seems importantly different from e.g. computation time. If we keep optimising for the simplest learned algorithm, and gradually increase our training data towards all of the data we care about, I expect us to eventually reach a mesa-optimiser optimising for the base objective. (You seem to agree with this, in the last section?) However, if we keep optimising for the fastest learned algorithm, and gradually increase our training data towards all of the data we care about, we won’t ever get a robustly aligned system (until we’ve shown it every single datapoint that we’ll ever care about). We’ll probably just get a look-up table which acts randomly on new input.
This difference makes me think that simplicity could be a useful tool to make a robustly aligned mesa optimiser. Maybe you disagree because you think that the necessary amount of data is so ludicrously big that we’ll never reach it, even by using adversarial training or other such tricks?
I’d be more willing to drop simplicity if we had good, generic methods to directly optimise for “pure similarity to the base objective”, but I don’t know how to do this without doing hard-coded optimisation or internals-based selection. Maybe you think the task is impossible without some version of the latter?
As you mention, food, pain, mating, etc. are pretty simple to humans, because they get to refer to sensory data, but very complex from the perspective of evolution, which doesn’t.
I chose status and cheating precisely because they don’t directly refer to simple sensory data. You need complex models of your social environment in order to even have a concept of status, and I actually think it’s pretty impressive that we have enough of such models hardcoded into us to have preferences over them.
Since the original text mentions food and pain as “directly related to our input data”, I thought status hierarchies were noticeably different from them, in this way. Do tell me if you were trying to point at some other distinction (or if you don’t think status requires complex models).
Since there are more pseudo-aligned mesa-objectives than robustly aligned mesa-objectives, pseudo-alignment provides more degrees of freedom for choosing a particularly simple mesa-objective. Thus, we expect that in most cases there will be several pseudo-aligned mesa-optimizers that are less complex than any robustly aligned mesa-optimizer.
This isn’t obvious to me. If the environment is fairly varied, you will probably need different proxies for the base objective in different situations. As you say, representing all these proxies directly will save on computation time, but I would expect it to have a longer description length, since each proxy needs to be specified independently (together with information on how to make tradeoffs between them). The opposite case, where a complex base objective correlates with the same proxy in a wide range of environments, seems rarer.
Using humans as an analogy, we were specified with proxy goals, and our values are extremely complicated. You mention the sensory experience of food and pain as relatively simple goals, but we also have far more complex ones, like the wish to be relatively high in a status hierarchy, the wish to not have a mate cheat on us, etc. You’re right that an innate model of genetic fitness also would have been quite complicated, though.
(Rohin mentions that most of these things follow a pattern where one extreme encourages heuristics and one extreme encourages robust mesa-optimizers, while you get pseudo-aligned mesa-optimizers in the middle. At present, simplicity breaks this pattern, since you claim that pseudo-aligned mesa-optimizers are simpler than both heuristics and robustly aligned mesa-optimizers. What I’m saying is that I think that the general pattern might hold here, as well: short description lengths might make it easier to achieve robust alignment.)
Edit: To some extent, it seems like you already agree with this, since Adversarial training points out that a sufficiently wide range of environments will have a robustly aligned agent as its simplest mesa-optimizer. Do you assume that there isn’t enough training data to identify O_base, in Compression of the mesa-optimizer? It might be good to clarify the difference between those two sections.
This version, HPMOR.com, fanfiction.net, and the pdf all have “honor” (without a u) in two places in chapter 7. I don’t think there is a more updated version.
Link to SSC’s explanation of the concept.
I’d say most positions fall in between complete conflict theory and complete mistake theory (though they’re not necessarily ‘transitional’, if people tend to stay there once they’ve reached them). It all depends on how much of political disagreement you think is fueled by different interests and how much by different beliefs. I also think that the best position lies somewhere in between: it is in fact correct that a fair amount of political conflict happens due to different interests, so a complete mistake theorist would frequently fail to predict why politics works the way it does.
(Of course, even if you agree with this, you may think that most people should become more mistake theorist, on the margin.)
In the first chapter, it’s noted that “The story has been corrected to British English up to Ch. 17, and further Britpicking is currently in progress (see the /HPMOR subreddit).” Given your points, it seems like it’s not even thoroughly britpicked up ’til 17. I expect Eliezer to have written that note quite some time ago, so I’m not too hopeful about this still going on at the subreddit, either.
If this is something that everyone reads, it might be nice to provide links to more technical details of the site. I imagine that someone reading this who then engages with LW might wonder:
What makes a curated post a curated post? (this might fit into the site guide on personal vs frontpage posts)
Why do comments/posts have more karma than votes?
What’s the mapping between users’ karma and voting power?
How does editing work? Some things are not immediately obvious, like:
How do I use LaTeX?
How do I use footnotes?
How do I create images?
How does moderation work? Who can moderate their own posts?
This kind of knowledge isn’t gathered in one place right now, and is typically difficult to google.
I’m sceptical that pushing egoism over utilitarianism will make people less prone to punish others.
I don’t know of any system of utilitarianism that places terminal value on punishing others, and (although a few probably exist) I don’t know of anyone who identifies as a utilitarian who places terminal value on punishing others. In fact, I’d guess that the average person identifying as a utilitarian is less likely to punish others (when there is no instrumental value to be had) than the average person identifying as an egoist. After all, the egoist has no reason to tame their barbaric impulses: if they want to punish someone, then it’s correct to punish that person.
I agree that your version of egoism is similar to most rationalists’ versions of utilitarianism (although there are definitely moral realist utilitarians out there). Insofar as we have time to explain our beliefs properly, the name we use for them (hopefully) doesn’t matter much, so we can call it either egoism or utilitarianism. When we don’t have time to explain our beliefs properly, though, the name does matter, because the listener will use their own interpretation of it. Since I think that the average interpretation of utilitarianism is less likely to lead to punishment than the average interpretation of egoism, this doesn’t seem like a good reason to push for egoism.
Maybe pushing for moral anti-realism would be a better bet?
I still have no idea how the total number of dying people is relevant, but my best reading of your argument is:
If GiveWell’s cost-effectiveness estimates were correct, foundations would spend their money on them.
Since the foundations have money that they aren’t spending on them, the estimates must be incorrect.
According to this post, OpenPhil intends to spend roughly 10% of their money on “straightforward charity” (rather than their other cause areas). That would be about $1B (though I can’t find the exact numbers right now), which is a lot, but hardly unlimited. Their worries about displacing other donors, coupled with the possibility of learning about better opportunities in the future, seem sufficient to justify partial funding to me.
That leaves the Gates Foundation (at least among the foundations that you mentioned; of course there are a lot more). I don’t have a good model of when really big foundations do and don’t grant money, but I think Carl Shulman makes some interesting points in this old thread.
In general, I’d very much like a permanent neat-things-to-know-about-LW post or page, which receives edits when there’s a significant update (do tell me if there’s already something like this). For example, I remember trying to find information about the mapping between karma and voting power a few months ago, and it was very difficult. I think I eventually found an announcement post that had the answer, but I can’t know for sure, since there might have been a change since that announcement was made. More recently, I saw that there were footnotes in the sequences, and failed to find any reference whatsoever on how to create footnotes. I didn’t learn how to do this until a month or so later, when the footnotes came to the EA forum and Aaron wrote a post about it.
I’m confused about the argument you’re trying to make here (I also disagree with some things, but I want to understand the post properly before engaging with that). The main claims seem to be:
There are simply not enough excess deaths for these claims to be plausible.
and, after telling us how many preventable deaths there could be,
Either charities like the Gates Foundation and Good Ventures are hoarding money at the price of millions of preventable deaths, or the low cost-per-life-saved numbers are wildly exaggerated.
But I don’t understand how these claims interconnect. If there were more people dying from preventable diseases, how would that dissolve the dilemma that the second claim poses?
Also, you say that $125 billion is well within the reach of the GF, but their website says that their present endowment is only $50.7 billion. Is this a mistake, or do you mean something else by “within reach”?
Any reason why you mention timeless decision theory (TDT) specifically? My impression was that functional decision theory (as well as UDT, since they’re basically the same thing) is regarded as a strict improvement over TDT.
LeechBlock is excellent. I presently use it to block Facebook (except for events and permalinks to specific posts) at all times except for 10 minutes between 10pm and midnight; I have a list of webcomics that I can only view on Saturdays; there is a web-based game that I can play once every Saturday (after which the used-up time allowance prevents me from playing a second game); etc.
Yes, these are among the reasons why moral value is not linearly additive. I agree.
I think the SSC post should only be construed as arguing about the value of individual animals’ experiences, and that it intentionally ignores these other sources of values. I agree with the SSC post that it’s useful to consider the value of individual animals’ experiences (what I would call their ‘moral weight’) independently of the aesthetic value and the option value of the species that they belong to. Insofar as you agree that individual animals’ experiences add up linearly, you don’t disagree with the post. Insofar as you think that individual animals’ experiences add up sub-linearly, I think you shouldn’t use species’ extinction as an example, since the aesthetic value and the option value are confounding factors.
Really? You consider it to be equally bad for a plague to kill 100,000 humans in a world with a population of 100,000 as in a world with a population of 7,000,000,000?
I consider it equally bad for the individual dying humans, which is what I meant when I said that I reject scope insensitivity. However, the former plague will presumably eliminate the potential for humanity to have a long future, and that will be the most relevant consideration in the scenario. (This will probably make the former scenario far worse, but you could add other details to the scenario that reversed that conclusion.)