# Rafael Harth

Karma: 2,671
• 4 Dec 2021 22:12 UTC
LW: 2 AF: 1

I strongly believe that (1) well-being is objective, (2) well-being is quantifiable, and (3) Open Individualism is true (i.e., the concept of identity isn’t well-defined, and you’re subjectively no less continuous with the future self of any other person than with your own future self).

If (1-3) are all true, then utilitronium is the optimal outcome for everyone even if they’re entirely selfish. Furthermore, I expect an AGI to figure this out, and to the extent that it’s aligned, it should communicate that if it’s asked. (I don’t think an AGI will therefore decide to do the right thing, so this is entirely compatible with everyone dying if alignment isn’t solved.)

In the scenario where people get to talk to the AGI freely and it’s aligned, two concrete mechanisms I see are (a) people just ask the AGI what is morally correct and it tells them, and (b) they get some small taste of what utilitronium would feel like, which would make it less scary. (A crucial piece is that they can rationally expect to experience this themselves in the utilitronium future.)

In the scenario where people don’t get to talk to the AGI, who knows. It’s certainly possible that we have a singleton scenario with a few people in charge of the AGI, and that they decide to censor questions about ethics because they find the answers scary.

The only org I know of that works on this and shares my philosophical views is QRI. Their goal is to (a) come up with a mathematical space (probably a topological one, maybe a Hilbert space) that precisely describes the subjective experience of someone, (b) find a way to put someone in a scanner and construct the corresponding point in that space, and (c) find a property of that space that corresponds to their well-being in that moment. The flagship theory is that this property is symmetry. Their model is stronger than (1-3), but if it’s correct, you could get hard evidence on this before AGI, since it would make strong testable predictions about people’s well-being (and they think it could also point to easy interventions, though I don’t understand how that works). Whether it’s feasible to do this before AGI is a different question. I’d bet against it, but I think I give it better odds than any specific alignment proposal. (And I happen to know that Mike agrees that the future is dominated by concerns about AI and thinks this is the best thing to work on.)

So, I think their research is the best bet for getting more people on board with utilitronium, since it can provide evidence on (1) and (2). (It also has the nice property that it won’t work if (1) or (2) are false, so there’s low risk of outrage.) Other than that: write posts arguing for moral realism and/or for Open Individualism.

Quantifying suffering before AGI would also plausibly help with alignment, since at least you could formally specify a broad space of outcomes you don’t want. Though it certainly doesn’t solve alignment, e.g. because of inner optimizers.

• 4 Dec 2021 19:31 UTC
LW: 2 AF: 1

I don’t have any reason why this couldn’t happen. My position is something like “morality is real, probably precisely quantifiable; seems plausible that in the scenario of humans with autonomy and aligned AI, this could lead to an asymmetry where more people tend toward utilitronium over time”. (Hence why I replied; you didn’t seem to consider that possibility.) I could make up some mechanisms for this, but probably you don’t need me for that. It also seems plausible that this doesn’t happen. If it doesn’t, maybe the people who get to decide what happens with the rest of the universe tend toward utilitronium. But my model is wildly uncertain and doesn’t rule out futures of highly suboptimal personal utopias that persist indefinitely.

• 4 Dec 2021 18:30 UTC
LW: 4 AF: 2

This comment seems to be consistent with the assumption that the outcome 1 year after the singularity is locked in forever. But the future we’re discussing here is one where humans retain autonomy (?), and in that case, they’re allowed to change their mind over time, especially if humanity has access to a superintelligent aligned AI. I think a future where we begin with highly suboptimal personal utopias and gradually transition into utilitronium is among the more plausible outcomes. Compared with other outcomes where Not Everyone Dies, anyway. Your credence may differ if you’re a moral relativist.

• 1: To me, it made it more entertaining and thus easier to read. (No idea about non-anecdotal data, would also be interested.)

3: Also no data; I strongly suspect the metric is generally good because… actually, I think it’s just because the people I find worth listening to are overwhelmingly not condescending. This post seems highly unusual in several ways.

• Is Humbali right that generic uncertainty about maybe being wrong, without other extra premises, should increase the entropy of one’s probability distribution over AGI, thereby moving out its median further away in time?

My answer to this has two parts.

First, no update whatsoever should take place because a probability distribution already expresses uncertainty, and there’s no mechanism by which the uncertainty increased. Adele Lopez independently (and earlier) came up with the same answer.

Second, if there were an update—say, EY learned “one of the steps used in my model was wrong”—this should indeed change the distribution. However, it should change it toward the prior distribution. It’s completely unclear what the prior distribution is, but there is no rule whatsoever that says “more entropy = more prior-y”, as shown by the fact that a uniform distribution over a bounded range of years has extremely high entropy yet makes a ludicrously confident prediction (near-certainty that AGI arrives within that range).
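To make the entropy point concrete, here is a minimal sketch (the time horizon and all numbers are my own illustration, not figures from the original comment):

```python
import math

# Illustration (my own numbers): a uniform distribution over AGI arriving
# some time in the next 1,000,000 years has about 20 bits of entropy, as
# high as it gets for that support, yet it makes a ludicrously confident
# prediction about the question we actually care about.
years = 1_000_000
entropy_bits = math.log2(years)     # ~19.93 bits: very high entropy
p_agi_within_30 = 30 / years        # 0.00003: near-certainty of no AGI soon
```

High entropy over one variable is entirely compatible with extreme confidence about a coarser question, which is why “more entropy” is not the same as “more uncertain in the relevant sense”.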

See also Information Charts (second chapter). Being under-confident, or losing confidence, does not have to shift your probability toward the 50% mark; it shifts it toward the prior from wherever it was before, and the prior can be literally any probability. If it were universally held that AGI happens in 5 years, then that could be the prior, and updating downward on EY’s gears-level model would move the probability toward quicker timelines.

It does not. “go get your booster” is clearly short for “you should get your booster assuming you don’t already have one”. The negation of that isn’t “you already have a booster”, it’s “you should not get your booster even if you don’t already have one”. So the contrapositive of this statement is “if you shouldn’t get your booster even though you don’t already have one, then you shouldn’t be worried about the virus”.

(You can also see that something went wrong with the implication you’ve drawn, seeing as you’ve derived an unreasonable statement from a reasonable one. The contrapositive is logically equivalent; it can’t be less reasonable.)
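The equivalence between a conditional and its contrapositive can be checked mechanically; a minimal sketch (the variable names are my own):

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    # Material conditional: P -> Q is false only when P holds and Q fails.
    return (not p) or q

# Exhaustively verify that P -> Q and (not Q) -> (not P) agree in every
# possible world, for P = "worried about the virus", Q = "should get booster".
for worried, should_boost in product([True, False], repeat=2):
    assert implies(worried, should_boost) == implies(not should_boost, not worried)
```

Since the two forms have identical truth tables, deriving an unreasonable contrapositive from a reasonable conditional signals a mistranslation of the original statement, not a flaw in the logic.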

• I was actually being serious.

But (d) seems like a rounding error given what’s at stake,[1] and (b) doesn’t seem like a negative. The von Neumann comparison only seems relevant if we can clone dead people (which I assume we can’t?). So on the whole, he still seems like a great choice to me. But by all means, clone both him and Paul.

1. As far as political feasibility goes, the hedonic level seems less important to me than consent. ↩︎

• Isn’t the guy who founded the AI alignment field the obvious person to clone? Especially given that he’s still alive.

• The total absence of obvious output of this kind from the rest of the “AI safety” field even in 2020 causes me to regard them as having less actual ability to think in even a shallowly adversarial security mindset, than I associate with savvier science fiction authors. Go read fantasy novels about demons and telepathy, if you want a better appreciation of the convergent incentives of agents facing mindreaders than the “AI safety” field outside myself is currently giving you.

While this may be a fair criticism, I feel like someone ought to point out that the vast majority of AI safety output (at least that I see on LW) isn’t trying to do anything like “sketch a probability distribution over the dynamics of an AI project that is nearing AGI”. This includes all technical MIRI papers I’m familiar with.

Perhaps we should be doing this (though isn’t that more a matter of AI forecasting/strategy than of alignment? Of course, it’s still AI safety), but then the failure isn’t “no-one has enough security mindset” but rather something like “no-one has the social courage to tackle the problems that are actually important”. (This would be more similar to EY’s critique in the Discussion on AGI interventions post.)

• This phenomenological account of frame control doesn’t provide a causal model precise enough for me to understand what additional question someone would be trying to answer when asking “is this person doing frame control?” aside from noticing which of the features of “frame control” they satisfy.

For one, “frame control” may draw a boundary around an empirical cluster in thingspace, in which case the question they would get evidence for is “are they also doing the other things in this cluster?”

But I think the claim the post makes goes further than that. It’s not just that people who do frame control thing #5 are more likely to do frame control thing #13, it’s also that both may be in service of common goals. The post doesn’t make it explicit what those goals are, but that doesn’t mean they don’t exist. (And they can exist even in cases where frame control is applied subconsciously.)

• I definitely don’t think any human-made song is perfect. (Do you claim a superintelligent AI would be incapable of improving it? If yes, I question your models; if not, in what sense is it perfect?) But if you’re asking me for music that I think is extremely good, leaving it up to me to decide what that means, the first thing that comes to mind is Program Music III. I would recommend the best song, but there is only one song, so I’ll recommend the first 12 minutes.

• I think the bet is a bad idea if you think in terms of Many Worlds. Say 55% of all worlds end by 2030. Then, even assuming that value-of-$-in-2017 = value-of-$-in-2030, Eliezer personally benefited from the bet. However, the epistemic result is Bryan getting prestige points in 45% of worlds, Eliezer getting prestige points in 0% of worlds.

The other problem with the bet is that, if we adjust for inflation and returns on money, the bet is positive EV for Eliezer even given a small P(world-ends-by-2030).
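As a sketch of why (the bet terms and the rate of return below are my own hypothetical numbers, not figures from the comment):

```python
# Hypothetical terms (mine, for illustration): Eliezer receives 100 now and
# owes 200 in 2030 if the world survives. Investing the 100 at an assumed
# 7% annual return for 13 years makes the bet positive EV for the receiver
# even when P(world-ends-by-2030) is small.
r = 0.07        # assumed annual return
p_doom = 0.10   # a deliberately small doom probability
ev = 100 * (1 + r) ** 13 - (1 - p_doom) * 200
# 1.07**13 is roughly 2.41, so ev is roughly 241 - 180 = 61 > 0
```

The growth of the up-front payment does the work: over a long enough horizon, the bet stays positive EV even at low doom probabilities.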

• 14 years later, I notice that Eliezer missed the other reason why evolution didn’t design organisms that have fitness maximization as an explicit motivation. It’s not just that it can’t plan well enough to get there; it’s also that such a motivation would have a disadvantage compared to a set of heuristics: higher computational cost. A hypothetical mind only concerned with fitness maximization would probably have to rediscover a bunch of heuristics like “excessive pain is bad” to survive in practice. (At that point, it would indeed have an advantage in that it could avoid many of the failure modes of heuristics.)

• Survey on model updates from reading this post. Figuring out to what extent this post has led people to update may inform whether future discussions are valuable.

Results: (just posting them here, doesn’t really need its own post)

The question was to rate agreement on the 1=Paul to 9=Eliezer axis before and after reading this post.

Data points: 35

Mean:

Median:

Raw Data

Agreement more on need for actions than on probabilities. Would be better to first present points of agreement (that it is at least possible for non(dangerously)-general AI to change situation).

the post was incredibly confusing to me and so I haven’t really updated at all because I don’t feel like I can crisply articulate yudkowsky’s model or his differences with christiano

• Relevant post given the situation with Leverage.

• (Rereading the entire Sequences right now)

I suspect that (a) I haven’t done very well on the ‘spend 5 minutes by the clock searching for an alternative’ advice in the last couple of years (even though I think I have internalized the habit to look for a third way at least briefly), and (b) doing so probably would have helped me avoid some severe mistakes. This may be related to EY’s overarching comment that the sequences focus too much on epistemic and too little on instrumental rationality. It’s a lot easier to understand and accept this, and even to apply it a little, than to actually adopt a habit of brainstorming for 5 minutes by the clock.

• There is no way the posterior odds favor the librarian that strongly; I would be very surprised if they did. Spelled out, the point of the example seems to be “people forget the base rate, and once you know the base rate, it’s obvious that it’s more significant than the update based on shyness”. I don’t need a source for this; it doesn’t matter exactly how large the update based on shyness is; either way, it is dominated by the base rate.
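In odds form, the base-rate point can be made with a two-line calculation (the specific ratios are my own illustrative numbers, not from the comment or its source):

```python
# Illustration (my numbers): suppose farmers outnumber librarians 20:1,
# and a shy personality is 4x as likely for a librarian as for a farmer.
prior_odds_farmer = 20.0      # farmers : librarians, the base rate
likelihood_ratio_shy = 4.0    # librarian : farmer, given "shy"
posterior_odds_farmer = prior_odds_farmer / likelihood_ratio_shy
# posterior_odds_farmer == 5.0: still 5:1 in favor of the farmer
```

Even a substantially stronger likelihood ratio would leave the farmer ahead; only an update stronger than the 20:1 base rate itself could flip the conclusion, which is the sense in which the shyness evidence is dominated.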

• One of my updates from reading this is that Rapid vs. Gradual takeoff seems like an even more important variable for many people’s model than I had assumed. Making this debate less one-sided might thus be super valuable even if writing up arguments is costly.