J. Dmitri Gallow

Karma: 68

Instrumental Convergence? [Draft]

J. Dmitri Gallow14 Jun 2023 20:21 UTC

47 points

20 comments33 min readLW link

J. Dmitri Gallow 15 Jun 2023 8:40 UTC
3 points
0
in reply to: Evan R. Murphy’s comment on: Instrumental Convergence? [Draft]
There are infinitely many desires like that, in fact (that’s what proposition 2 shows).

More generally, take any self-preservation contingency plan, A, and any other contingency plan, B. If we start out uncertain about what Sia wants, then we should think her desires are just as likely to make A more rational than B as they are to make B more rational than A. (That’s what proposition 3 shows.)

That’s rough and subject to a bunch of caveats, of course. I try to go through all of those caveats carefully in the draft.

J. Dmitri Gallow 16 Jun 2023 1:48 UTC
15 points
13
in reply to: rvnnt’s comment on: Instrumental Convergence? [Draft]
Wouldn’t this imply a bias towards eliminating other agents? (Since that would make the world more predictable, and thereby leave less up to chance?)
A few things to note. Firstly, when I say that there’s a ‘bias’ towards a certain kind of choice, I just mean that the probability that a superintelligent agent with randomly sampled desires (Sia) would make that choice is greater than 1/N, where N is the number of choices available. So, just to emphasize the scale of the effect: even if you were right about that inference, you should still assign very low probability to Sia taking steps to eliminate other agents.

Secondly, when I say that a choice “leaves less up to chance”, I just mean that the sum total of history is more predictable, given that choice, than the sum total of history is predictable, given other choices. (I mention this just because you didn’t read the post, and I want to make sure we’re not talking past each other.)

Thirdly, I would caution against the inference: without humans, things are more predictable; therefore, undertaking to eliminate other agents leaves less up to chance. Even if things are predictable after humans are eliminated, and even if Sia can cook up a foolproof contingency plan for eliminating all humans, that doesn’t mean that that contingency plan leaves less up to chance. Insofar as the contingency plan is sensitive to the human response at various stages, and insofar as that human response is unpredictable (or less predictable than humans are when you don’t try to kill them all), this bias wouldn’t lend any additional probability to Sia choosing that contingency plan.

Fourthly, this bias interacts with the others. Futures without humanity might be futures which involve fewer choices—other deliberative agents tend to force more decisions. So contingency plans which involve human extinction may involve comparatively fewer choicepoints than contingency plans which keep humans around. Insofar as Sia is biased towards contingency plans with more choicepoints, that’s a reason to think she’s biased against eliminating other agents. I don’t have any sense of how these biases interact, or which one is going to be larger in real-world decisions.
Wouldn’t this strongly imply biases towards both self-preservation and resource acquisition?
In some decisions, it may. But I think here, too, we need to tread with caution. In many decisions, this bias makes it somewhat more likely that Sia will pursue self-destruction. To quote myself:
Sia is biased towards choices which allow for more choices—but this isn’t the same thing as being biased towards choices which guarantee more choices. Consider a resolute Sia who is equally likely to choose any contingency plan, and consider the following sequential decision. At stage 1, Sia can either take a ‘safe’ option which will certainly keep her alive or she can play Russian roulette, which has a 1-in-6 probability of killing her. If she takes the ‘safe’ option, the game ends. If she plays Russian roulette and survives, then she’ll once again be given a choice to either take a ‘safe’ option of definitely staying alive or else play Russian roulette. And so on. Whenever she survives a game of Russian roulette, she’s again given the same choice. All else equal, if her desires are sampled normally, a resolute Sia will be much more likely to play Russian roulette at stage 1 than she will be to take the ‘safe’ option.
See the post to understand what I mean by “resolute”—and note that the qualitative effect doesn’t depend upon whether Sia is a resolute chooser.

J. Dmitri Gallow 16 Jun 2023 2:02 UTC
2 points
0
in reply to: Evan R. Murphy’s comment on: Instrumental Convergence? [Draft]
A quick prefatory note on how I’m thinking about ‘goals’ (I don’t think it’s relevant, but I’m not sure): as I’m modelling things, Sia’s desires/goals are given by a function from ways the world could be (colloquially, ‘worlds’) to real numbers, $D$ , with the interpretation that $D (W)$ is how well satisfied Sia’s desires are if $W$ turns out to be the way the world actually is. By ‘the world’, I mean to include all of history, from the beginning to the end of time, and I mean to encompass every region of space. I assume that this function can be well-defined even for worlds in which Sia never existed or dies quickly. Humans can want to never have been born, and they can want to die. So I’m assuming that Sia can also have those kinds of desires, in principle. So her goal can be achieved even if she’s not around.

When I talk about ‘goal preservation’, I was talking about Sia not wanting to change her desires. I think you’re right that that’s different from Sia wanting to retain her desires. If she dies, then she hasn’t retained her desires, but neither has she changed them. The effect I found was that Sia is somewhat more likely to not want her desires changed.

J. Dmitri Gallow 16 Jun 2023 11:18 UTC
4 points
0
in reply to: rvnnt’s comment on: Instrumental Convergence? [Draft]
There’s nothing unusual about my assumptions regarding instrumental rationality. It’s just standard expected utility theory.

The place I see to object is with my way of spreading probabilities over Sia’s desires. But if you object to that, I want to hear more about which probably distribution I should be using to understand the claim that Sia’s desires are likely to rationalise power-seeking, resource acquisition, and so on. I reached for the most natural way of distributing probabilities I could come up with—I was trying to be charitable to the thesis, & interpreting it in light of the orthogonality thesis. But if that’s not the right way to distribute probability over potential desires, if it’s not the right way of understanding the thesis, then I’d like to hear something about what the right way of understanding it is.

J. Dmitri Gallow 20 Jul 2023 1:24 UTC
LW: 8 AF: 4
0
AF
in reply to: Jeremy Gillen’s comment on: Instrumental Convergence? [Draft]
Thanks for the read and for the response.

>None of your models even include actions that are analogous to the convergent actions on that list.

I’m not entirely sure what you mean by “model”, but from your use in the penultimate paragraph, I believe you’re talking about a particular decision scenario Sia could find herself in. If so, then my goal wasn’t to prove anything about a particular model, but rather to prove things about every model.

>The non-sequential theoretical model is irrelevant to instrumental convergence, because instrumental convergence is about putting yourself in a better position to pursue your goals later on.

Sure. I started with the easy cases to get the main ideas out. Section 4 then showed how those initial results extend to the case of sequential decision making.

>Section 4 deals with sequential decisions, but for some reason mainly gets distracted by a Newcomb-like problem, which seems irrelevant to instrumental convergence. I don’t see why you didn’t just remove Newcomb-like situations from the model?

I used the Newcomb problem to explain the distinction between sophisticated and resolute choice. I wasn’t assuming that Sia was going to be facing a Newcomb problem. I just wanted to help the reader understand the distinction. The distinction is important, because it makes a difference to how Sia will choose. If she’s a resolute chooser, then sequential decisions reduce to a single non-sequential decisions. She just chooses a contingency plan at the start, and then sticks to that contingency plan. Whereas if she’s a sophisticated chooser, then she’ll make a series of non-sequential decisions. In both cases, it’s important to understand how she’ll choose in non-sequential decisions, which is why I started off thinking about that in section 3.

>It seems clear to me that for the vast majority of the random utility functions, it’s very valuable to have more control over the future world state. So most sampled agents will take the instrumentally convergent actions early in the game and use the additional power later on.

I am not at all confident about what would happen with randomly sampled desires in this decision. But I am confident about what I’ve proven, namely: if she’s a resolute chooser with randomly sampled desires, then for any two contingency plans, Sia is just as likely to prefer the first to the second as she is to prefer the second to the first.

When it comes to the ‘power-seeking’ contingency plans, there are two competing biases. On the one hand, Sia is somewhat biased towards them for the simple reason that there are more of them. If some early action affords more choices later on, then there are going to be more contingency plans which make that early choice. On the other hand, Sia is somewhat biased against them, since they are somewhat less predictable—they leave more up to chance.

I’ve no idea which of these biases will win out in your particular decision. It strikes me as a pretty difficult question.

J. Dmitri Gallow

In­stru­men­tal Con­ver­gence? [Draft]

Instrumental Convergence? [Draft]