J. Dmitri Gallow comments on Instrumental Convergence? [Draft]

J. Dmitri Gallow 20 Jul 2023 1:24 UTC
LW: 8 AF: 4
Thanks for the read and for the response.

>None of your models even include actions that are analogous to the convergent actions on that list.

I’m not entirely sure what you mean by “model”, but from your use in the penultimate paragraph, I believe you’re talking about a particular decision scenario Sia could find herself in. If so, then my goal wasn’t to prove anything about a particular model, but rather to prove things about every model.

>The non-sequential theoretical model is irrelevant to instrumental convergence, because instrumental convergence is about putting yourself in a better position to pursue your goals later on.

Sure. I started with the easy cases to get the main ideas out. Section 4 then showed how those initial results extend to the case of sequential decision making.

>Section 4 deals with sequential decisions, but for some reason mainly gets distracted by a Newcomb-like problem, which seems irrelevant to instrumental convergence. I don’t see why you didn’t just remove Newcomb-like situations from the model?

I used the Newcomb problem to explain the distinction between sophisticated and resolute choice. I wasn’t assuming that Sia was going to be facing a Newcomb problem. I just wanted to help the reader understand the distinction. The distinction is important, because it makes a difference to how Sia will choose. If she’s a resolute chooser, then sequential decisions reduce to a single non-sequential decision. She just chooses a contingency plan at the start, and then sticks to that contingency plan. Whereas if she’s a sophisticated chooser, then she’ll make a series of non-sequential decisions. In both cases, it’s important to understand how she’ll choose in non-sequential decisions, which is why I started off thinking about that in section 3.

>It seems clear to me that for the vast majority of the random utility functions, it’s very valuable to have more control over the future world state. So most sampled agents will take the instrumentally convergent actions early in the game and use the additional power later on.

I am not at all confident about what would happen with randomly sampled desires in this decision. But I am confident about what I’ve proven, namely: if she’s a resolute chooser with randomly sampled desires, then for any two contingency plans, Sia is just as likely to prefer the first to the second as she is to prefer the second to the first.
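
The symmetry claim can be checked numerically in a toy case. The sketch below is only illustrative and rests on assumptions that are not in the paper: each contingency plan is represented by a probability distribution over four outcomes, and desires are sampled as independent standard normal utilities (a distribution that is symmetric under negation). The function name `preference_frequency` and both plan vectors are hypothetical.

```python
import random

def preference_frequency(p_a, p_b, trials=20000, seed=0):
    """Estimate how often plan A has higher expected utility than plan B
    when utilities over outcomes are sampled at random.

    Assumption: desires are i.i.d. standard normal utilities, one per outcome.
    """
    rng = random.Random(seed)
    wins_a = 0
    for _ in range(trials):
        # One randomly sampled desire function: a utility for each outcome.
        u = [rng.gauss(0.0, 1.0) for _ in p_a]
        eu_a = sum(p * x for p, x in zip(p_a, u))
        eu_b = sum(p * x for p, x in zip(p_b, u))
        if eu_a > eu_b:
            wins_a += 1
    return wins_a / trials

# Two hypothetical contingency plans, each a distribution over four outcomes.
plan_a = [0.7, 0.2, 0.1, 0.0]
plan_b = [0.1, 0.1, 0.4, 0.4]

freq = preference_frequency(plan_a, plan_b)
print(f"fraction of sampled desires preferring A to B: {freq:.3f}")
```

Under these assumptions the estimate should sit near one half for any pair of distinct plans, since a desire function and its negation are equally likely and they reverse the preference.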

When it comes to the ‘power-seeking’ contingency plans, there are two competing biases. On the one hand, Sia is somewhat biased towards them for the simple reason that there are more of them. If some early action affords more choices later on, then there are going to be more contingency plans which make that early choice. On the other hand, Sia is somewhat biased against them, since they are somewhat less predictable—they leave more up to chance.

I’ve no idea which of these biases will win out in your particular decision. It strikes me as a pretty difficult question.
- Jeremy Gillen 23 Jul 2023 9:51 UTC
  3 points
  >Section 4 then showed how those initial results extend to the case of sequential decision making.
  >[...]
  >If she’s a resolute chooser, then sequential decisions reduce to a single non-sequential decision.
  Ah thanks, this clears up most of my confusion; I had misunderstood the intended argument here. I think I can explain my point better now:
  I claim that proposition 3, when extended to sequential decisions with a resolute decision theory, shouldn’t be interpreted the way you interpret it. The meaning changes when you make A and B into sequences of actions.
  Let’s say action A is a list of 1000000 particular actions (e.g. 1000000 small-edits) and B is a list of 1000000 particular actions (e.g. 1 improve-technology, then 999999 amplified-edits).^[1]
  Proposition 3 says that A is as likely to be chosen as B (for randomly sampled desires). This is correct. Intuitively, this is because A and B each achieve a particular outcome, and desires are equally likely to favor “opposite” outcomes.
  However this isn’t the question we care about. We want to know whether action-sequences that contain “improve-technology” are more likely to be optimal than action-sequences that don’t contain “improve-technology”, given a random desire function. This is a very different question to the one proposition 3 gives us an answer to.
  Almost all optimal action-sequences could contain “improve-technology” at the beginning, while any two particular action sequences are equally likely to be preferred to the other on average across desires. These two facts don’t contradict each other. The first fact is true in many environments (e.g. the one I described^[2]) and this is what we mean by instrumental convergence. The second fact is unrelated to instrumental convergence.
  I think the error might be coming from this definition of instrumental convergence:
  >could we nonetheless say that she’s got a better than $1 / n$ probability of choosing $A$ from a menu of $n$ acts?
  When $A$ is a sequence of actions, this definition makes less sense. It’d be better to define it as something like “from a menu of $n$ initial actions, she has a better than $1 / n$ probability of choosing a particular initial action $A_{1}$”.
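  As a sketch of how the two readings relate: write $\pi^{*}(D)$ for the contingency plan a resolute Sia picks under desires $D$, and $a_{1}(\pi)$ for the first action of plan $\pi$. This notation is illustrative rather than the paper's, and it assumes ties between plans occur with probability zero.

  $$\Pr_{D}\left[a_{1}(\pi^{*}(D)) = A_{1}\right] = \sum_{\pi \,:\, a_{1}(\pi) = A_{1}} \Pr_{D}\left[\pi^{*}(D) = \pi\right]$$

  The sequence-level question asks about a single term $\Pr_{D}[\pi^{*}(D) = \pi]$, while the initial-action question asks whether the whole sum exceeds $1 / n$. Pairwise symmetry between individual plans does not by itself fix the size of that sum, which is why the two questions can come apart.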
  >I’m not entirely sure what you mean by “model”, but from your use in the penultimate paragraph, I believe you’re talking about a particular decision scenario Sia could find herself in.
  Yep, I was using “model” to mean “a simplified representation of a complex real world scenario”.
  1. ^
    For simplicity, we can make this scenario a deterministic known environment, and make sure the number of actions available doesn’t change if “improve-technology” is chosen as an action. This way neither of your biases apply.
  2. ^
    E.g. we could define a “small-edit” as $\pm 0.01$ to any location in the state vector, and an “amplified-edit” as $\pm 0.1$ to any location. This preserves the number of actions and makes the advantage of “amplified-edit” clear. I can go into more detail if you like; this does depend a little on how we set up the distribution over desires.
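
The small-edit / amplified-edit environment from the footnotes can be explored by brute force in a tiny instance. The sketch below is one possible setup, not the one either commenter committed to, and every modeling choice is an assumption: a 2-dimensional state vector, a horizon of 6 actions, edits measured in integer units of 0.01, "improve-technology" acting as a no-op once technology is already improved (so the menu size never changes), and desires sampled as a uniformly random target state with utility equal to the negative L1 distance from the final state to that target. The function names are hypothetical.

```python
import random

SMALL, AMPLIFIED = 1, 10  # edit sizes in units of 0.01 (so 0.01 and 0.1)

def reachable_final_states(dim, horizon):
    """Return (final states of plans that never improve technology,
    final states of plans that contain improve-technology)."""
    frontier = {((0,) * dim, False)}  # (state, technology improved yet?)
    for _ in range(horizon):
        nxt = set()
        for state, tech in frontier:
            # improve-technology: assumed to be a no-op if already improved.
            nxt.add((state, True))
            step = AMPLIFIED if tech else SMALL
            for i in range(dim):
                for sign in (-1, 1):
                    edited = list(state)
                    edited[i] += sign * step
                    nxt.add((tuple(edited), tech))
        frontier = nxt
    without = {s for s, tech in frontier if not tech}
    with_tech = {s for s, tech in frontier if tech}
    return without, with_tech

def best_utility(states, target):
    """Assumed desire: utility is the negative L1 distance to the target."""
    return max(-sum(abs(a - b) for a, b in zip(s, target)) for s in states)

def compare(dim=2, horizon=6, radius=50, samples=1000, seed=0):
    """Fraction of sampled desires whose best plan does / does not
    contain improve-technology, plus the fraction of ties."""
    rng = random.Random(seed)
    without, with_tech = reachable_final_states(dim, horizon)
    counts = {"with_tech": 0, "without_tech": 0, "tie": 0}
    for _ in range(samples):
        target = tuple(rng.randint(-radius, radius) for _ in range(dim))
        u_with = best_utility(with_tech, target)
        u_without = best_utility(without, target)
        if u_with > u_without:
            counts["with_tech"] += 1
        elif u_without > u_with:
            counts["without_tech"] += 1
        else:
            counts["tie"] += 1
    return {k: v / samples for k, v in counts.items()}

result = compare()
print(result)
```

In this particular setup most sampled targets lie farther from the start than six small edits can reach, so plans containing improve-technology come out ahead for most sampled desires. That outcome is sensitive to the assumed desire distribution: shrinking `radius` so that targets cluster near the starting state shifts the balance toward plans made only of small edits.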