Scott Garrabrant comments on The Plan

Scott Garrabrant 11 Dec 2021 22:09 UTC
LW: 56 AF: 22
0
AF
I want to disagree about MIRI.

Mostly, I think that MIRI (or at least a significant subset of MIRI) has always been primarily directed at agenty systems in general.

I want to separate agent foundations at MIRI into three eras. The Eliezer Era (2001-2013), the Benya Era (2014-2016), and the Scott Era(2017-).
The transitions between eras had an almost complete overhaul of the people involved. In spite of this, I believe that they have roughly all been directed at the same thing, and that John is directed at the same thing.
The proposed mechanism behind the similarity is not transfer, but instead because agency in general is a convergent/natural topic.
I think throughout time, there has always been a bias in the pipeline from ideas to papers towards being more about AI. I think this bias has gotten smaller over time, as the agent foundations research program both started having stable funding, and started carrying less and less of the weight of all of AI alignment on its back. (Before going through editing with Rob, I believe Embedded Agency had no mention of AI at all.)
I believe that John thinks that the Embedded Agency document is especially close to his agenda, so I will start with that. (I also think that both John and I currently have more focus on abstraction than what is in the Embedded Agency document).
Embedded Agency, more so than anything else I have done was generated using an IRL shaped research methodology. I started by taking the stuff that MIRI has already been working on, mostly the artifacts of the Benya Era, and trying to communicate the central justification that would cause one to be interested in these topics. I think that I did not invent a pattern, but instead described a preexisting pattern that originally generated the thoughts.
This is consistent with having the pattern be about agency in general, and so I could find the pattern in ideas that were generated based on agency in AI, but I think this is not the case. I think the use of proof based systems is demonstrating an extreme disregard for the substrate that the agency is made of. I claim that the reason that there was a historic focus on proof-based agents, is because it is a system that we could actually say stuff about. The fact that real life agents looked very different of the surface from proof based agents was a shortfall that most people would use to completely reject the system, but MIRI would work in it because what they really cared about was agency in general, and having another system that is easy to say things about that could be used to triangulate agency in general. If MIRI was directed at a specific type of agency, they would have rejected the proof based systems as being too different.
I think that MIRI is often misrepresented as believing in GOFAI because people look at the proof based systems and think that MIRI would only study those if they thought that is what AI might look at. I think in fact, the reason for the proof based systems is because at the time, this was the most fruitful models we had, and we were just very willing to use any lens that worked when trying to look at something very very general.
(One counterpoint here, is maybe MIRI didn’t care about the substrate the agency was running on, but did have a bias towards singleton-like agency, rather than very distributed systems, I think this is slightly true. Today, I think that you need to understand the distributed systems, because realistic singleton-like agents follow many of the same rules, but it is possible that early MIRI did not believe this as much)
Most of the above was generated by looking at the Benya Era, and trying to justify that it was directed at agency in general at least/almost as much as the Scott Era, which seems like the hardest of three for me.
For the Scott Era, I have introspection. I sometimes stop thinking in general, and focus on AI. This is usually a bad idea, and doesn’t generate as much fruit, and it is usually not what I do.

For the Eliezer Era, just look at the sequences.

I just looked up and reread, and tried to steel man what you originally wrote. My best steel man is that you are saying that MIRI is trying the develop a prescriptive understanding of agency, and you are trying the develop a descriptive understanding of agency. There might be something to this, but it is really complicated. One way to define agency is as the pipeline from the prescriptive to the descriptive, so I am not sure that prescriptive and descriptive agency makes sense as a distinction.
As for the research methodology, I think that we all have pretty different research methodologies. I do not think Benya and Eliezer and I have especially more in common with each other than we do with John, but I might be wrong here. I also don’t think Sam and Abram and Tsvi and I have especially more in common in terms of research methodologies, except in so far as we have been practicing working together.

In fact, the thing that might be going on here is that the distinctions in topics is coming from differences in research skills. Maybe proof based systems are the most fruitful model if you are a Benya, but not if you are a Scott or a John. But this is about what is easiest for you to think about, not about a difference in the shared convergent subgoal of understanding agency in general.
What links here?
- Ruby's comment on The Plan by johnswentworth (19 Dec 2021 4:20 UTC; 14 points)
- johnswentworth 12 Dec 2021 20:03 UTC
  LW: 18 AF: 10
  0
  AF Parent
  I generally agree with most of this, but I think it misses the main claim I wanted to make. I totally agree that all three eras of MIRI’s agent foundations research had some vision of the general theory of agency behind them, driving things. My point of disagreement is that, for most of MIRI’s history, elucidating that general theory has not been the primary optimization objective.
  Let’s go through some examples.
  The Sequences: we can definitely see Eliezer’s understanding of the general theory of agency in many places, especially when talking about Bayes and utility. (Engines of Cognition is a central example.) But most of the sequences talk about things like failure modes of human cognition, how to actually change your mind, social failure modes of human cognition, etc. It sure looks like the primary optimization objective is about better human thinking, plus some general philosophical foundations, not the elucidation of the general theory of agency.
  Tiling agents and proof-based decision theories: I’m on board with the use of proof-based setups to make minimal assumptions about “the substrate that the agency is made of”. That’s an entirely reasonable choice, and it does look like that choice was driven (in large part) by a desire for the theory to apply quite generally. But these models don’t look like they were ever intended as general models of agency (I doubt they would apply nicely to e-coli); in your words, they provided “another system that is easy to say things about that could be used to triangulate agency in general”. That’s not necessarily a bad step on the road to general theory, but the general theory itself was not the main thing those models were doing. (Personally, I think we already have enough points to triangulate from for the time being. I think if someone were just directly, explicitly optimizing for a general theory of agency they’d probably come to that same conclusion. On the other hand, I could imagine someone very focused on self-reference barriers in particular might end up hunting for more data points, and it’s plausible that someone directly optimizing for a general theory of agency would end up focused mostly on self-reference.)
  Grain of truth: similar to tiling agents and proof-based decision theories, this sounds like “another system that is easy to say things about that could be used to triangulate agency in general”. It does not sound like a part of the general theory of agency in its own right.
  Logical induction: here we see something which probably would apply to an e-coli; it does sound like a part of a general theory of agency. (For the peanut gallery: I’m talking about LI criterion here, not the particular algorithm.) On the other hand, I wouldn’t expect it to say much of interest about an e-coli beyond what we already know from older coherence theorems. It’s still mainly of interest in problems of reflection. And I totally buy that reflection is an important bottleneck to the general theory of agency, but I wouldn’t expect to see such a singular focus on that one bottleneck if someone were directly optimizing for a general theory of agency as their primary objective.
  Embedded agents: in your own words, you “started by taking the stuff that MIRI has already been working on, mostly the artifacts of the Benya Era, and trying to communicate the central justification that would cause one to be interested in these topics”. You did not start by taking all the different agenty systems you could think of, and trying to communicate the central concept that would cause one to be interested in those systems. I do think embedded agency came closer than any other example on this list to tackling the general theory of agency, but it still wasn’t directly optimizing for that as the primary objective.
  Going down that list (and looking at your more recent work), it definitely looks like research has been more and more directly pointed at the general theory of agency over time. But it also looks like that was not the primary optimization objective over most of MIRI’s history, which is why I don’t think slow progress on agent foundations to date provides strong evidence that the field is very difficult. Conversely, I’ve seen firsthand how tractable things are when I do optimize directly for a general theory of agency, and based on that experience I expect fairly fast progress.
  (Addendum for the peanut gallery: I don’t mean to bash any of this work; every single thing on the list was at least great work, and a lot of it was downright brilliant. There’s a reason I said MIRI is the best org at this kind of work. My argument is just that it doesn’t provide especially strong evidence that agent foundations are hard, because the work wasn’t directly optimizing for the general theory of agency as its primary objective.)
  What links here?
  - johnswentworth's comment on The Plan by johnswentworth (12 Dec 2021 20:19 UTC; 5 points)
  - Scott Garrabrant 12 Dec 2021 23:22 UTC
    LW: 11 AF: 8
    0
    AF Parent
    Hmm, yeah, we might disagree about how much reflection(self-reference) is a central part of agency in general.
    
    It seems plausible that it is important to distinguish between the e-coli and the human along a reflection axis (or even more so, distinguish between evolution and a human). Then maybe you are more focused on the general class of agents, and MIRI is more focused on the more specific class of “reflective agents.”
    Then, there is the question of whether reflection is going to be a central part of the path to (F/D)OOM.
    Does this seem right to you?
    - Scott Garrabrant 12 Dec 2021 23:39 UTC
      LW: 11 AF: 8
      0
      AF Parent
      To operationalize, I claim that MIRI has been directed at a close enough target to yours that you probably should update on MIRI’s lack of progress at least as much as you would if MIRI was doing the same thing as you, but for half as long.
      - Scott Garrabrant 13 Dec 2021 0:25 UTC
        LW: 12 AF: 8
        0
        AF Parent
        Which isn’t *that* large an update. The average number of agent foundations researchers (That are public facing enough that you can update on their lack of progress) at MIRI over the last decade is like 4.
        Figuring out how to factor in researcher quality is hard, but it seems plausible to me that the amount of quality adjusted attention directed at your subgoal over the next decade is significantly larger than the amount of attention directed at your subgoal over the last decade. (Which would not all come from you. I do think that Agent Foundations today is non-trivially closer to John today that Agent Foundations 5 years ago is to John today.)
        It seems accurate to me to say that Agent Foundations in 2014 was more focused on reflection, which shifted towards embeddedness, and then shifted towards abstraction, and that these things all flow together in my head, and so Scott thinking about abstraction will have more reflection mixed in than John thinking about abstraction. (Indeed, I think progress on abstraction would have huge consequences on how we think about reflection.)
        
        In case it is not obvious to people reading, I endorse John’s research program. (Which can maybe be inferred by the fact that I am arguing that it is similar to my own). I think we disagree about what is the most likely path after becoming less confused about agency, but that part of both our plans is yet to be written, and I think the subgoal is enough of a simple concept that I don’t think disagreements about what to do next to have a strong impact on how to do the first step.
        What links here?
        johnswentworth's comment on The Plan − 2022 Update by johnswentworth (15 Dec 2022 17:43 UTC; 2 points)
        johnswentworth 13 Dec 2021 17:54 UTC
        LW: 8 AF: 5
        0
        AF Parent
        This all sounds right.
        In particular, for folks reading, I symmetrically agree with this part:
        In case it is not obvious to people reading, I endorse John’s research program. (Which can maybe be inferred by the fact that I am arguing that it is similar to my own). I think we disagree about what is the most likely path after becoming less confused about agency, but that part of both our plans is yet to be written, and I think the subgoal is enough of a simple concept that I don’t think disagreements about what to do next to have a strong impact on how to do the first step.
        … i.e. I endorse Scott’s research program, mine is indeed similar, I wouldn’t be the least bit surprised if we disagree about what comes next but we’re pretty aligned on what to do now.
        Also, I realize now that I didn’t emphasize it in the OP, but a large chunk of my “50/50 chance of success” comes from other peoples’ work playing a central role, and the agent foundations team at MIRI is obviously at the top of the list of people whose work is likely to fit that bill. (There’s also the whole topic of producing more such people, which I didn’t talk about in the OP at all, but I’m tentatively optimistic on that front too.)
    - johnswentworth 13 Dec 2021 17:40 UTC
      LW: 7 AF: 5
      0
      AF Parent
      That does seem right.
      I do expect reflection to be a pretty central part of the path to FOOM, but I expect it to be way easier to analyze once the non-reflective foundations of agency are sorted out. There are good reasons to expect otherwise on an outside view—i.e. all the various impossibility results in logic and computing. On the other hand, my inside view says it will make more sense once we understand e.g. how abstraction produces maps smaller than the territory while still allowing robust reasoning, how counterfactuals naturally pop out of such abstractions, how that all leads to something conceptually like a Cartesian boundary, the relationship between abstract “agent” and the physical parts which comprise the agent, etc.
      If I imagine what my work would look like if I started out expecting reflection to be the taut constraint, then it does seem like I’d follow a path a lot more like MIRI’s. So yeah, this fits.
      - adamShimi 13 Dec 2021 18:03 UTC
        LW: 5 AF: 3
        0
        AF Parent
        If I imagine what my work would look like if I started out expecting reflection to be the taut constraint, then it does seem like I’d follow a path a lot more like MIRI’s. So yeah, this fits.
        One thing I’m still not clear about in this thread is whether you (John) would feel that progress has been made for the theory of agency if all the problems on which MIRI were instantaneously solved. Because there’s a difference between saying “this is the obvious first step if you believe reflection is the taut constraint” and “solving this problem would help significantly even if reflection wan’t the taut constraint”.
        johnswentworth 13 Dec 2021 19:56 UTC
        LW: 4 AF: 2
        0
        AF Parent
        I expect that progress on the general theory of agency is a necessary component of solving all the problems on which MIRI has worked. So, conditional on those problems being instantly solved, I’d expect that a lot of general theory of agency came along with it. But if a “solution” to something like e.g. the Tiling Problem didn’t come with a bunch of progress on more foundational general theory of agency, then I’d be very suspicious of that supposed solution, and I’d expect lots of problems to crop up when we try to apply the solution in practice.
        (And this is not symmetric: I would not necessarily expect such problems in practice for some more foundational piece of general agency theory which did not already have a solution to the Tiling Problem built into it. Roughly speaking, I expect we can understand e-coli agency without fully understanding human agency, but not vice-versa.)
        Scott Garrabrant 13 Dec 2021 20:33 UTC
        LW: 5 AF: 3
        0
        AF Parent
        I agree with this asymmetry.
        One thing I am confused about is whether to think of the e-coli as qualitatively different from the human. The e-coli is taking actions that can be well modeled by an optimization process searching for actions that would be good if this optimization process output them, which has some reflection in it.
        It feels like it can behaviorally be well modeled this way, but is mechanistically not shaped like this, I feel like the mechanistic fact is more important, but I feel like we are much closer to having behavioral definitions of agency than mechanistic ones.
        johnswentworth 13 Dec 2021 20:56 UTC
        LW: 5 AF: 3
        0
        AF Parent
        I would say the e-coli’s fitness function has some kind of reflection baked into it, as does a human’s fitness function. The qualitative difference between the two is that a human’s own world model also has an explicit self-model in it, which is separate from the reflection baked into a human’s fitness function.
        After that, I’d say that deriving the (probable) mechanistic properties from the fitness functions is the name of the game.
        … so yeah, I’m on basically the same page as you here.
- johnswentworth 12 Dec 2021 20:19 UTC
  LW: 5 AF: 3
  0
  AF Parent
  Main response is in another comment; this is a tangential comment about prescriptive vs descriptive viewpoints on agency.
  I think viewing agency as “the pipeline from the prescriptive to the descriptive” systematically misses a lot of key pieces. One central example of this: any properties of (inner/mesa) agents which stem from broad optima, rather than merely optima. (For instance, I expect that modularity of trained/evolved systems mostly comes from broad optima.) Such properties are not prescriptive principles; a narrow optimum is still an optimum. Yet we should expect such properties to apply to agenty systems in practice, including humans, other organisms, and trained ML systems.
  The Kelly criterion is another good example: Abram has argued that it’s not a prescriptive principle, but it is still a very strong descriptive principle for agents in suitable environments.
  More importantly, I think starting from prescriptive principles makes it much easier to miss a bunch of the key foundational questions—for instance, things like “what is an optimizer?” or “what are goals?”. Questions like these need some kind of answer in order for many prescriptive principles to make sense in the first place.
  Also, as far as I can tell to date, there is an asymmetry: a viewpoint starting from prescriptive principles misses key properties, but I have not seen any sign of key principles which would be missed starting from a descriptive viewpoint. (I know of philosophical arguments to the contrary, e.g. this, but I do not expect such things to cash out into any significant technical problem for agency/alignment, any more than I expect arguments about solipsism to cash out into any significant technical problem.)
- ADifferentAnonymous 12 Dec 2021 3:26 UTC
  4 points
  0
  Parent
  As a long-time LW mostly-lurker, I can confirm I’ve always had the impression MIRI’s proof-based stuff was supposed to be a spherical-cow model of agency that would lead to understanding of the messy real thing.
  
  What I think John might be getting at is that (my outsider’s impression of) MIRI has been more focused on “how would I build an agent” as a lens for understanding agency in general—e.g. answering questions about the agency of e-coli is not the type of work I think of. Which maybe maps to ‘prescriptive’ vs. ‘descriptive’?