Richard_Ngo comments on Formalizing Objections against Surrogate Goals

Richard_Ngo 3 Sep 2021 13:42 UTC
LW: 8 AF: 6
Interesting report :) One quibble:
> For one, our AIs can only use “things like SPI” if we actually formalize the approach
I don’t see why this is the case. If it’s possible for humans to start using things like SPI without a formalisation, why couldn’t AIs too? (I agree it’s more likely that we can get them to do so if we formalise it, though.)
- VojtaKovarik 4 Sep 2021 20:54 UTC
  LW: 1 AF: 1
  Thanks for pointing this out :-). Indeed, my original formulation is false; I agree with the “more likely to work if we formalise it” formulation.
  - Caspar Oesterheld 7 Sep 2021 20:12 UTC
    LW: 6 AF: 5
    Not very important, but: Despite having spent a lot of time on formalizing SPIs, I have some sympathy for a view like the following:
    
    > Yeah, surrogate goals / SPIs are great. But if we want AI to implement them, we should mainly work on solving foundational issues in decision and game theory with an aim toward AI. If we do this, then AI will implement SPIs (or something even better) regardless of how well we understand them. And if we don’t solve these issues, then it’s hopeless to add SPIs manually. Furthermore, believing that surrogate goals / SPIs work (or, rather, make a big difference for bargaining outcomes) shouldn’t change our behavior much (for the reasons discussed in Vojta’s post).
    
    On this view, it doesn’t help substantially to understand / analyze SPIs formally.
    
    But I think there are sufficiently many gaps in this argument to make the analysis worthwhile. For example, I think it’s plausible that the effective use of SPIs hinges on subtle aspects of the design of an agent that we might not think much about if we don’t understand SPIs sufficiently well.
    - Ofer 9 Sep 2021 18:03 UTC
      LW: 2 AF: 2
      Regarding the following part of the view that you commented on:
      
      > But if we want AI to implement them, we should mainly work on solving foundational issues in decision and game theory with an aim toward AI.
      
      Just wanted to add: It may be important to consider potential downside risks of such work. In particular, it may be important to be vigilant when working on certain topics in game theory and, e.g., to make certain binding commitments before investigating certain issues, because otherwise one might lose a commitment race in logical time. (I think this is a special case of a more general argument made in “Multiverse-wide Cooperation via Correlated Decision Making” about how it may be important to make certain commitments before discovering certain crucial considerations.)