Daniel Kokotajlo comments on Defusing AGI Danger

Daniel Kokotajlo 26 Dec 2020 18:02 UTC
LW: 7 AF: 3
I think your conclusion section is really important, because it prevents a possible misinterpretation of your post.
One can imagine a spectrum with “disaster by default” on one side and “alignment by default” on the other. To the extent that one is closer to “disaster by default”, trying to defuse specific arguments for AGI danger seems like it’s missing the forest for the trees, analogous to trying to improve computer security by not allowing users to use “password” as their password. To the extent that one is closer to “alignment by default”, trying to defuse specific arguments seems quite useful, closer to conducting a fault analysis on a hypothetical airplane crash.
Since I’m much closer to the “disaster by default” end of the spectrum, I think most of our effort should focus on the safety stories approach rather than the defusing dangers approach. And I think you haven’t presented any arguments for safety by default; you’ve just explained what we should do if we believe in safety by default. So it would be a misinterpretation of your post to think that it argues for the defusing disaster strategy to take priority over the safety stories strategy.
Instead (and this is how I interpret your post) both strategies should be pursued no matter where on the spectrum you are, but to different extents. E.g. if you are in the middle, you split effort 50-50 between strategies, and if you are towards the alignment by default edge, you split effort 80-20, etc. This seems quite plausible to me.
- Mark Xu 26 Dec 2020 18:06 UTC
  LW: 5 AF: 2
  I absolutely agree that I’m not arguing for “safety by default”.
  
  I don’t quite agree that you should split effort between strategies, i.e. it seems likely that if you think 80% disaster by default, you should dedicate 100% of your efforts to that world.
  - Daniel Kokotajlo 26 Dec 2020 18:25 UTC
    LW: 6 AF: 3
    OK, interesting. Well, here’s my argument for effort-splitting then: There are probably diminishing returns to pursuing each strategy. In research in general, ideas and questions tend to cross-pollinate, etc. And if you are 20% confident that research project X is the most important, and 80% that research project Y is most important, and they are both on a similar topic, this seems like a classic case where you should do both (but with more effort towards Y).
    This is more of an intuition than an argument, I guess. But what do you think?
    - Mark Xu 26 Dec 2020 18:53 UTC
      LW: 4 AF: 2
      My opposite intuition comes from the fact that if you’re trying to correctly guess a series of random digits that are 80% “1” and 20% “0”, then you should always guess “1”.
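
A quick sketch of the arithmetic behind the digit-guessing intuition above, assuming the digits are independent and that the alternative strategy is "probability matching" (guessing “1” 80% of the time, independently of the digit). The function names are illustrative, not from the thread.

```python
# Compare two guessing strategies for independent digits that are "1" with
# probability p and "0" otherwise. Both functions are illustrative helpers.

def accuracy_always_majority(p: float) -> float:
    """Expected accuracy when always guessing the more likely digit."""
    return max(p, 1.0 - p)


def accuracy_probability_matching(p: float) -> float:
    """Expected accuracy when guessing "1" with probability p, independently of the digit."""
    # Correct when both guess and digit are "1", or both are "0".
    return p * p + (1.0 - p) * (1.0 - p)


p = 0.8
always = accuracy_always_majority(p)
matching = accuracy_probability_matching(p)
print(f"always guess the likelier digit: {always:.2f}")
print(f"probability matching:            {matching:.2f}")
```

Under these assumptions, always guessing “1” is right 80% of the time, while matching the 80/20 split is right 0.8 × 0.8 + 0.2 × 0.2 = 68% of the time. Note that each guess here pays off linearly, with no diminishing returns.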
      
      I don’t quite know how to model cross-pollination and diminishing returns. I think working on both for the information value is likely to be very good. It seems hard to imagine a scenario where you’re robustly confident that one project is 80% likely to be better, even taking diminishing returns into account, without being able to create a third project with the best features of both. But if you are in that scenario, I think spending all your effort on the 80% project seems correct.
      
      One example is deciding between two fundamentally different products your startup could be making. Suppose also that creating an MVP of either product, one that would provide information, would take a really long time. In this situation, if you suspect one of them is 60% likely to be better than the other, it would be less useful to split your time 60/40 than to build the MVP of the one likely to be better and reevaluate after getting more information.
      
      The version of your claim that I agree with is “In your current epistemic state, you should spend all your time pursuing the 80% project. But because the 80% probably isn’t that robust, working on a project has diminishing returns, and other projects will give more information value, the amount of time you expect to spend on the 80% project globally is about 80%.”
      - Daniel Kokotajlo 26 Dec 2020 20:59 UTC
        LW: 5 AF: 3
        Here’s a way to model diminishing returns: the first hour of research on strategy X produces as much value as the next two hours, which produce as much value as the next four hours, etc., so roughly Value = log_2(hours). If this is true, then you should split your hours such that 0.8*log_2(hours_toward_80_project) + 0.2*log_2(hours_toward_20_project) is maximized, which I think means that you should distribute your hours across projects in proportion to their probability… https://www.wolframalpha.com/input/?i=argmax%28log_2%28X%29*0.8+%2B+log_2%281-X%29*0.2%29 (I don’t know much math so I’m not confident I’m doing this right)
        Value of information I hadn’t even considered, but maybe we can bundle it up with diminishing returns and say it’s part of the reason returns diminish.
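
The maximization in the comment above can be checked numerically. This is a minimal sketch, assuming total hours are normalized to 1 as in the linked query (X and 1-X) and using a simple grid search. The names `expected_value`, `grid`, and `best_fraction` are illustrative.

```python
import math


def expected_value(x: float, p: float = 0.8) -> float:
    """Probability-weighted log value when a fraction x of the hours goes to
    the project with probability p and the rest to the other project."""
    return p * math.log2(x) + (1.0 - p) * math.log2(1.0 - x)


# Search interior fractions only: x = 0 or x = 1 would require log2(0).
grid = [i / 1000 for i in range(1, 1000)]
best_fraction = max(grid, key=expected_value)
print(best_fraction)
```

On this grid the maximizer is the fraction 0.8, which matches the proportional split. Normalizing the budget does not change the answer: with T total hours, log_2(x*T) = log_2(x) + log_2(T), so the objective only shifts by a constant.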
        Daniel Kokotajlo 27 Dec 2020 13:26 UTC
        LW: 2 AF: 1
        Huh. It seems like there is some general theorem here that might be worth writing up. If we combine the heavy-tailed hypothesis with this theorem, maybe we get some sort of nontrivial and useful general heuristic: The optimal allocation of time/money/etc. is proportional to the probability that a project is the most valuable thing you can be doing. That is, take the options you are considering, and evaluate the probability that each option is the best of the bunch. Then, distribute your resources according to that probability. This will be optimal or approximately optimal so long as (1) returns to resources diminish logarithmically for each project at about the same rate, and (2) the best project is likely to be several times better than the next-best and so on (heavy-tailed distribution of project goodness). I think (2) is usually true for altruistic projects, and insofar as (1) is false, maybe it doesn’t matter because we are ignorant of which project diminishes faster, or maybe we do know which project diminishes faster and we can adjust accordingly (it should just be another multiplier to the ratio when dividing up resources, I think). I expect someone has said all this before somewhere...
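
One way to sketch the proportional-allocation claim for n projects is as a constrained maximization. This is a simplification under stated assumptions: x_i is the share of resources given to project i, p_i is the weight placed on project i (here, the probability that it is the best, with the p_i summing to 1), and the objective is the p_i-weighted sum of log returns. It does not model how much better the best project is, so it illustrates condition (1) rather than proving the full heuristic.

```latex
\begin{aligned}
&\max_{x_1,\dots,x_n}\ \sum_{i=1}^{n} p_i \log x_i
  \quad \text{subject to} \quad \sum_{i=1}^{n} x_i = 1,\ x_i > 0 \\
&\mathcal{L} = \sum_{i=1}^{n} p_i \log x_i
  - \lambda \Bigl( \sum_{i=1}^{n} x_i - 1 \Bigr) \\
&\frac{\partial \mathcal{L}}{\partial x_i} = \frac{p_i}{x_i} - \lambda = 0
  \ \Longrightarrow\ x_i = \frac{p_i}{\lambda} \\
&\sum_{i=1}^{n} x_i = 1
  \ \Longrightarrow\ \lambda = \sum_{i=1}^{n} p_i = 1
  \ \Longrightarrow\ x_i^{*} = p_i
\end{aligned}
```

The base of the logarithm only rescales the objective, so it does not affect the maximizer. If project i's returns are instead c_i log x_i for some hypothetical rate c_i, the same steps give x_i* = p_i c_i / Σ_j p_j c_j, which is the "another multiplier to the ratio" adjustment mentioned above.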