Thanks for the thoughtful pushback! It was in anticipation of comments like this that I included hedging language like “I think” and “perhaps.” My replies:
This seems a bit like writing the bottom line first? Like, AI fears in our community have come about because of particular arguments. If those arguments don’t apply, I don’t see why one should strongly assume that AI is to be feared, outside of having written the bottom line first.
1. Past experience has shown that even when particular AI risk arguments don’t apply, often an AI design is still risky; we just haven’t thought of the reasons why yet. So we should make a pessimistic meta-induction and conclude that even if our standard arguments for risk don’t apply, the system might still be risky—we should think more about it.
2. I intended those two “perhaps...” statements to be things the person says, not necessarily things that are true. So yeah, maybe they *say* the standard arguments don’t apply. But maybe they are wrong. People are great at rationalizing, coming up with reasons to get to the conclusion they wanted. If the conclusion they want is “We finally did it and made a super powerful impressive AI, come on come on let’s take it for a spin!” then it’ll be easy to fool yourself into thinking your architecture is sufficiently different as to not be problematic, even when your architecture is just a special case of the architecture in the standard arguments.
Points 1 and 2 are each individually sufficient to vindicate my claims, I think.
It also seems kind of condescending to operate under the assumption that you know more about the AI system someone is creating than the person who’s creating it knows? You refer to their safety strategy as “amateur”, but isn’t there a chance that having created this system entitles them to a “professional” designation? A priori, I would expect that an outsider not knowing anything about the project at hand would be much more likely to qualify for the “amateur” designation.
3. I’m not operating under the assumption that I know more about the AI system someone is creating than the person who’s creating it knows. The fact that you said this dismays me, because it is such an obvious straw man. It makes me wonder if I touched a nerve somehow, or had the wrong tone or something, to raise your hackles.
4. Yes, I refer to their safety strategy as amateur. Yes, this is appropriate. AI safety is related to AI capabilities, but the two are distinct sub-fields, and someone who is great at one could be not so great at the other. Someone who doesn’t know the AI safety literature but does something to make their AI safe probably deserves the title of amateur. I don’t claim to be a non-amateur AI scientist, and whether I’m a non-amateur AI safety person is irrelevant because I’m not going to be one of the people in The Talk. I do claim that e.g. someone like Paul Christiano or Stuart Russell is a professional AI safety person, whereas most AI scientists are not.
This isn’t obvious to me. One possibility is that there will be some system which is safe if used carefully, and having a decent technological lead gives you plenty of room to use it carefully, but if you delay your development too much, competing teams will catch up and you’ll no longer have space to use it carefully. I think you have to learn more about the situation to know for sure whether a month of delay is a good thing.
5. I agree that this is a possibility. This is why I said “say it buys us a month;” I meant that to be an average of the various possibilities. In retrospect I was unclear; I should have clarified that it might not be a good idea to delay at all, for the reasons you mention. I agree we have to learn more about the situation; in retrospect I shouldn’t have said “I think it would be better for these conversations to end X way” (even though that is what I think is most likely) but should instead have found some way to express the more nuanced position.
6. I agree with everything you say about overconfidence, echo chambers, etc., except that I don’t think I was writing the bottom line first in this case. I was making a claim without arguing for it, but then I argued for it in the comments when you questioned it. It’s perfectly reasonable (indeed necessary) to have some unargued-for claims in any particular finite piece of writing.
I’ve heard this sentiment before, but I’m not aware of a standard reference supporting this claim (let me know if there’s something I’m not remembering), and I haven’t been totally satisfied when I probe people on it in the past.
I agree we should think a lot because so much is at stake, but sometimes the fact that so much is at stake means that it’s better to act quickly.
People are great at rationalizing, coming up with reasons to get to the conclusion they wanted. If the conclusion they want is “We finally did it and made a super powerful impressive AI, come on come on let’s take it for a spin!” then it’ll be easy to fool yourself into thinking your architecture is sufficiently different as to not be problematic, even when your architecture is just a special case of the architecture in the standard arguments.
Agreed, I just don’t want people to fall into the trap of rationalizing the opposite conclusion either.
I’m not operating under the assumption that I know more about the AI system someone is creating than the person who’s creating it knows. The fact that you said this dismays me, because it is such an obvious straw man. It makes me wonder if I touched a nerve somehow, or had the wrong tone or something, to raise your hackles.
It did. Part of me thought it was better not to comment, but then I figured the entire point of the post was how to do outreach to people we don’t agree with, so I decided it was better to express my frustration.
Thanks for clarifying.
Well said. I’m glad you spoke up. Yeah, I don’t want people to rationalize their way into thinking AI should never be developed or released either. Currently I think people are much more likely to make the opposite error, but I agree both errors are worth watching out for.
I don’t know of a standard reference for that claim either. Here is what I’d say in defense of it:
--AIXItl was a serious proposal for an “ideal” intelligent agent. I heard the people who came up with it took convincing, but eventually agreed that yes, AIXItl would seize control of its reward function and kill all humans.
--People proposed Oracle AI, thinking that it would be safe. Now AFAICT people mostly agree that there are various dangers associated with Oracle AI as well.
--People sometimes said that AI risk arguments were founded on these ideal models of AI as utility maximizers or something, and that they wouldn’t apply to modern ML systems. Well, now we have arguments for why modern ML systems are potentially dangerous too. (Whether these are the same arguments rephrased, or new arguments, is not relevant for this point.)
--In my personal experience at least, I keep discovering entirely new ways that AI designs could fail, which I hadn’t thought of before. For example, Paul’s “The Universal Prior is Malign.” Or oracles outputting self-fulfilling prophecies. Or some false philosophical view on consciousness or something being baked into the AI. This makes me think maybe there are more that I haven’t yet thought of.