Perfectly reasonable for you not to reply, as you said, though I think it’s worthwhile for me to at least clarify one point:
I don’t think a competent-at-human-level system doesn’t know about deception, and I don’t think a competent-at-below-human-level system can cause extinction-level catastrophe.
A model which simply “doesn’t know about deception” isn’t the only (or even the primary) situation I’m imagining. The example I gave in the post was a situation in which the model hadn’t yet “figured out that deception is a good strategy,” which could be:
because it didn’t know about deception,
because it thought that deception wouldn’t work,
because it thought it was fully aligned,
because the training process constrained its thoughts such that it wasn’t able to even think about deception,
or some other reason. I don’t necessarily want to take a stand on which of these possibilities is most likely, as I think that will vary depending on the training process. Rather, I want to point to the general problem that many possibilities of this sort exist, such that, especially if you expect adversaries in the environment, I think it will be quite difficult to eliminate all of them.
Yes, good point. I’d make the same claim with “doesn’t know about deception” replaced by “hasn’t figured out that deception is a good strategy (assuming deception is a good strategy)”.
Humans who believe in God still haven’t concluded that deception is a good strategy, even though they have evidence about the non-omnipotence and non-omnibenevolence of God similar to the evidence an AI might have about its creators.
(Though maybe I’m wrong about this claim; maybe if we asked some believers, they would tell us “yeah, I am just being good to make it to the next life, where hopefully I’ll have a little more power and freedom and can go buck wild.”)