I expect lots of alien concepts in domains where AI far surpasses humans (e.g. I expect this to be true of AlphaFold). But if you look at the text of the ruin argument:
Nobody knows what the hell GPT-3 is thinking, not only because the matrices are opaque, but because the stuff within that opaque container is, very likely, incredibly alien—nothing that would translate well into comprehensible human thinking, even if we could see past the giant wall of floating-point numbers to what lay behind.
I think this is pretty questionable. I expect that a good chunk of GPT-3's cognition could be translated into something comprehensible, mostly because I think humans are really good at language and GPT-3 is only somewhat better on some axes (and worse on others). I don't remember what I said on this survey, but right now I'm feeling like it's "Unclear": I expect lots of AIs to have lots of alien concepts, but I don't think I expect quite as much alienness as Eliezer seems to expect.
(And this does seem to materially change how difficult you expect alignment to be; on my view you can hope that in addition to all the alien concepts the AI also has regular concepts about “am I doing what my designers want” or “am I deceiving the humans” which you could then hope to extract with interpretability.)