Loose analogy-based reasoning over complex and poorly understood systems isn’t reliable. There is only one way for GPT-n to be identical to system 1, and many ways for it to be roughly similar: easy to anthropomorphize, but with some subtle alien features.
GPT-n contains some data from smart and/or evil humans, and from humans speaking in riddles or making allusions. Let’s suppose this generalizes, and now GPT-n is pretending to be an IQ 200 cartoon villain, with an evil plot described entirely in terms of references to obscure sources. So when referring to DNA, it says things like “two opposites of the same kind, two twins intertwined, a detective’s assistant and an insect did find. Without an alien friend.”
Or maybe it goes full ecologist jargon. It talks about “genetically optimizing species to restore population levels to re-balance ecosystems into a pre-anthropogenic equilibrium”. Would an army of minimum-wage workers spot that this was talking about wiping out almost all humans?
Actually, wouldn’t a naive extrapolation of internet text suggest that superhumanly complicated ideas were likely to come in superhumanly dense jargon?
I mean, if you have a team of linguists and AI experts carefully discussing every sentence, then this particular problem goes away. This is the sort of operation where, if the AI produces a sentence of Klingon, you fly in a top Klingon expert before you get the next sentence. But how useful could GPT-n be if used in such a way? At the other extreme, GPT-n is producing internal reasoning text at a terabyte per minute. All you can do with it is grep for some suspicious words, or pass it to another AI model. You can’t even store it for later unless you have a lot of hard drives. Potentially much more useful. And less safe.
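To make the "grep for some suspicious words" option concrete, here is a minimal sketch of a streaming keyword filter. The watchlist and the `flag_lines` helper are hypothetical, made up for illustration; the point is only that a filter like this processes text line by line without storing it, and that both example sentences above would pass straight through it.

```python
import re
from typing import Iterable, Iterator

# Hypothetical watchlist: these words are illustrative assumptions,
# not taken from any real monitoring system.
SUSPICIOUS_WORDS = ("dna", "virus", "pathogen", "kill", "extinction")

# One case-insensitive pattern that matches any watchlist word as a whole word.
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, SUSPICIOUS_WORDS)) + r")\b",
    re.IGNORECASE,
)


def flag_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines containing a watchlist word, one line at a time."""
    for line in lines:
        if PATTERN.search(line):
            yield line


sample = [
    "Step 3: synthesize the DNA sequence.",
    "two opposites of the same kind, two twins intertwined, "
    "a detective's assistant and an insect did find. Without an alien friend.",
    "genetically optimizing species to restore population levels to "
    "re-balance ecosystems into a pre-anthropogenic equilibrium",
]

flagged = list(flag_lines(sample))
for line in flagged:
    print("FLAGGED:", line)
```

Only the first, plainly worded line is flagged. The riddle and the ecologist jargon contain none of the watchlist words, so a filter of this kind never surfaces them for a human to look at.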