What will be in last place in the race toward human simulation—text, image (i.e., realistic AI-generated video of the human face or voice), or the body? Whichever comes in last would become the privileged marker of biological humanity.
It seems to me that we’re already doing pretty well with AI-generated faces and voices. Probably last place will either be babble quality or robotic body quality.
So an alternative to careful parsing of written text might be simply to insist on hearing words spoken by a human being. Of course, those words could still be an AI-generated script. But that doesn’t put us in much of a different place from listening to human-originated babble. In fact, we already parse people (like politicians) for whether they sound like they’re just giving us “talking points,” following a loose script, or whether they’re actually speaking off-the-cuff, with authenticity. This is one reason people liked DJT and disliked Clinton, for example. Which is odd, since I’d bet AI will be able to imitate the Donald long before it can copy Clinton’s speaking style.
So count me unconvinced that the babble problem is either a genuinely new issue, or that System-2 careful parsing for deep structure is our only solution.