I don’t think the “over-fitting” problem applies to the Turing Test: you can ask the candidate about anything, and adapt your later questions accordingly. There are proofs in computational complexity (that I’m too lazy to look up right now) showing that you can’t pass this kind of test (except with exponentially small probability) without effectively containing a polynomial-time algorithm for the entire problem space. (It’s related to the question of which problems are IP-complete—i.e. the hardest among those problems that can be quickly solved via interactive proof.)
It would only be analogous to the test of the students if you published a short list of acceptable topics for the TT and limited the questions to that. Which they don’t do.
Edit: If you were right, it would be much easier to construct such a “conversation savant” than it has proven to be.
If you were right, it would be much easier to construct such a “conversation savant” than it has proven to be.
Watson shocked me—I didn’t think that type of performance was possible without AI completeness. That was a type of savant that I thought couldn’t happen before AGI.
It might be that passing for a standard human in a Turing test is actually impossible without AGI—I’m just saying that I would want more proof in the optimised-for-Turing-test situation than in others.
That was a type of savant that I thought couldn’t happen before AGI.
This interests me (as someone professionally involved in the creation of savants, though not linguistic ones). Can you articulate why you thought that?
It wasn’t formalised thinking. I bought into the idea of AI-complete problems, i.e. that there were certain problems that only a true AI could solve—and that an AI that could solve one of them could also solve all the others. I was also informally thinking that linguistic ability was the queen of all human skills (influenced by the Turing test itself and by the continuous failure of chatterbots). Finally, I wasn’t cognisant of the possibilities of Big Data to solve these narrow problems by (clever) brute force. So I had the image of a true AI being defined by the ability to demonstrate human-like ability on linguistic problems.
The game Watson was playing was non-interactive[1]—that is, unlike with the TT, you could not change the later Jeopardy! questions based on Watson’s answers in an attempt to make it fail.
Had they done so, that would have forced an exponential blowup in the (already large) amount it would have to learn to get the same rate of correct answers.
(Not that humans would have done better in that case, of course!)
Interactivity makes a huge difference because you can focus away from its strong points and onto its weak points, thus forcing all points to be strong in order to pass.
[1] “non-adaptive” may be a more appropriate term in this context, but I say “interactive” because of the relevance of theorems about the IP complexity class (and PSPACE, which is equal).
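The “exponential blowup” claimed above can be made concrete with a toy count (a sketch under assumed numbers—the pool size, answer granularity, and round count below are illustrative, not from the thread):

```python
# Toy illustration of why adaptivity forces an exponential lookup table.
# Hypothetical setting: the interrogator picks each question from a pool of
# Q questions, the candidate gives one of A distinguishable answers, and the
# conversation lasts R rounds.

Q, A, R = 100, 10, 5  # illustrative values

# Non-adaptive ("published question list"): a savant only needs one canned
# answer per question in the pool.
nonadaptive_table = Q

# Adaptive: the next question can depend on every earlier answer, so a pure
# lookup-table savant needs an entry for every reachable conversation prefix.
# After r rounds there are up to (Q * A)**r distinct prefixes.
adaptive_table = sum((Q * A) ** r for r in range(1, R + 1))

print(nonadaptive_table)
print(adaptive_table)
```

With these numbers the non-adaptive table has 100 entries while the adaptive one needs on the order of 10^15—which is why “teaching to the test” stops working once the tester can follow up.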
I’m not saying that Watson could pass, or almost pass, a Turing test. I’m saying that Watson demonstrated a combination of great quasi-linguistic skill and great general incompetence that I wasn’t expecting to be possible. It proved that a computer could be “taught to the test” in at least some areas.
So I think we should keep open the possibility that a computer could be taught to the Turing test as well.
Well, yes, if you make the test non-adaptive, it’s (exponentially) easier to pass. For example, if you limit the “conversation” to a game of chess, it’s already possible. But those aren’t the “full” Turing Test; they’re domain-specific variants. Your criticism would only apply to the latter.
Are AI players actually indistinguishable from humans in Chess? Could an interrogator not pick out consistent stylistic differences between equally-ranked human and AI players?
It would only be analogous to the test of the students if you published a short list of acceptable topics for the TT and limited the questions to that. Which they don’t do.
Actually, don’t they currently limit conversations to a preselected topic? And still the chatbots fail.
I’m not really sure what you’re driving at here. We don’t have any software even close to being able to pass the TT right now; at the moment, using relatively easy subsets of the TT is the most useful thing to do. That doesn’t mean that anyone expects that passing such a subset counts as passing the general TT.
I was just noting that current “Turing Tests” are exactly what was being used as an example of something-that-is-not-a-Turing-test. It’s mildly ironic, that’s all.