General intelligence test: no domains of stupidity

There’s been a productive conversation on my post criticising the Turing test. I claimed that I wouldn’t take the Turing test as definitive evidence of general intelligence if the agent had been specifically optimised for that test. I was challenged as to whether I had a different definition of thinking than “able to pass the Turing test”. As a consequence of that exchange, I think I do.

Truly general intelligence is impossible, because of various “no free lunch” theorems, which demonstrate that no algorithm can perform well in every environment (intuitively, this makes sense: a smarter being could always design an environment that specifically penalises a particular algorithm). Nevertheless, we have an intuitive definition of a general intelligence as one that performs well in most (or almost all) environments.
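For reference, here is a paraphrase of the optimisation form of the result (Wolpert and Macready, 1997): for any two black-box search algorithms $a_1$ and $a_2$, and any number of evaluations $m$,

$$\sum_{f} P(d^y_m \mid f, m, a_1) = \sum_{f} P(d^y_m \mid f, m, a_2),$$

where the sum ranges over all objective functions $f: X \to Y$ on finite sets, and $d^y_m$ is the sequence of cost values the algorithm has observed after $m$ evaluations. Averaged over every possible environment, no algorithm outperforms any other; good performance in some environments is necessarily paid for elsewhere.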

I’d like to reverse that definition, and define a general intelligence as one that doesn’t perform stupidly in a novel environment. A small change of emphasis, but it gets to the heart of what the Turing test is meant to do, and why I questioned it. The point of the Turing test is to catch the (putative) AGI performing stupidly. Since we can’t test the AGI on every environment, the idea is to make the Turing test as general as possible in potential. If you give me the questions in advance, I can certainly craft an algorithm that aces that test; similarly, given the test in advance, you could construct a narrow system that aces that particular Turing test. But since the space of reasonable conversations is combinatorially huge, and since the judge could pick any element of it, the AGI could not get by with a narrow list of responses: it would have to be genuinely generally intelligent, so that it would not end up being stupid in the particular conversation it was in.
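To get a sense of scale, a back-of-envelope count (the vocabulary size and conversation length here are illustrative assumptions, not measurements): a 200-word exchange drawn from a working vocabulary of $10^4$ words has

$$(10^4)^{200} = 10^{800}$$

possible word sequences. Even if only one sequence in $10^{700}$ were a coherent conversation, $10^{100}$ would remain: far too many for any precomputed list of responses to cover.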

That’s the theory, anyway. But maybe the space of conversations isn’t as vast as all that, especially if the AGI has some simple classification algorithms. Maybe the data on the internet today, combined with some reasonably cunning algorithms, can carry a conversation as well as a human. After all, we are generating examples of conversations by the millions every hour of every day.

Which is why I emphasised testing from outside the AGI’s domain of competence. You need to introduce it to a novel environment, and give it the possibility of being stupid. If the space of human conversations isn’t large enough, you need to move to the much larger space of real-world problem solving, and pick something from it. It doesn’t matter what it is, only that you have the potential of picking anything; hence only a general intelligence could be confident, in advance, of coping with it. That’s why I emphasised not announcing what your test was going to be, and changing the rules or outright cheating: the fewer restrictions you place on the potential test, the more informative the actual test is.

A related question, of course, is whether humans are generally intelligent. Well, humans are stupid in a lot of domains. Human groups augmented by data and computing technology, and given enough time, are much more generally intelligent than individual humans. So general intelligence is a matter of degree, not a binary classification (though it might be nearly binary for some AGI designs). Thus whether you call humans generally intelligent is a matter of taste and emphasis.