Hi, everyone. My name is Teresa, and I came to Less Wrong by way of HPMOR.
I read the first dozen chapters of HPMOR without having read or seen the Harry Potter canon, but once I was hooked on the former, it became necessary to see all the movies and then read all the books in order to get the HPMOR jokes. JK Rowling actually earned royalties she would never have received otherwise thanks to HPMOR.
I don’t actually identify as a pure rationalist, although I started out that way many, many years ago. What I am committed to today is SANITY. I learned the hard way that, in my case at least, it is the body that keeps the mind sane. Without embodiment to ground meaning, you get into problems of unsearchable infinite regress, and you can easily hypothesize internally consistent worlds that are nevertheless not the real world the body lives in. This can lead to religions and other serious delusions.
That said, however, I find a lot of utility in thinking through the material on this site. I discovered Bayesian decision theory in high school, but the texts I read at the time either didn’t explain the whole theory or else I didn’t catch it all at age 14. Either way, it was just a cute trick for calculating compound utility scores based on guesses of likelihood for various contingencies. The greatest service the Less Wrong site has done for me is to connect the utility calculation method to EMPIRICAL prior probabilities! Like, duh! A hugely useful tool, that is.
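For readers who haven't met the calculation, here is a minimal sketch of the idea in the paragraph above: weight each contingency's utility by its probability, with the probability taken from observed frequencies rather than from a guess. The umbrella scenario, the rain counts, and the utility numbers are all invented for illustration, and `expected_utility` is a hypothetical helper name, not anything from the Sequences.

```python
def expected_utility(outcomes):
    """Sum of probability * utility over mutually exclusive contingencies."""
    total_p = sum(p for p, _ in outcomes)
    assert abs(total_p - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * u for p, u in outcomes)

# Empirical prior (made-up data): it rained on 30 of the last 100 comparable days.
p_rain = 30 / 100

# Each action maps to (probability, utility) pairs; utilities are illustrative.
actions = {
    "umbrella": [(p_rain, -1), (1 - p_rain, -1)],     # small fixed nuisance
    "no_umbrella": [(p_rain, -10), (1 - p_rain, 0)],  # soaked if it rains
}

best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best)
```

With a different observed frequency the preferred action can flip, which is the point of anchoring the calculation to data instead of to guesses.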
As a professional writer in my day job and student of applied linguistics research otherwise, I have some reservations about those of the Sequences that reference the philosophy of language. I completely agree that Searle believes in magic (aka “intentionality”), which is not useful. But this does not mean the Chinese Room problem isn’t real.
When you study human language use empirically in natural contexts (through frame-by-frame analysis of video recordings), it turns out that what we think we do with language and what we actually do are rather divergent. The body and places in the world and other agents in the interaction all play a much bigger role in the real-time construction of meaning than you would expect from introspection. Egocentric bias has a HUGE impact on what we imagine about our own utterances. I’ve come to the conclusion that Stevan Harnad is absolutely correct, and that machine language understanding will require an AI ROBOT, not a disembodied algorithmic system.
As for HPMOR, I hereby predict that Harrymort is going to go back in time to the primal event in Godric’s Hollow and change the entire universe to canon in his quest to, er, spoilers, can’t say.
The chief deficiency of embodiment philosophy-of-mind, at least among AIers and cognitivists, is that they constantly say “embodiment” when they should say “experience of embodiment”. And when you put it that way, most of the magic leaches away and you’re left facing the same old hard problem of consciousness. Meaning, understanding, intentionality are all aspects of consciousness. And various studies can show that body awareness is surprisingly important in the genesis and constitution of those things. But just having a material object governed by a hierarchy of feedback loops does not explain why there should be anyone home in that object—why there should be any form of awareness in, or around, or otherwise associated with that object.
I sort of agree with you: if the “hard problem of consciousness” is indeed a coherent problem that needs to be solved, then what you say makes perfect sense. But I am not convinced that it’s a problem worth solving. I don’t care whether Mitchell_Porter is an entity that really, truly experiences consciousness, or whether it’s only a “material object governed by a hierarchy of feedback loops”, so long as Mitchell_Porter has interesting things to say, and can hold up his/her/its own end of the conversation.
Let’s distinguish between superficial and fundamental ignorance. If you flip a coin, you may not know which way it came up until you look. This typifies what I will call superficial ignorance. The mechanics of a flat disk of metal, sent spinning in a certain way, is not an especially mysterious subject. Your ignorance of whether the coin shows head or tails does not imply ignorance of the essence of what just happened.
Fundamental ignorance is where you really don’t know what’s going on. The sun goes up and down in the sky and you don’t know why, for a third of each day you’re in some other reality where you don’t remember the usual one, and so on. The situation with respect to consciousness is in this category.
It could be argued that you should care about any instance of fundamental ignorance, because its implications are unknown in a way that the implications of superficial ignorance are not. Who knows what further wonderful, terrible, or important facts it obscures? Then again, it could be argued that there’s fundamental ignorance beneath every instance of superficial ignorance. Consider the spinning coin: we have a physical mechanics that can describe its motion, but why does that mechanics work?
Conversely, in the case of consciousness, there’s an argument for complacency: I may not understand why brains are conscious, but human beings pretty consistently act in the ways that I tentatively regard as indicative of consciousness, and (I could say) in my dealing with them, it’s how they behave which matters.
There are a few further reasons why someone may end up caring whether other people/beings are truly conscious or not. One is morality. I may consider it important to know (if only I could know), whether they really are happy or suffering, or whether they are just automata pantomiming the behaviors of happiness and suffering. Another is intellectual curiosity. Perhaps you just decide that you want to know, not because of the argument from the unknown significance of fundamental ignorance, but on a whim, or because of the cool satisfaction of grasping something abstract.
But perhaps the number-one reason that someone from this community should want to know, is that many people here anticipate that they personally will undergo transformations such as mind uploading. If you at least value your own consciousness, and not just your behaviors, then you have an interest in understanding whether a given transformation preserves consciousness or not.
I think that you are unintentionally conflating two very different questions:
1). What is the mechanism that causes us to perceive certain entities, including humans, as possessing consciousness ?
2). Let’s assume that there’s a hidden factor, called “consciousness”, that is sufficient but not necessary to cause us to perceive humans as being conscious. How can we test for the presence or absence of this factor ?
Answering (2) may help you answer (1), but (2) is unanswerable if the assumption you are making in it is wrong.
I personally see no reason to postulate the presence of some hidden, undetectable factor that causes humans to be conscious. I would love to know exactly how human brains produce the phenomenon we perceive as “consciousness”, but I’m not convinced that such a feature could only have a single possible implementation.
This is indeed important with respect to morality:
I may consider it important to know (if only I could know), whether they really are happy or suffering, or whether they are just automata pantomiming the behaviors of happiness and suffering.
If the presence of consciousness is unfalsifiable, then you can’t know, and you’re obligated to treat all entities that appear to be happy or suffering equally (for the purposes of making your moral decisions, that is). On the other hand, if the presence of consciousness is falsifiable, then tell me how I can falsify it. If you hand-wave the answer by saying, “oh, it’s a hard problem”, then you don’t have a useful model, you’ve got something akin to Vitalism. It’d be like saying,
“Some suns are powered by fusion, and others are powered by undetectable sun-goblins that make it look like the sun is powered by fusion. Our own sun is powered by goblins. You can’t ever detect them, but trust me, they’re there”.
Would it be appropriate to say that superficial ignorance is factual (one does not know the particular inputs to the equations which govern the coin’s movement) where fundamental ignorance is conceptual (one does not have a concept that the coin is governed by equations of motion)?
You defect in the Prisoner’s Dilemma against a rock with “defect” written on it, defect in the PD against a rock with “cooperate” written on it, and cooperate in the PD against a copy of yourself. So, if you’re ever playing PD against Mitchell_Porter, you want to know whether he’s more like a rock or like yourself.
Right, but in order to figure out whether to cooperate with or defect against Mitchell_Porter, all I need to know is what strategy he is most likely to pursue. I don’t need to know whether he’s a “material object governed by a hierarchy of feedback loops” or a biological human possessed of “consciousness” or an animatronic garden gnome; I just need to know enough to find out which button he’ll press.
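The two comments above can be put side by side in a small sketch. It assumes the conventional Prisoner's Dilemma payoff ordering (the numbers 5, 3, 1, 0 are an assumption, not from the thread) and models each opponent purely by how its move relates to mine; `rock`, `copy_of_me`, and `best_response` are hypothetical names chosen for illustration.

```python
# Payoff to me for (my_move, their_move); assumed conventional PD values.
PAYOFF = {
    ("C", "C"): 3,  # mutual cooperation
    ("C", "D"): 0,  # I am exploited
    ("D", "C"): 5,  # I exploit
    ("D", "D"): 1,  # mutual defection
}

def rock(label):
    """An opponent that plays a fixed move no matter what I do."""
    return lambda my_move: label

def copy_of_me(my_move):
    """An opponent whose move always matches mine."""
    return my_move

def best_response(opponent):
    """Choose the move with the higher payoff, given how the opponent responds."""
    return max("CD", key=lambda m: PAYOFF[(m, opponent(m))])

print(best_response(rock("D")))
print(best_response(rock("C")))
print(best_response(copy_of_me))
```

Note that `best_response` only ever consults the opponent's behaviour, never what the opponent is made of. What the sketch cannot settle is how you would find out which behavioural model applies to a given opponent.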
I’ve come to the conclusion that Stevan Harnad is absolutely correct, and that machine language understanding will require an AI ROBOT, not a disembodied algorithmic system.
I am not familiar with Stevan Harnad, but this sounds counterintuitive to me (though it’s very likely that I’m misunderstanding your point). I am currently reading your words on the screen. I can’t hear you or see your body language. And yet, I can still understand what you wrote (not fully, perhaps, but enough to ask you questions about it). In our current situation, I’m not too different from a software program that is receiving the text via some input stream, so I don’t see an a priori reason why such a program could not understand the text as well as I do.
I assume telms is referring to embodied cognition, the idea that your ability to communicate with her, and achieve mutual understanding of any sort, is made possible by shared concepts and mental structures which can only arise in an “embodied” mind.
I am rather skeptical about this thesis as far as artificial minds go; somewhat less skeptical about it if applied only to “natural” (i.e., evolved) minds — although in that case it’s almost trivial; but in any case don’t know enough about it to have a fully informed opinion.
Oh, ok, that makes more sense. As far as I understand, the idea behind embodied cognition is that intelligent minds must have a physical body with a rich set of sensors and effectors in order to develop; but once they’re done with their development, they can read text off of the screen instead of talking.
That definitely makes sense in case of us biological humans, but just like you, I’m skeptical that the thesis applies to all possible minds at all times.
I skimmed both papers, and found them unconvincing. Granted, I am not a philosopher, so it’s likely that I’m missing something, but still:
In the first paper, Harnad argues that rule-based expert systems cannot be used to build a Strong AI; I completely agree. He further argues that merely building a system out of neural networks does not guarantee that it will grow to be a Strong AI either; again, we’re on the same page so far. He further points out that, currently, nothing even resembling Strong AI exists anywhere. No argument there.
Harnad totally loses me, however, when he begins talking about “meaning” as though that were some separate entity to which “symbols” are attached. He keeps contrasting mere “symbol manipulation” with true understanding of “meaning”, but he never explains how we could tell one from the other.
In the second paper, Harnad basically falls into the same trap as Searle. He lampoons the “Systems Reply” by calling it things like “a predictable piece of hand-waving”—but that’s just name-calling, not an argument. Why precisely is Harnad (or Searle) so convinced that the Chinese Room as a whole does not understand Chinese ? Sure, the man inside doesn’t understand Chinese, but that’s like saying that a car cannot drive uphill at 70 mph because no human driver can run uphill that fast.
The rest of his paper amounts to a moving of the goalposts. Harnad is basically saying, “Ok, let’s say we have an AI that can pass the TT via teletype. But that’s not enough ! It also needs to pass the TTT ! And if it passes that, then the TTTT ! And then maybe the TTTTT !” Meanwhile, Harnad himself is reading articles off his screen which were published by other philosophers, and somehow he never requires them to pass the TTTT before he takes their writings seriously.
Don’t get me wrong, it is entirely possible that the only way to develop a Strong AI is to embody it in the physical world, and that no simulation, no matter how realistic, will suffice. I am open to being convinced, but the papers you linked are not convincing. I’m not interested in figuring out whether any given person who appears to speak English really, truly understands English; or whether this person is merely mimicking a perfect understanding of English. I’d rather listen to what such a person has to say.
Why precisely is Harnad (or Searle) so convinced that the Chinese Room as a whole does not understand Chinese ?
Haven’t read the Harnad paper yet, but the reason Searle’s convinced seems obvious to me: he just doesn’t take his own scenario seriously — seriously enough to really imagine it, rather than just treating it as a piece of absurd fantasy. In other words, he does what Dennett calls “mistaking a failure of imagination for an insight into necessity”.
In The Mind’s I, Dennett and Hofstadter give the Chinese Room scenario a much more serious fictional treatment, and show in great detail what elements of it trigger Searle’s intuitions on the matter, as well as how to tweak those intuitions in various ways. Sadly but predictably, Searle has never (to my knowledge) responded to their dissection of his views.
Having now read the second linked Harnad paper, my evaluation is similar to yours. Some more specific comments follow.
Harnad talks a lot about whether a body “has a mind”: whether a Turing Test could show if a body “has a mind”, how we know a body “has a mind”, etc.
What on earth does he mean by “mind”? Not… the same thing that most of us here at LessWrong mean by it, I should think.
He also refers to artificial intelligence as “computer models”. Either he is using “model” quite strangely as well… or he has some… very confused ideas about AI. (Actually, very confused ideas about computers in general is, in my experience, endemic among the philosopher population. It’s really rather distressing.)
Searle has shown that a mindless symbol-manipulator could pass the [Turing Test] undetected.
This has surely got to be one of the most ludicrous pronouncements I’ve ever seen a philosopher make.
people can do a lot more than just communicating verbally by teletype. They can recognize and identify and manipulate and describe real objects, events and states of affairs in the world. [italics added]
One of these things is not like the others...
Similar arguments can be made against behavioral “modularity”: It is unlikely that our chess-playing capacity constitutes an autonomous functional module, independent of our capacity to see, move, manipulate, reason, and perhaps even to speak.
Well, maybe our chess-playing module is not autonomous, but as we have seen, we can certainly build a chess-playing module that has absolutely no capacity to see, move, manipulate, or speak.
Most of the rest of the paper is nonsensical, groundless handwaving, in the vein of Searle but worse. I am unimpressed.
Yeah, I think that’s the main problem with pretty much the entire Searle camp. As far as I can tell, if they do mean anything by the word “mind”, then it’s “you know, that thing that makes us different from machines”. So, we are different from AIs because we are different from AIs. It’s obvious when you put it that way !
Well, I certainly agree that there are important aspects of human languages that come out of our experience of being embodied in particular ways, and that without some sort of model that embeds the results of that kind of experience we’re not going to get very far in automating the understanding of human language.
But it sounds like you’re suggesting that it’s not possible to construct such a model within a “disembodied” algorithmic system, and I’m not sure why that should be true.
Then again, I’m not really sure what precisely is meant here by “disembodied algorithmic system” or “ROBOT”.
For example, is a computer executing a software emulation of a humanoid body interacting with an emulated physical environment a disembodied algorithmic system, or an AI ROBOT (or neither, or both, or it depends on something)? How would I tell, for a given computer, which kind of thing it was (if either)?
Is a computer executing a software emulation of a humanoid body interacting with an emulated physical environment a disembodied algorithmic system, or an AI ROBOT (or neither, or both, or it depends on something)?
An emulated body in an emulated environment is a disembodied algorithmic system in my terminology. The classic example is Terry Winograd’s SHRDLU, which made significant advances in machine language understanding by adding an emulated body (arm) and an emulated world (a cartoon blocks world, but nevertheless a world that could be manipulated) to text-oriented language processing algorithms. However, Winograd himself concluded that language understanding algorithms plus emulated bodies plus emulated worlds aren’t sufficient to achieve natural language understanding.
Every emulation necessarily makes simplifying assumptions about both the world and the body that are subject to errors, bugs, and munchkin effects. A physical robot body, on the other hand, is constrained by real-world physics to that which can be built. And the interaction of a physical body with a physical environment necessarily complies with that which can actually happen in the real world. You don’t have to know everything about the world in advance, as you would for a realistic world emulation. With a robot body in a physical environment, the world acts as its own model and constrains the universe of computation to a tractable size.
The other thing you get from a physical robot body is the implicit analog computation tools that come with it. A robot arm can be used as a ruler, for example. The torque on a motor can be used as an analog for effort. On these analog systems, world-grounded metaphors can be created using symbolic labels that point to (among other things) the arm-ruler or torque-effort systems. These metaphors can serve as the terminal point of a recursive meaning builder—and the physics of the world ensures that the results are good enough models of reality for communication to succeed or for thinking to be assessed for truth-with-a-small-t.
I certainly agree that a physical robot body is subject to constraints that an emulated body may not be subject to; it is possible to design an emulated body that we are unable to build, or even a body that cannot be built even in principle, or a body that interacts with its environment in ways that can’t happen in the real world.
And I similarly agree that physical systems demonstrate relationships, like that between torque and effort, which provide data, and that an emulated body doesn’t necessarily demonstrate the same relationships that a robot body does (or even that it can in principle). And those aren’t unrelated, of course; it’s precisely the constraints on the system that cause certain parts of that system to vary in correlated ways.
And I agree that a robot body is automatically subject to those constraints, whereas if I want to build an emulated software body that is subject to the same constraints that a particular robot body would be subject to, I need to know a lot more.
Of course, a robot body is not subject to the same constraints that a human body is subject to, any more than an emulated software body is; to the extent that a shared ability to understand language depends on a shared set of constraints, rather than on simply having some constraints, a robot can’t understand human language until it is physically equivalent to a human. (Similar reasoning tells us that paraplegics don’t understand language the same way as people with legs do.)
And if understanding one another’s language doesn’t depend on a shared set of constraints, such that a human with two legs, a human with no legs, and a not-perfectly-humanlike robot can all communicate with one another, it may turn out that an emulated software body can communicate with all three of them.
The latter seems more likely to me, but ultimately it’s an empirical question.
You make a very important point that I would like to emphasize: incommensurate bodies very likely will lead to misunderstanding. It’s not just a matter of shared or disjunct body isomorphism. It’s also a matter of embodied interaction in a real world.
Let’s take the very fundamental function of pointing. Every human language is rife with words called deictics that anchor the flow of utterance to specific pieces of the immediate environment. English examples are words like “this”, “that”, “near”, “far”, “soon”, “late”, the positional prepositions, pronominals like “me” and “you”—the meaning of these terms is grounded dynamically by the speakers and hearers in the time and place of utterance, the placement and salience of surrounding objects and structures, and the particular speaker and hearers and overhearers of the utterance. Human pointing—with the fingers, hands, eyes, chin, head tilt, elbow, whatever—has been shown to perform much the same functions as deictic speech in utterance. (See the work of Sotaro Kita if you’re interested in the data). A robot with no mechanism for pointing and no sensory apparatus for detecting the pointing gestures of human agents in its environment will misunderstand a great deal and will not be able to communicate fluently.
Then there are the cultural conventions that regulate pointing words and gestures alike. For example, spatial meanings tend to be either speaker-relative or landmark-relative or absolute (that is, embedded in a spatial frame of cardinal directions) in a given culture, and whichever of these options the culture chooses is used in both physical pointing and linguistic pointing through deictics. A robot with no cultural reference won’t be able to disambiguate “there” (relative to me here now) versus “there” (relative to the river/mountain/rising sun), even if physical pointing is integrated into the attempt to figure out what “there” is. And the problem may not be detected due to the illusion of double transparency.
This gets even more complicated when the world of discourse shifts from the immediate environment to other places, other times, or abstract ideas. People don’t stop inhabiting the real world when they talk about abstract ideas. And what you see in conversation videos is people mapping the world of discourse metaphorically to physical locations or objects in their immediate environment. The space behind me becomes yesterday’s events and the space beyond my reach in front of me becomes tomorrow’s plan. Or I always point to the left when I’m talking about George and to the right when I’m talking about Fred.
This is all very much an empirical question, as you say. I guess my point is that the data has been accumulating for several decades now that embodiment matters a great deal. Where and how it matters is just beginning to be sorted out.
A robot with no mechanism for pointing and no sensory apparatus for detecting the pointing gestures of human agents in its environment will misunderstand a great deal and will not be able to communicate fluently.
If I am talking to you on the telephone, I have no mechanism for pointing and no sensory apparatus for detecting your pointing gestures, yet we can communicate just fine.
The whole embodied cognition thing is a massive, elementary mistake as bad as all the ones that Eliezer has analysed in the Sequences. It’s an instant fail.
The whole embodied cognition thing is a massive, elementary mistake as bad as all the ones that Eliezer has analysed in the Sequences. It’s an instant fail.
Can you expand on this just a bit? I am leaning, slowly, in the same direction, and I’d like a bit of a sanity check on this claim.
Firstly, I have no problem with the “embodied cognition” idea so far as it relates to human beings (or animals, for that matter). Yes, people think also with their bodies, store memories in the environment, point at things, and so on. This seems to me both true and unremarkable. So unremarkable as to hardly be worth the amount of thought that apparently goes into it. While it may be interesting to trace out all the ways in which it happens, I see no philosophical importance in the details.
Where it goes wrong is the application to AGI that says that because people do this, it is an essential part of how an intelligence of any sort must operate, and therefore a man-made intelligent machine must be given a body. The argument mistakes a superficial fact about observed intelligences for a fact about the mechanism whereby an intelligence of any sort must operate. There is a large and expanding body of work on making ever more elaborate robot puppets like the Nao, explicitly following a research programme of developing “embodied cognition”.
I cannot see these projects as being of any interest. I would be a lot more interested in seeing someone build a human-sized robot that can run unsupported on two legs (Boston Dynamics’ ATLAS is getting there), especially if it can run faster than a man while carrying a full military pack and isn’t tethered to a power cable (not yet done). However, nothing like that is a prerequisite to AGI. I do hold a personal opinion, which I’m not going to argue for here, that if someone developed a simple method of solving the control problems of an all-terrain running robot, they might get from that some insight into how to get farther, such as an all-terrain running robot that can hunt down humans trying to avoid it. Of course, the Unfriendly directions in which that might lead are obvious, as are the military motivations for building such machines, or inviting people to come up with designs. Of course, these powers will only be used for Good.
Since the embodied approach has been around in strength since the 1980s, and can be found in Turing in 1950, I think it fair to say that if it worked beyond the toy projects that AGI attempts always produce, we would have seen it by now.
The deaf communicate without sound, the blind without sight, and the limbless without pointing hands. On the internet people communicate without any of these. It doesn’t seem to hold anyone up, except in the mere matter of speed in the case of Stephen Hawking communicating by twitching cheek muscles.
Ah, no, the magic ingredient must be society! Cognition always takes place within society. Feral children are developmentally disabled for want of society. The evidence is clear: we must develop societies of AIs before they can be intelligent.
No, it’s language they must have! An AGI’s cognition must be based on a language. So if we design the perfect language, AGI will be a snap.
No, it’s upbringing they must have! So we’ll design a robot to be initially like a newborn baby and teach it through experience!
No, it’s....
No. The general form of all these arguments is broken.
Since the embodied approach has been around in strength since the 1980s, and can be found in Turing in 1950, I think it fair to say that if it worked beyond the toy projects that AGI attempts always produce, we would have seen it by now.
This is where you lose me. Isn’t that an equally effective argument against AGI in general?
Isn’t that an equally effective argument against AGI in general?
“AGI in general” is a thing of unlimited broadness, about which lack of success so far implies nothing more than lack of success so far. Cf. flying machines, which weren’t made until they were. Embodied cognition, on the other hand, is a definite thing, a specific approach that is at least 30 years old, and I don’t think it’s even made a contribution to narrow AI yet. It is only mentioned in Russell and Norvig in their concluding section on the philosophy of Strong AI, not in any of the practical chapters.
I took RichardKennaway’s post to mean something like the following:
“Birds fly by flapping their wings, but that’s not the only way to fly; we have built airplanes, dirigibles and rockets that fly differently. Humans acquire intelligence (and language) by interacting with their physical environment using a specific set of sensors and effectors, but that’s not the only way to acquire intelligence. Tomorrow, we may build an AI that does so differently.”
But since that idea has been around in strength since the 1980s, and can be found in Turing in 1950, apparently it’s fair to say that if it worked beyond the toy projects that AGI attempts always produce, we would have seen it by now.
I think that we have seen it by now, we just don’t call it “AI”. Even in Turing’s day, we had radar systems that could automatically lock on to enemy planes and shoot them down. Today, we have search engines that can provide answers (with a significant degree of success) to textual or verbal queries; mapping software that can plot the best path through a network of roadways; chess programs that can consistently defeat humans; cars that drive themselves; planes that fly themselves; plus a host of other things like that. Sure, none of these projects are Strong AI, but neither are they toys.
This depends on the definition of ‘toy projects’ that you use. For the sort of broad definition you are using, where ‘toy projects’ refers literally to toys, Richard Kennaway’s original claim that the embodied approach had only produced toys is factually incorrect. For the definition of ‘toy projects’ that both Richard Kennaway and Document are using, in which ‘toy projects’ is more closely related to ‘toy models’ (i.e. attempts at a simplified version of Strong AI), this is an argument against AGI in general.
I see what you mean, but I’m having trouble understanding what “a simplified version of Strong AI” would look like.
For example, can we consider a natural language processing system that’s connected to a modern search engine to be “a simplified version of Strong AI” ? Such a system is obviously not generally intelligent, but it does perform several important functions—such as natural language processing—that would pretty much be a requirement for any AGI. However, the implementation of such a system is most likely not generalizable to an AGI (if it were, we’d have AGI by now). So, can we consider it to be a “toy project”, or not ?
The “magic ingredient” may be a bridging of intuitions: an embodied AI which you can more naturally interact with offers more intuitive metrics for progress; milestones which can be used to attract funding since they make more sense intuitively.
Obviously you can build an AGI using only lego stones. And you can build an AGI “purely” as software (i.e. with variable hardware substrates). The steelman for pursuing embodied cognition would not be “embodiment is strictly necessary to build AGIs” (boring!), but that “given humans with a goal of building an AGI, going the embodiment route may be a viable approach”.
I well remember that early morning in the CS lab, the better part of a decade ago, when I stumbled—still half asleep—into a sideroom to turn on the lights, only to stare into the eye of Eccerobot (in an earlier incarnation), which was visiting our lab. Shudder.
I used to joke that my goal in life would be to build the successor creature, and to be judged by it (humankind and me both). To be judged and to be found unworthy in its (in this case single) eye, and to be smitten. After all, what better emotional proof to have created something of worth is there than your creation judging you to be unworthy? Take my atoms, Adambot!
I don’t know, but I doubt that the communication medium makes much difference beyond the individual skills of the people using it. People can use multiple modalities to communicate, and in a situation where some are missing, one varies one’s use of the others to accomplish the goal.
In adversarial negotiations one might even find it an advantage not to be seen, to avoid accidentally revealing things one wishes to keep secret. Of course, that applies to both parties, and it will come down to a matter of who is more skilled at using the means available.
Sure, I agree that we make use of all kinds of contextual cues to interpret speech, and a system lacking awareness of that context will have trouble interpreting speech. For example, if I say “Do you like that?” to Sam, when Sam can’t see the thing I’m gesturing to indicate or doesn’t share the cultural context that lets them interpret that gesture, Sam won’t be able to interpret or engage with me successfully. Absolutely agreed. And this applies to all kinds of things, including (as you say) but hardly limited to pointing.
And, sure, the system may not even be aware of that trouble… illusions of transparency abound. Sam might go along secure in the belief that they know what I’m asking about and be completely wrong. Absolutely agreed.
And sure, I agree that we rely heavily on physical metaphors when discussing abstract ideas, and that a system incapable of processing my metaphors will have difficulty engaging with me successfully. Absolutely agreed.
All of that said, what I have trouble with is your apparent insistence that only a humanoid system is capable of perceiving or interpreting human contextual cues, metaphors, etc. That doesn’t seem likely to me at all, any more than it seems likely that a blind person (or one on the other end of a text-only link) is incapable of understanding human speech.
Let’s take the very fundamental function of pointing. Every human language is rife with words called deictics that anchor the flow of utterance to specific pieces of the immediate environment. English examples are words like “this”, “that”, “near”, “far”, “soon”, “late”, the positional prepositions, pronominals like “me” and “you”—the meaning of these terms is grounded dynamically by the speakers and hearers in the time and place of utterance, the placement and salience of surrounding objects and structures, and the particular speaker and hearers and overhearers of the utterance. Human pointing—with the fingers, hands, eyes, chin, head tilt, elbow, whatever—has been shown to perform much the same functions as deictic speech in utterance. (See the work of Sotaro Kita if you’re interested in the data). A robot with no mechanism for pointing and no sensory apparatus for detecting the pointing gestures of human agents in its environment will misunderstand a great deal and will not be able to communicate fluently.
Are you really claiming that ability to understand the very concept of indexicality, and concepts like “soon”, “late”, “far”, etc., relies on humanlike fingers? That seems like an extraordinary claim, to put it lightly.
Also:
A robot with no mechanism for pointing and no sensory apparatus for detecting the pointing gestures of human agents in its environment will misunderstand a great deal and will not be able to communicate fluently.
“Detecting pointing gestures” would be the function of a perception algorithm, not a sensory apparatus (unless what you mean is “a robot with no ability to perceive positions/orientations/etc. of objects in its environment”, which… wouldn’t be very useful). So it’s a matter of what we do with sense data, not what sorts of body we have; that is, software, not hardware.
More generally, a lot of what you’re saying (and — this is my very tentative impression — a lot of the ideas of embodied cognition in general) seems to be based on an idea that we might create some general-intelligent AI or robot, but have it start at some “undeveloped” state and then proceed to “learn” or “evolve”, gathering concepts about the world, growing in understanding, until it achieves some desired level of intellectual development. The concern then arises that without the kind of embodiment that we humans enjoy, this AI will not develop the concepts necessary for it to understand us and vice versa.
Ok. But is anyone working in AI these days actually suggesting that this is how we should go about doing things? Is everyone working in AI these days suggesting that? Isn’t this entire line of reasoning inapplicable to whole broad swaths of possible approaches to AI design?
P.S. What does “there, relative to the river” mean?
Are you really claiming that ability to understand the very concept of indexicality, and concepts like “soon”, “late”, “far”, etc., relies on humanlike fingers? That seems like an extraordinary claim, to put it lightly.
Yeah, I am advancing the hypothesis that, in humans, the comprehension of indexicality relies on embodied pointing at its core, though not just with fingers, which are not universally used for pointing in all human cultures. Sotaro Kita has the most data on this subject for language, but the embodied basis of mathematics is discussed in Where Mathematics Comes From, by George Lakoff and Rafael Núñez. Whether all possible minds must rely on such a mechanism, I couldn’t possibly guess. But I am persuaded humans do (a lot of) it with their bodies.
What does “there, relative to the river” mean?
In most European cultures, we use speaker-relative deictics. If I point to the southeast while facing south and say “there”, I mean “generally to my front and left”. But if I turn around and face north, I will point to the northwest and say “there” to mean the same thing, ie, “generally to my front and left.” The fact that the physical direction of my pointing gesture is different is irrelevant in English; it’s my body position that’s used as a landmark for finding the target of “there”. (Unless I’m pointing at something in particular here and now, of course; in which case the target of the pointing action becomes its own landmark.)
In a number of Native American languages, the pointing is always to a cardinal direction. If the orientation of my body changes when I say “there”, I might point over my shoulder rather than to my front and left. The landmark for finding the target of “there” is a direction relative to the trajectory of the sun.
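To make the contrast between the two conventions concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not anything drawn from the gesture literature: compass bearings are in degrees clockwise from north, "front and left" is taken to be 45 degrees to the left of the speaker's heading, and the function names are hypothetical.

```python
# Illustrative sketch only; the 45-degree "front and left" angle and all
# function names are assumptions made for this example.
COMPASS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

def compass_name(bearing_deg):
    """Name the nearest of eight compass points for a bearing in degrees."""
    return COMPASS[round((bearing_deg % 360) / 45) % 8]

def speaker_relative_target(heading_deg, gesture_deg):
    """Speaker-relative convention: the gesture is fixed relative to the body,
    so the world direction it picks out changes with the speaker's heading."""
    return (heading_deg + gesture_deg) % 360

def absolute_gesture(heading_deg, target_deg):
    """Absolute convention: the world direction is fixed, so the gesture
    relative to the body changes. Negative means left, positive means right."""
    rel = (target_deg - heading_deg) % 360
    return rel - 360 if rel > 180 else rel

FRONT_LEFT = -45  # assumed angle for "generally to my front and left"

print(compass_name(speaker_relative_target(180, FRONT_LEFT)))  # facing south -> SE
print(compass_name(speaker_relative_target(0, FRONT_LEFT)))    # facing north -> NW
print(absolute_gesture(180, 135))  # facing south, target SE -> -45 (front-left)
print(absolute_gesture(0, 135))    # facing north, target SE -> 135 (behind, to the right)
```

In the first convention the gesture stays constant and the compass direction moves; in the second the compass direction stays constant and the gesture ends up over the shoulder.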
But many cultures use a dominant feature of the landscape, like the Amazon or the Mississippi or the Nile rivers, or a major mountain range like the Rockies, or a sacred city like Mecca, as the orientation landmark, and in some cultures this gets encoded in the deictics of the language and the conventions for pointing. “Up” might not mean up vertically, but rather “upriver”, while “down” would be “downriver”. In a steep river valley in New Guinea, “down” could mean “toward the river” and “up” could mean “away from the river”. And “here” could mean “at the river” while “there” could mean “not at the river”.
The cultural variability and place-specificity of language was not widely known to Western linguists until about ten years ago. For a long time, it was assumed that person-relative orientation was a biological constraint on meaning. This turns out to be not quite accurate. So I guess I should be more nuanced in the way I present the notion of embodied cognition. How’s this: “Embodied action in the world with a cultural twist on top” is the grounding point at the bottom of the symbol expansion for human meanings, linguistic and otherwise.
If the orientation of my body changes when I say “there”, I might point over my shoulder rather than to my front and left.
I was able to follow this explanation (as well as the rest of your post) without seeing your physical body in any way. In addition, I suspect that, while you were typing your paragraph, you weren’t physically pointing at things. The fact that we can do this looks to me like evidence against your main thesis.
I was able to follow this explanation (as well as the rest of your post) without seeing your physical body in any way. … The fact that we can do this looks to me like evidence against your main thesis.
Ah, but you’re assuming that this particular interaction stands on its own. I’ll bet you were able to visualize the described gestures just fine by invoking memories of past interactions with bodies in the world.
Two points. First, I don’t contest the existence of verbal labels that merely refer, or even just register as being invoked without referring at all. As long as some labels are directly grounded to body/world, or refer to other labels that do get grounded in the body/world historically, we generally get by in routine situations. And all cultures have error detection and repair norms for conversation so that we can usually recover without social disaster.
However, the fact that verbal labels can be used without grounding them in the body/world is a problem. It is frequently the case that speakers and hearers alike don’t bother to connect words to reality, and this is a major source of misunderstanding, error, and nonsense. In our own case here and now, we are actually failing to understand each other fully because I can’t show you actual videotapes of what I’m talking about. You are rightly skeptical because words alone aren’t good enough evidence. And that is itself evidence.
Second, humans have a developmental trajectory and history, and memories of that history. We’re a time-binding animal in Korzybski’s terminology. I would suggest that an enculturated adult native speaker of a language will have what amount to “muscle memory” tics that can be invoked as needed to create referents. Mere memory of a motion or a perception is probably sufficient.
“Oh, look, it’s an invisible gesture!” is not at all convincing, I realize, so let me summarize several lines of evidence for it.
Developmentally, there’s quite a lot of research on language acquisition in infants and young children that suggests shared attention management—through indexical pointing, and shared gaze, and physical coercion of the body, and noises that trigger attention shift—is a critical building block for constructing “aboutness” in human language. We also start out with some shared, built-in cries and facial expressions linked to emotional states. At this level of development, communication largely fails unless there is a lot of embodied scaffolding for the interaction, much of it provided by the caregiver but a large part of it provided by the physical context of the interaction. There is also some evidence from the gestural communication of apes that attests to the importance of embodied attention management in communication.
Also, co-speech gesture turns out to be a human universal. Congenitally blind children do it, having never seen gesture by anyone else. Congenitally deaf children who spend time in groups together will invent entire gestural languages complete with formal syntax, as recently happened in Nicaragua. And adults speaking on the telephone will gesture even knowing they cannot be seen. Granted, people gesture in private at a significantly lower rate than they do face-to-face, but the fact that they do it at all is a bit of a puzzle, since the gestures can’t be serving a communicative function in these contexts. Does the gesturing help the speakers actually think, or at least make meaning more clear to themselves? Susan Goldin-Meadow and her colleagues think so.
We also know from video conversation data that adults spontaneously invent new gestures all the time in conversation, then reuse them. Interestingly, though, each reuse becomes more attenuated, simplified, and stylized with repetition. Similar effects are seen in the development of sign languages and in written scripts.
But just how embodied can a label be when gesture (and other embodied experience) is just a memory, and is so internalized that it is externally invisible? This has actually been tested experimentally. The Stroop effect has been known for decades, for example: when the word “red” is presented in blue text, it is read or acted on more slowly than when the word “red” is presented in red text, or in socially neutral black text. That’s on the embodied perception side of things. But more recent psychophysical experiments have demonstrated a similar psychomotor Stroop-like effect when spatial and motion stimulus sentences are semantically congruent with the direction of the required response action. This effect holds even for metaphorical words like “give”, which tests as motor-congruent with motion away from oneself, and “take”, which tests as motor-congruent with motion toward oneself.
I understand how counterintuitive this stuff can be when you first encounter it—especially to intelligent folks who work with codes or words or models a great deal. I expect the two of us will never reach a consensus on this without looking at a lot of original data—and who has the time to analyze all the data that exists on all the interesting problems in the world? I’d be pleased if you could just note for future reference that a body of empirical evidence exists for the claim. That’s all.
In our own case here and now, we are actually failing to understand each other fully because I can’t show you actual videotapes of what I’m talking about.
What do you mean by “fully” ? I believe I understand you well enough for all practical purposes. I don’t agree with you, but agreement and understanding are two different things.
First, I don’t contest the existence of verbal labels that merely refer, or even just register as being invoked without referring at all.
I’m not sure what you mean by “merely refer”, but keep in mind that we humans are able to communicate concepts which have no physical analogues that would be immediately accessible to our senses. For example, we can talk about things like “O(N)”, or “ribosome”, or “a^n + b^n = c^n”. We can also talk about entirely imaginary worlds, such as, for example, the world where Mario, the turtle-crushing plumber, lives. And we can do this without having any “physical context” for the interaction, too.
All that is beside the point, however. In the rest of your post, you bring up a lot of evidence in support of your model of human development. That’s great, but your original claim was that any type of intelligence at all will require a physical body in order to develop; and nothing you’ve said so far is relevant to this claim. True, human intelligence is the only kind we know of so far, but then, at one point birds and insects were the only self-propelled flyers in existence—and that’s not the case anymore.
Furthermore, you also claimed that no simulation, no matter how realistic, will serve to replace the physical world for the purposes of human development, and I’m still not convinced that this is true, either. As I’d said before, we humans do not have perfect senses; if physical coordinates of real objects were snapped to a 0.01mm grid, no human child would ever notice. And in fact, there are plenty of humans who grow up and develop language just fine without the ability to see colors, or to move some of their limbs in order to point at things.
Just to drive the point home: even if I granted all of your arguments regarding humans, you would still need to demonstrate that human intelligence is the only possible kind of intelligence; that growing up in a human body is the only possible way to develop human intelligence; and that no simulation could in principle suffice, and the body must be physical. These are all very strong claims, and so far you have provided no evidence for any of them.
Let me refer you to Computation and Human Experience, by Philip E. Agre, and to Understanding Computers and Cognition, by Terry Winograd and Fernando Flores.
Yeah, I am advancing the hypothesis that, in humans, the comprehension of indexicality relies on embodied pointing at its core [...] Whether all possible minds must rely on such a mechanism, I couldn’t possibly guess. But I am persuaded humans do (a lot of) it with their bodies.
But wait; whether all possible minds must rely on such a mechanism is the entire question at hand! Humans implement this feature in some particular way? Fine; but this thread started by discussing what AIs and robots must do to implement the same feature. If implementation-specific details in humans don’t tell us anything interesting about implementation constraints in other minds, especially artificial minds which we are in theory free to place anywhere in mind design space, then the entire topic is almost completely irrelevant to an AI discussion (except possibly as an example of “well, here is one way you could do it”).
In most European cultures, we use speaker-relative deictics. If I point to the southeast while facing south and say “there”, I mean “generally to my front and left”. But if I turn around and face north, I will point to the northwest and say “there” to mean the same thing, ie, “generally to my front and left.”
Er, what? I thought I was a member of a European culture, but I don’t think this is how I use the word “there”. If I point to some direction while facing somewhere, and say “there”, I mean… “in the direction I am pointing”.
The only situation when I’d use “there” in the way you describe is if I were describing some scenario involving myself located somewhere other than my current location, such that absolute directions in the story/scenario would not be the same as absolute directions in my current location.
In a steep river valley in New Guinea, “down” could mean “toward the river” and “up” could mean “away from the river”. And “here” could mean “at the river” while “there” could mean “not at the river”.
If this is accurate, then why on earth would we map this word in this language to the English “there”? It clearly does not remotely resemble how we use the word “there”, so this seems to be a case of poor translation rather than an example of cultural differences.
In a number of Native American languages, the pointing is always to a cardinal direction. [...] The cultural variability and place-specificity of language was not widely known to Western linguists until about ten years ago. For a long time, it was assumed that person-relative orientation was a biological constraint on meaning.
Yeah, actually, this research I was aware of. As I recall, the Native Americans in question had some difficulty understanding the Westerners’ concepts of speaker-relative indexicals. But note: if we can have such different concepts of indexicality, despite sharing the same pointing digits and whatnot… it seems premature, at best, to suggest that said hardware plays such a key role in our concept formation, much less in the possibility of having such concepts at all.
How’s this: “Embodied action in the world with a cultural twist on top” is the grounding point at the bottom of the symbol expansion for human meanings, linguistic and otherwise.
Ultimately, the interesting aspect of this entire discussion (imo, of course) is what these human-specific implementation details can tell us about other parts of mind design space. I remain skeptical that the answer is anything other than “not much”. (Incidentally, if you know of papers/books that address this aspect specifically, I would be interested.)
However, Winograd himself concluded that language understanding algorithms plus emulated bodies plus emulated worlds aren’t sufficient to achieve natural language understanding.
Ok, but is this the correct conclusion ? It’s pretty obvious that a SHRDLU-style simulation is not sufficient to achieve natural language understanding, but can you generalize that to saying that no conceivable simulation is sufficient ? As far as I can tell, you would make such a generalization because,
Every emulation necessarily makes simplifying assumptions about both the world and the body that are subject to errors, bugs, and munchkin effects.
While this is true, it is also true that our human senses cannot fully perceive the reality around us with infinite fidelity. A child who is still learning his native tongue can’t tell a rock that is 5cm in diameter from a rock that’s 5.000001cm in diameter. This would lead me to believe that your simulation does not need 7 significant figures of precision in order to produce a language-speaking mind.
In fact, a colorblind child can’t tell a red-colored ball from a green-colored ball, and yet colorblind adults can speak a variety of languages, so it’s possible that your simulation could be monochrome and still achieve the desired result.
I completely agree that Searle believes in magic (aka “intentionality”), which is not useful. But this does not mean the Chinese Room problem isn’t real.
When you study human language use empirically in natural contexts (through frame-by-frame analysis of video recordings), it turns out that what we think we do with language and what we actually do are rather divergent. The body and places in the world and other agents in the interaction all play a much bigger role in the real-time construction of meaning than you would expect from introspection.
This sounds interesting. Could you expand on this?
Without embodiment to ground meaning, you get into problems of unsearchable infinite regress, and you can easily hypothesize internally consistent worlds that are nevertheless not the real world the body lives in. This can lead to religions and other serious delusions.
Yeah. This, and the “existential angst” thing, seem to be common problems on LW, and I’ve never been sure why. I think that keeping yourself busy doing practical stuff prevents it from becoming an issue.
When you study human language use empirically in natural contexts (through frame-by-frame analysis of video recordings), it turns out that what we think we do with language and what we actually do are rather divergent. The body and places in the world and other agents in the interaction all play a much bigger role in the real-time construction of meaning than you would expect from introspection.
That’s fascinating! What research has been done on this? I would totally be interested in reading more about it.
Jurgen Streeck’s book Gesturecraft: The manu-facture of meaning is a good summary of Streeck’s cross-linguistic research on the interaction of gesture and speech in meaning creation. The book is pre-theoretical, for the most part, but Streeck does make an important claim that the biological covariation in a speaker or hearer across the somatosensory modes of gesture, vision, audition, and speech does the work of abstraction, which is an unsolved problem in my book.
Streeck’s claim happens to converge with Eric Kandel’s hypothesis that abstraction happens when neurological activity covaries across different somatosensory modes. After all, the only things that CAN covary across, say, musical tone changes in the ear and dance moves in the arms, legs, trunk, and head, are abstract relations. Temporal synchronicity and sequence, say.
Another interesting book is Cognition in the Wild by Edwin Hutchins. Hutchins goes rather too far in the direction of externalizing cognition from the participants in the act of knowing, but he does make it clear that cultures build tools into the environment that offload thinking function and effort, to the general benefit of all concerned. Those tools get included by their users in the manufacture of online meaning, to the point that the online meaning can’t be reconstructed from the words alone.
The whole field of conversation analysis goes into the micro-organization of interactive utterances from a linguistic point of view rather than a cognitive perspective. The focus is on the social and communicative functions of empirically attested language structures as demonstrated by the speakers themselves to one another. Anything written by John Heritage in that vein is worth reading, IMO.
EDIT: Revised, consolidated, and expanded bibliography on interactive construction of meaning:
LINGUISTICS
Philosophy in the Flesh, by George Lakoff and Mark Johnson
Women, Fire and Dangerous Things, by George Lakoff
The Singing Neanderthals, by Steven Mithen
CONVERSATION ANALYSIS & GESTURE RESEARCH
Handbook of Conversation Analysis, by Jack Sidnell & Tanya Stivers
Gesturecraft: The Manu-facture of Meaning, by Jurgen Streeck
Pointing: Where Language, Culture, and Cognition Meet, by Sotaro Kita
Gesture: Visible Action as Utterance, by Adam Kendon
Hearing Gesture: How Our Hands Help Us Think, by Susan Goldin-Meadow
Hand and Mind: What Gestures Reveal about Thought, by David McNeill
COGNITIVE PSYCHOLOGY
Symbols and Embodiment, edited by Manuel de Vega, Arthur M Glenberg, & Arthur C Graesser
Hi, everyone. My name is Teresa, and I came to Less Wrong by way of HPMOR.
I read the first dozen chapters of HPMOR without having read or seen the Harry Potter canon, but once I was hooked on the former, it became necessary to see all the movies and then read all the books in order to get the HPMOR jokes. JK Rowling actually earned royalties she would never have received otherwise thanks to HPMOR.
I don’t actually identify as a pure rationalist, although I started out that way many, many years ago. What I am committed to today is SANITY. I learned the hard way that, in my case at least, it is the body that keeps the mind sane. Without embodiment to ground meaning, you get into problems of unsearchable infinite regress, and you can easily hypothesize internally consistent worlds that are nevertheless not the real world the body lives in. This can lead to religions and other serious delusions.
That said, however, I find a lot of utility in thinking through the material on this site. I discovered Bayesian decision theory in high school, but the texts I read at the time either didn’t explain the whole theory or else I didn’t catch it all at age 14. Either way, it was just a cute trick for calculating compound utility scores based on guesses of likelihood for various contingencies. The greatest service the Less Wrong site has done for me is to connect the utility calculation method to EMPIRICAL prior probabilities! Like, duh! A hugely useful tool, that is.
As a professional writer in my day job and student of applied linguistics research otherwise, I have some reservations about those of the Sequences that reference the philosophy of language. I completely agree that Searle believes in magic (aka “intentionality”), which is not useful. But this does not mean the Chinese Room problem isn’t real.
When you study human language use empirically in natural contexts (through frame-by-frame analysis of video recordings), it turns out that what we think we do with language and what we actually do are rather divergent. The body and places in the world and other agents in the interaction all play a much bigger role in the real-time construction of meaning than you would expect from introspection. Egocentric bias has a HUGE impact on what we imagine about our own utterances. I’ve come to the conclusion that Stevan Harnad is absolutely correct, and that machine language understanding will require an AI ROBOT, not a disembodied algorithmic system.
As for HPMOR, I hereby predict that Harrymort is going to go back in time to the primal event in Godric’s Hollow and change the entire universe to canon in his quest to, er, spoilers, can’t say.
Cheers.
The chief deficiency of embodiment philosophy-of-mind, at least among AIers and cognitivists, is that they constantly say “embodiment” when they should say “experience of embodiment”. And when you put it that way, most of the magic leaches away and you’re left facing the same old hard problem of consciousness. Meaning, understanding, intentionality are all aspects of consciousness. And various studies can show that body awareness is surprisingly important in the genesis and constitution of those things. But just having a material object governed by a hierarchy of feedback loops does not explain why there should be anyone home in that object—why there should be any form of awareness in, or around, or otherwise associated with that object.
I sort of agree with you: if the “hard problem of consciousness” is indeed a coherent problem that needs to be solved, then what you say makes perfect sense. But I am not convinced that it’s a problem worth solving. I don’t care whether Mitchell_Porter is an entity that really, truly experiences consciousness, or whether it’s only a “material object governed by a hierarchy of feedback loops”, so long as Mitchell_Porter has interesting things to say, and can hold up his/her/its own end of the conversation.
Is there any reason why I should care ?
Let’s distinguish between superficial and fundamental ignorance. If you flip a coin, you may not know which way it came up until you look. This typifies what I will call superficial ignorance. The mechanics of a flat disk of metal, sent spinning in a certain way, is not an especially mysterious subject. Your ignorance of whether the coin shows heads or tails does not imply ignorance of the essence of what just happened.
Fundamental ignorance is where you really don’t know what’s going on. The sun goes up and down in the sky and you don’t know why, for a third of each day you’re in some other reality where you don’t remember the usual one, and so on. The situation with respect to consciousness is in this category.
It could be argued that you should care about any instance of fundamental ignorance, because its implications are unknown in a way that the implications of superficial ignorance are not. Who knows what further wonderful, terrible, or important facts it obscures? Then again, it could be argued that there’s fundamental ignorance beneath every instance of superficial ignorance. Consider the spinning coin: we have a physical mechanics that can describe its motion: but why does that mechanics work?
Conversely, in the case of consciousness, there’s an argument for complacency: I may not understand why brains are conscious, but human beings pretty consistently act in the ways that I tentatively regard as indicative of consciousness, and (I could say) in my dealing with them, it’s how they behave which matters.
There are a few further reasons why someone may end up caring whether other people/beings are truly conscious or not. One is morality. I may consider it important to know (if only I could know), whether they really are happy or suffering, or whether they are just automata pantomiming the behaviors of happiness and suffering. Another is intellectual curiosity. Perhaps you just decide that you want to know, not because of the argument from the unknown significance of fundamental ignorance, but on a whim, or because of the cool satisfaction of grasping something abstract.
But perhaps the number-one reason that someone from this community should want to know, is that many people here anticipate that they personally will undergo transformations such as mind uploading. If you at least value your own consciousness, and not just your behaviors, then you have an interest in understanding whether a given transformation preserves consciousness or not.
I think that you are unintentionally conflating two very different questions:
1). What is the mechanism that causes us to perceive certain entities, including humans, as possessing consciousness ?
2). Let’s assume that there’s a hidden factor, called “consciousness”, that is sufficient but not necessary to cause us to perceive humans as being conscious. How can we test for the presence or absence of this factor ?
Answering (2) may help you answer (1), but (2) is unanswerable if the assumption you are making in it is wrong.
I personally see no reason to postulate the presence of some hidden, undetectable factor that causes humans to be conscious. I would love to know exactly how it is that human brains produce the phenomenon we perceive as “consciousness”, but I’m not convinced that such a feature could only have a single possible implementation.
This is indeed important with respect to morality:
If the presence of consciousness is unfalsifiable, then you can’t know, and you’re obligated to treat all entities that appear to be happy or suffering equally (for the purposes of making your moral decisions, that is). On the other hand, if the presence of consciousness is falsifiable, then tell me how I can falsify it. If you hand-wave the answer by saying, “oh, it’s a hard problem”, then you don’t have a useful model, you’ve got something akin to Vitalism. It’d be like saying,
“Some suns are powered by fusion, and others are powered by undetectable sun-goblins that make it look like the sun is powered by fusion. Our own sun is powered by goblins. You can’t ever detect them, but trust me, they’re there”.
Would it be appropriate to say that superficial ignorance is factual (one does not know the particular inputs to the equations which govern the coin’s movement) where fundamental ignorance is conceptual (one does not have a concept that the coin is governed by equations of motion)?
I don’t know.
You defect in the Prisoner’s Dilemma against a rock with “defect” written on it, defect in the PD against a rock with “cooperate” written on it, and cooperate in the PD against a copy of yourself. So, if you’re ever playing PD against Mitchell_Porter, you want to know whether he’s more like a rock or like yourself.
Right, but in order to figure out whether to cooperate with or defect against Mitchell_Porter, all I need to know is what strategy he is most likely to pursue. I don’t need to know whether he’s a “material object governed by a hierarchy of feedback loops” or a biological human possessed of “consciousness” or an animatronic garden gnome; I just need to know enough to find out which button he’ll press.
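To make the point concrete, here is a minimal sketch of that decision in Python. The payoff numbers are the usual textbook values, chosen here purely for illustration, and modelling "a copy of yourself" as an opponent who mirrors your move is an assumption of the sketch, not something settled in this thread. The names `rock`, `copy_of_me`, and `best_move` are hypothetical.

```python
# Illustrative Prisoner's Dilemma payoffs for me (assumed values).
# Keys are (my_move, their_move); "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}

def rock(label):
    """An opponent that plays a fixed move no matter what I do."""
    return lambda my_move: label

def copy_of_me(my_move):
    """An opponent that runs my own decision procedure, so it mirrors my move."""
    return my_move

def best_move(opponent):
    """Pick the move with the higher payoff, given how the opponent responds."""
    return max("CD", key=lambda m: PAYOFF[(m, opponent(m))])

print(best_move(rock("D")))    # D
print(best_move(rock("C")))    # D
print(best_move(copy_of_me))   # C
```

Note that `best_move` only ever looks at what the opponent does in response to each move. Nothing in it asks what the opponent is made of.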
I am not familiar with Stevan Harnad, but this sounds counterintuitive to me (though it’s very likely that I’m misunderstanding your point). I am currently reading your words on the screen. I can’t hear you or see your body language. And yet, I can still understand what you wrote (not fully, perhaps, but enough to ask you questions about it). In our current situation, I’m not too different from a software program that is receiving the text via some input stream, so I don’t see an a priori reason why such a program could not understand the text as well as I do.
I assume telms is referring to embodied cognition, the idea that your ability to communicate with her, and achieve mutual understanding of any sort, is made possible by shared concepts and mental structures which can only arise in an “embodied” mind.
I am rather skeptical about this thesis as far as artificial minds go; somewhat less skeptical about it if applied only to “natural” (i.e., evolved) minds — although in that case it’s almost trivial; but in any case don’t know enough about it to have a fully informed opinion.
Oh, ok, that makes more sense. As far as I understand, the idea behind embodied cognition is that intelligent minds must have a physical body with a rich set of sensors and effectors in order to develop; but once they’re done with their development, they can read text off of the screen instead of talking.
That definitely makes sense in case of us biological humans, but just like you, I’m skeptical that the thesis applies to all possible minds at all times.
Some representative papers of Stevan Harnad are:
The symbol grounding problem
Other bodies, other minds: A machine incarnation of an old philosophical problem
I skimmed both papers, and found them unconvincing. Granted, I am not a philosopher, so it’s likely that I’m missing something, but still:
In the first paper, Harnad argues that rule-based expert systems cannot be used to build a Strong AI; I completely agree. He further argues that merely building a system out of neural networks does not guarantee that it will grow to be a Strong AI either; again, we’re on the same page so far. He further points out that, currently, nothing even resembling Strong AI exists anywhere. No argument there.
Harnad totally loses me, however, when he begins talking about “meaning” as though that were some separate entity to which “symbols” are attached. He keeps contrasting mere “symbol manipulation” with true understanding of “meaning”, but he never explains how we could tell one from the other.
In the second paper, Harnad basically falls into the same trap as Searle. He lampoons the “System Reply” by calling it things like “a predictable piece of hand-waving”—but that’s just name-calling, not an argument. Why precisely is Harnad (or Searle) so convinced that the Chinese Room as a whole does not understand Chinese ? Sure, the man inside doesn’t understand Chinese, but that’s like saying that a car cannot drive uphill at 70 mph because no human driver can run uphill that fast.
The rest of his paper amounts to a moving of the goalposts. Harnad is basically saying, “Ok, let’s say we have an AI that can pass the TT via teletype. But that’s not enough ! It also needs to pass the TTT ! And if it passes that, then the TTTT ! And then maybe the TTTTT !” Meanwhile, Harnad himself is reading articles off his screen which were published by other philosophers, and somehow he never requires them to pass the TTTT before he takes their writings seriously.
Don’t get me wrong, it is entirely possible that the only way to develop a Strong AI is to embody it in the physical world, and that no simulation, no matter how realistic, will suffice. I am open to being convinced, but the papers you linked are not convincing. I’m not interested in figuring out whether any given person who appears to speak English really, truly understands English; or whether this person is merely mimicking a perfect understanding of English. I’d rather listen to what such a person has to say.
Haven’t read the Harnad paper yet, but the reason Searle’s convinced seems obvious to me: he just doesn’t take his own scenario seriously — seriously enough to really imagine it, rather than just treating it as a piece of absurd fantasy. In other words, he does what Dennett calls “mistaking a failure of imagination for an insight into necessity”.
In The Mind’s I, Dennett and Hofstadter give the Chinese Room scenario a much more serious fictional treatment, and show in great detail what elements of it trigger Searle’s intuitions on the matter, as well as how to tweak those intuitions in various ways. Sadly but predictably, Searle has never (to my knowledge) responded to their dissection of his views.
I like the expression and can think of times when I have looked for something that expresses this all-too-common practice simply.
Having now read the second linked Harnad paper, my evaluation is similar to yours. Some more specific comments follow.
Harnad talks a lot about whether a body “has a mind”: whether a Turing Test could show if a body “has a mind”, how we know a body “has a mind”, etc.
What on earth does he mean by “mind”? Not… the same thing that most of us here at LessWrong mean by it, I should think.
He also refers to artificial intelligence as “computer models”. Either he is using “model” quite strangely as well… or he has some… very confused ideas about AI. (Actually, very confused ideas about computers in general is, in my experience, endemic among the philosopher population. It’s really rather distressing.)
This has surely got to be one of the most ludicrous pronouncements I’ve ever seen a philosopher make.
One of these things is not like the others...
Well, maybe our chess-playing module is not autonomous, but as we have seen, we can certainly build a chess-playing module that has absolutely no capacity to see, move, manipulate, or speak.
Most of the rest of the paper is nonsensical, groundless handwaving, in the vein of Searle but worse. I am unimpressed.
Yeah, I think that’s the main problem with pretty much the entire Searle camp. As far as I can tell, if they do mean anything by the word “mind”, then it’s “you know, that thing that makes us different from machines”. So, we are different from AIs because we are different from AIs. It’s obvious when you put it that way !
Well, I certainly agree that there are important aspects of human languages that come out of our experience of being embodied in particular ways, and that without some sort of model that embeds the results of that kind of experience we’re not going to get very far in automating the understanding of human language.
But it sounds like you’re suggesting that it’s not possible to construct such a model within a “disembodied” algorithmic system, and I’m not sure why that should be true.
Then again, I’m not really sure what precisely is meant here by “disembodied algorithmic system” or “ROBOT”.
For example, is a computer executing a software emulation of a humanoid body interacting with an emulated physical environment a disembodied algorithmic system, or an AI ROBOT (or neither, or both, or it depends on something)? How would I tell, for a given computer, which kind of thing it was (if either)?
An emulated body in an emulated environment is a disembodied algorithmic system in my terminology. The classic example is Terry Winograd’s SHRDLU, which made significant advances in machine language understanding by adding an emulated body (arm) and an emulated world (a cartoon blocks world, but nevertheless a world that could be manipulated) to text-oriented language processing algorithms. However, Winograd himself concluded that language understanding algorithms plus emulated bodies plus emulated worlds aren’t sufficient to achieve natural language understanding.
Every emulation necessarily makes simplifying assumptions about both the world and the body that are subject to errors, bugs, and munchkin effects. A physical robot body, on the other hand, is constrained by real-world physics to that which can be built. And the interaction of a physical body with a physical environment necessarily complies with that which can actually happen in the real world. You don’t have to know everything about the world in advance, as you would for a realistic world emulation. With a robot body in a physical environment, the world acts as its own model and constrains the universe of computation to a tractable size.
The other thing you get from a physical robot body is the implicit analog computation tools that come with it. A robot arm can be used as a ruler, for example. The torque on a motor can be used as an analog for effort. On these analog systems, world-grounded metaphors can be created using symbolic labels that point to (among other things) the arm-ruler or torque-effort systems. These metaphors can serve as the terminal point of a recursive meaning builder, and the physics of the world ensures that the results are good enough models of reality for communication to succeed or for thinking to be assessed for truth-with-a-small-t.
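A toy sketch of what "terminal point of a recursive meaning builder" could look like, in Python. Everything here is hypothetical: the lexicon entries, the two grounded primitives `arm_ruler` and `torque_effort` (named after the examples above), and the rule that a label counts as grounded only if every part of its definition bottoms out in a primitive. It is meant only to show where the recursion terminates and where it regresses, not to model any real system.

```python
# Hypothetical lexicon: each label is defined in terms of other labels.
LEXICON = {
    "long": ["length", "more"],
    "length": ["arm_ruler"],
    "more": ["torque_effort"],
    "heavy": ["torque_effort"],
    "essence": ["being"],      # defined only via "being" ...
    "being": ["essence"],      # ... which is defined only via "essence"
}

# Assumed primitives tied directly to the body/world (not symbols).
GROUNDED = {"arm_ruler", "torque_effort"}

def is_grounded(label, seen=frozenset()):
    """True if every branch of the label's definition ends in a primitive."""
    if label in GROUNDED:
        return True
    if label in seen or label not in LEXICON:
        # A circular or missing definition never reaches the world.
        return False
    return all(is_grounded(part, seen | {label}) for part in LEXICON[label])

print(is_grounded("long"))     # True
print(is_grounded("essence"))  # False
```

The circular pair is the infinite-regress case: the labels are internally consistent with each other but never touch a primitive.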
OK, thanks for clarifying.
I certainly agree that a physical robot body is subject to constraints that an emulated body may not be subject to; it is possible to design an emulated body that we are unable to build, or even a body that cannot be built even in principle, or a body that interacts with its environment in ways that can’t happen in the real world.
And I similarly agree that physical systems demonstrate relationships, like that between torque and effort, which provide data, and that an emulated body doesn’t necessarily demonstrate the same relationships that a robot body does (or even that it can in principle). And those aren’t unrelated, of course; it’s precisely the constraints on the system that cause certain parts of that system to vary in correlated ways.
And I agree that a robot body is automatically subject to those constraints, whereas if I want to build an emulated software body that is subject to the same constraints that a particular robot body would be subject to, I need to know a lot more.
Of course, a robot body is not subject to the same constraints that a human body is subject to, any more than an emulated software body is; to the extent that a shared ability to understand language depends on a shared set of constraints, rather than on simply having some constraints, a robot can’t understand human language until it is physically equivalent to a human. (Similar reasoning tells us that paraplegics don’t understand language the same way as people with legs do.)
And if understanding one another’s language doesn’t depend on a shared set of constraints, such that a human with two legs, a human with no legs, and a not-perfectly-humanlike robot can all communicate with one another, it may turn out that an emulated software body can communicate with all three of them.
The latter seems more likely to me, but ultimately it’s an empirical question.
You make a very important point that I would like to emphasize: incommensurate bodies very likely will lead to misunderstanding. It’s not just a matter of shared or disjunct body isomorphism. It’s also a matter of embodied interaction in a real world.
Let’s take the very fundamental function of pointing. Every human language is rife with words called deictics that anchor the flow of utterance to specific pieces of the immediate environment. English examples are words like “this”, “that”, “near”, “far”, “soon”, “late”, the positional prepositions, pronominals like “me” and “you”—the meaning of these terms is grounded dynamically by the speakers and hearers in the time and place of utterance, the placement and salience of surrounding objects and structures, and the particular speaker and hearers and overhearers of the utterance. Human pointing—with the fingers, hands, eyes, chin, head tilt, elbow, whatever—has been shown to perform much the same functions as deictic speech in utterance. (See the work of Sotaro Kita if you’re interested in the data). A robot with no mechanism for pointing and no sensory apparatus for detecting the pointing gestures of human agents in its environment will misunderstand a great deal and will not be able to communicate fluently.
Then there are the cultural conventions that regulate pointing words and gestures alike. For example, spatial meanings tend to be either speaker-relative or landmark-relative or absolute (that is, embedded in a spatial frame of cardinal directions) in a given culture, and whichever of these options the culture chooses is used in both physical pointing and linguistic pointing through deictics. A robot with no cultural reference won’t be able to disambiguate “there” (relative to me here now) versus “there” (relative to the river/mountain/rising sun), even if physical pointing is integrated into the attempt to figure out what “there” is. And the problem may not be detected due to the illusion of double transparency.
This gets even more complicated when the world of discourse shifts from the immediate environment to other places, other times, or abstract ideas. People don’t stop inhabiting the real world when they talk about abstract ideas. And what you see in conversation videos is people mapping the world of discourse metaphorically to physical locations or objects in their immediate environment. The space behind me becomes yesterday’s events and the space beyond my reach in front of me becomes tomorrow’s plan. Or I always point to the left when I’m talking about George and to the right when I’m talking about Fred.
This is all very much an empirical question, as you say. I guess my point is that the data has been accumulating for several decades now that embodiment matters a great deal. Where and how it matters is just beginning to be sorted out.
If I am talking to you on the telephone, I have no mechanism for pointing and no sensory apparatus for detecting your pointing gestures, yet we can communicate just fine.
The whole embodied cognition thing is a massive, elementary mistake as bad as all the ones that Eliezer has analysed in the Sequences. It’s an instant fail.
Can you expand on this just a bit? I am leaning, slowly, in the same direction, and I’d like a bit of a sanity check on this claim.
Firstly, I have no problem with the “embodied cognition” idea so far as it relates to human beings (or animals, for that matter). Yes, people think also with their bodies, store memories in the environment, point at things, and so on. This seems to me both true and unremarkable. So unremarkable as to hardly be worth the amount of thought that apparently goes into it. While it may be interesting to trace out all the ways in which it happens, I see no philosophical importance in the details.
Where it goes wrong is the application to AGI that says that because people do this, it is an essential part of how an intelligence of any sort must operate, and therefore a man-made intelligent machine must be given a body. The argument mistakes a superficial fact about observed intelligences for a fact about the mechanism whereby an intelligence of any sort must operate. There is a large and expanding body of work on making ever more elaborate robot puppets like the Nao, explicitly following a research programme of developing “embodied cognition”.
I cannot see these projects as being of any interest. I would be a lot more interested in seeing someone build a human-sized robot that can run unsupported on two legs (Boston Dynamics’ ATLAS is getting there), especially if it can run faster than a man while carrying a full military pack and isn’t tethered to a power cable (not yet done). However, nothing like that is a prerequisite to AGI. I do hold a personal opinion, which I’m not going to argue for here, that if someone developed a simple method of solving the control problems of an all-terrain running robot, they might get from that some insight into how to get farther, such as an all-terrain running robot that can hunt down humans trying to avoid it. Of course, the Unfriendly directions that might lead are obvious, as are the military motivations for building such machines, or inviting people to come up with designs. Of course, these powers will only be used for Good.
Since the embodied approach has been around in strength since the 1980s, and can be found in Turing in 1950, I think it fair to say that if it worked beyond the toy projects that AGI attempts always produce, we would have seen it by now.
The deaf communicate without sound, the blind without sight, and the limbless without pointing hands. On the internet people communicate without any of these. It doesn’t seem to hold anyone up, except in the mere matter of speed in the case of Stephen Hawking communicating by twitching cheek muscles.
Ah, no, the magic ingredient must be society! Cognition always takes place within society. Feral children are developmentally disabled for want of society. The evidence is clear: we must develop societies of AIs before they can be intelligent.
No, it’s language they must have! An AGI’s cognition must be based on a language. So if we design the perfect language, AGI will be a snap.
No, it’s upbringing they must have! So we’ll design a robot to be initially like a newborn baby and teach it through experience!
No, it’s....
No. The general form of all these arguments is broken.
This is where you lose me. Isn’t that an equally effective argument against AGI in general?
“AGI in general” is a thing of unlimited broadness, about which lack of success so far implies nothing more than lack of success so far. Cf. flying machines, which weren’t made until they were. Embodied cognition, on the other hand, is a definite thing, a specific approach that is at least 30 years old, and I don’t think it’s even made a contribution to narrow AI yet. It is only mentioned in Russell and Norvig in their concluding section on the philosophy of Strong AI, not in any of the practical chapters.
I took RichardKennaway’s post to mean something like the following:
“Birds fly by flapping their wings, but that’s not the only way to fly; we have built airplanes, dirigibles and rockets that fly differently. Humans acquire intelligence (and language) by interacting with their physical environment using a specific set of sensors and effectors, but that’s not the only way to acquire intelligence. Tomorrow, we may build an AI that does so differently.”
But since that idea has been around in strength since the 1980s, and can be found in Turing in 1950, apparently it’s fair to say that if it worked beyond the toy projects that AGI attempts always produce, we would have seen it by now.
I think that we have seen it by now, we just don’t call it “AI”. Even in Turing’s day, we had radar systems that could automatically lock on to enemy planes and shoot them down. Today, we have search engines that can provide answers (with a significant degree of success) to textual or verbal queries; mapping software that can plot the best path through a network of roadways; chess programs that can consistently defeat humans; cars that drive themselves; planes that fly themselves; plus a host of other things like that. Sure, none of these projects are Strong AI, but neither are they toys.
This depends on the definition of ‘toy projects’ that you use. For the sort of broad definition you are using, where ‘toy projects’ refers literally to toys, Richard Kennaway’s original claim that the embodied approach had only produced toys is factually incorrect. For the definition of ‘toy projects’ that both Richard Kennaway and Document are using, in which ‘toy projects’ is more closely related to ‘toy models’ (i.e. attempts at a simplified version of Strong AI), this is an argument against AGI in general.
I see what you mean, but I’m having trouble understanding what “a simplified version of Strong AI” would look like.
For example, can we consider a natural language processing system that’s connected to a modern search engine to be “a simplified version of Strong AI” ? Such a system is obviously not generally intelligent, but it does perform several important functions—such as natural language processing—that would pretty much be a requirement for any AGI. However, the implementation of such a system is most likely not generalizable to an AGI (if it were, we’d have AGI by now). So, can we consider it to be a “toy project”, or not ?
The “magic ingredient” may be a bridging of intuitions: an embodied AI which you can more naturally interact with offers more intuitive metrics for progress; milestones which can be used to attract funding since they make more sense intuitively.
Obviously you can build an AGI using only lego stones. And you can build an AGI “purely” as software (i.e. with variable hardware substrates). The steelman for pursuing embodied cognition would not be “embodiment is strictly necessary to build AGIs” (boring!), but that “given humans with a goal of building an AGI, going the embodiment route may be a viable approach”.
I well remember that early morning in the CS lab, the better part of a decade ago, when I stumbled—still half asleep—into a sideroom to turn on the lights, only to stare into the eye of Eccerobot (in an earlier incarnation), which was visiting our lab. Shudder.
I used to joke that my goal in life would be to build the successor creature, and to be judged by it (humankind and me both). To be judged and to be found unworthy in its (in this case single) eye, and to be smitten. After all, what better emotional proof to have created something of worth is there than your creation judging you to be unworthy? Take my atoms, Adambot!
Are misunderstandings more common over the telephone for things like negotiation?
I don’t know, but I doubt that the communication medium makes much difference beyond the individual skills of the people using it. People can use multiple modalities to communicate, and in a situation where some are missing, one varies one’s use of the others to accomplish the goal.
In adversarial negotiations one might even find it an advantage not to be seen, to avoid accidentally revealing things one wishes to keep secret. Of course, that applies to both parties, and it will come down to a matter of who is more skilled at using the means available.
People even manage to communicate in writing!
Sure, I agree that we make use of all kinds of contextual cues to interpret speech, and a system lacking awareness of that context will have trouble interpreting speech. For example, if I say “Do you like that?” to Sam, when Sam can’t see the thing I’m gesturing to indicate or doesn’t share the cultural context that lets them interpret that gesture, Sam won’t be able to interpret or engage with me successfully. Absolutely agreed. And this applies to all kinds of things, including (as you say) but hardly limited to pointing.
And, sure, the system may not even be aware of that trouble… illusions of transparency abound. Sam might go along secure in the belief that they know what I’m asking about and be completely wrong. Absolutely agreed.
And sure, I agree that we rely heavily on physical metaphors when discussing abstract ideas, and that a system incapable of processing my metaphors will have difficulty engaging with me successfully. Absolutely agreed.
All of that said, what I have trouble with is your apparent insistence that only a humanoid system is capable of perceiving or interpreting human contextual cues, metaphors, etc. That doesn’t seem likely to me at all, any more than it seems likely that a blind person (or one on the other end of a text-only link) is incapable of understanding human speech.
Are you really claiming that ability to understand the very concept of indexicality, and concepts like “soon”, “late”, “far”, etc., relies on humanlike fingers? That seems like an extraordinary claim, to put it lightly.
Also:
“Detecting pointing gestures” would be the function of a perception algorithm, not a sensory apparatus (unless what you mean is “a robot with no ability to perceive positions/orientations/etc. of objects in its environment”, which… wouldn’t be very useful). So it’s a matter of what we do with sense data, not what sorts of body we have; that is, software, not hardware.
More generally, a lot of what you’re saying (and — this is my very tentative impression — a lot of the ideas of embodied cognition in general) seems to be based on an idea that we might create some general-intelligent AI or robot, but have it start at some “undeveloped” state and then proceed to “learn” or “evolve”, gathering concepts about the world, growing in understanding, until it achieves some desired level of intellectual development. The concern then arises that without the kind of embodiment that we humans enjoy, this AI will not develop the concepts necessary for it to understand us and vice versa.
Ok. But is anyone working in AI these days actually suggesting that this is how we should go about doing things? Is everyone working in AI these days suggesting that? Isn’t this entire line of reasoning inapplicable to whole broad swaths of possible approaches to AI design?
P.S. What does “there, relative to the river” mean?
Yeah, I am advancing the hypothesis that, in humans, the comprehension of indexicality relies on embodied pointing at its core, though not just with fingers, which are not universally used for pointing in all human cultures. Sotaro Kita has the most data on this subject for language, but the embodied basis of mathematics is discussed in Where Mathematics Comes From, by George Lakoff and Rafael Núñez. Whether all possible minds must rely on such a mechanism, I couldn’t possibly guess. But I am persuaded humans do (a lot of) it with their bodies.
In most European cultures, we use speaker-relative deictics. If I point to the southeast while facing south and say “there”, I mean “generally to my front and left”. But if I turn around and face north, I will point to the northwest and say “there” to mean the same thing, ie, “generally to my front and left.” The fact that the physical direction of my pointing gesture is different is irrelevant in English; it’s my body position that’s used as a landmark for finding the target of “there”. (Unless I’m pointing at something in particular here and now, of course; in which case the target of the pointing action becomes its own landmark.)
In a number of Native American languages, the pointing is always to a cardinal direction. If the orientation of my body changes when I say “there”, I might point over my shoulder rather than to my front and left. The landmark for finding the target of “there” is a direction relative to the trajectory of the sun.
But many cultures use a dominant feature of the landscape, like the Amazon or the Mississippi or the Nile rivers, or a major mountain range like the Rockies, or a sacred city like Mecca, as the orientation landmark, and in some cultures this gets encoded in the deictics of the language and the conventions for pointing. “Up” might not mean up vertically, but rather “upriver”, while “down” would be “downriver”. In a steep river valley in New Guinea, “down” could mean “toward the river” and “up” could mean “away from the river”. And “here” could mean “at the river” while “there” could mean “not at the river”.
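The three conventions can be sketched as three different ways of turning "there" into a compass bearing. This is a rough Python illustration only: the function names, the flat x-east/y-north coordinates, and the choice of 45 degrees for "front and left" are all assumptions made for the example. Bearings are in degrees clockwise from north (0 = north, 90 = east, 180 = south, 270 = west).

```python
import math

def speaker_relative(heading_deg, offset_deg):
    """Bearing of a gesture fixed relative to the way the speaker is facing.

    A negative offset means "to my left", a positive one "to my right".
    """
    return (heading_deg + offset_deg) % 360

def absolute(bearing_deg):
    """Bearing fixed to the cardinal directions; the speaker's heading is ignored."""
    return bearing_deg % 360

def landmark_relative(speaker_xy, landmark_xy):
    """Bearing from the speaker's position toward a landmark (e.g. the river)."""
    dx = landmark_xy[0] - speaker_xy[0]   # east-west difference
    dy = landmark_xy[1] - speaker_xy[1]   # north-south difference
    return math.degrees(math.atan2(dx, dy)) % 360

FRONT_LEFT = -45  # assumed angle for "generally to my front and left"

print(speaker_relative(180, FRONT_LEFT))  # facing south -> 135 (southeast)
print(speaker_relative(0, FRONT_LEFT))    # facing north -> 315 (northwest)
print(absolute(135))                      # 135 whichever way I face
```

In the speaker-relative case the same "there" yields two different physical directions as the body turns. In the absolute case the direction stays put and the gesture has to change. In the landmark case the answer depends on where the speaker is standing, not on which way they face.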
The cultural variability and place-specificity of language was not widely known to Western linguists until about ten years ago. For a long time, it was assumed that person-relative orientation was a biological constraint on meaning. This turns out to be not quite accurate. So I guess I should be more nuanced in the way I present the notion of embodied cognition. How’s this: “Embodied action in the world with a cultural twist on top” is the grounding point at the bottom of the symbol expansion for human meanings, linguistic and otherwise.
I was able to follow this explanation (as well as the rest of your post) without seeing your physical body in any way. In addition, I suspect that, while you were typing your paragraph, you weren’t physically pointing at things. The fact that we can do this looks to me like evidence against your main thesis.
Ah, but you’re assuming that this particular interaction stands on its own. I’ll bet you were able to visualize the described gestures just fine by invoking memories of past interactions with bodies in the world.
Two points. First, I don’t contest the existence of verbal labels that merely refer, or even just register as being invoked without referring at all. As long as some labels are directly grounded to body/world, or refer to other labels that do get grounded in the body/world historically, we generally get by in routine situations. And all cultures have error detection and repair norms for conversation so that we can usually recover without social disaster.
However, the fact that verbal labels can be used without grounding them in the body/world is a problem. It is frequently the case that speakers and hearers alike don’t bother to connect words to reality, and this is a major source of misunderstanding, error, and nonsense. In our own case here and now, we are actually failing to understand each other fully because I can’t show you actual videotapes of what I’m talking about. You are rightly skeptical because words alone aren’t good enough evidence. And that is itself evidence.
Second, humans have a developmental trajectory and history, and memories of that history. We’re a time-binding animal in Korzybski’s terminology. I would suggest that an enculturated adult native speaker of a language will have what amount to “muscle memory” tics that can be invoked as needed to create referents. Mere memory of a motion or a perception is probably sufficient.
“Oh, look, it’s an invisible gesture!” is not at all convincing, I realize, so let me summarize several lines of evidence for it.
Developmentally, there’s quite a lot of research on language acquisition in infants and young children that suggests shared attention management—through indexical pointing, and shared gaze, and physical coercion of the body, and noises that trigger attention shift—is a critical building block for constructing “aboutness” in human language. We also start out with some shared, built-in cries and facial expressions linked to emotional states. At this level of development, communication largely fails unless there is a lot of embodied scaffolding for the interaction, much of it provided by the caregiver but a large part of it provided by the physical context of the interaction. There is also some evidence from the gestural communication of apes that attests to the importance of embodied attention management in communication.
Also, co-speech gesture turns out to be a human universal. Congenitally blind children do it, having never seen gesture by anyone else. Congenitally deaf children who spend time in groups together will invent entire gestural languages complete with formal syntax, as recently happened in Nicaragua. And adults speaking on the telephone will gesture even knowing they cannot be seen. Granted, people gesture in private at a significantly lower rate than they do face-to-face, but the fact that they do it at all is a bit of a puzzle, since the gestures can’t be serving a communicative function in these contexts. Does the gesturing help the speakers actually think, or at least make meaning more clear to themselves? Susan Goldin-Meadow and her colleagues think so.
We also know from video conversation data that adults spontaneously invent new gestures all the time in conversation, then reuse them. Interestingly, though, each reuse becomes more attenuated, simplified, and stylized with repetition. Similar effects are seen in the development of sign languages and in written scripts.
But just how embodied can a label be when gesture (and other embodied experience) is just a memory, and is so internalized that it is externally invisible? This has actually been tested experimentally. The Stroop effect has been known for decades, for example: when the word “red” is presented in blue text, it is read or acted on more slowly than when the word “red” is presented in red text, or in socially neutral black text. That’s on the embodied perception side of things. But more recent psychophysical experiments have demonstrated a similar psychomotor Stroop-like effect when spatial and motion stimulus sentences are semantically congruent with the direction of the required response action. This effect holds even for metaphorical words like “give”, which tests as motor-congruent with motion away from oneself, and “take”, which tests as motor-congruent with motion toward oneself.
I understand how counterintuitive this stuff can be when you first encounter it—especially to intelligent folks who work with codes or words or models a great deal. I expect the two of us will never reach a consensus on this without looking at a lot of original data—and who has the time to analyze all the data that exists on all the interesting problems in the world? I’d be pleased if you could just note for future reference that a body of empirical evidence exists for the claim. That’s all.
What do you mean by “fully” ? I believe I understand you well enough for all practical purposes. I don’t agree with you, but agreement and understanding are two different things.
I’m not sure what you mean by “merely refer”, but keep in mind that we humans are able to communicate concepts which have no physical analogues that would be immediately accessible to our senses. For example, we can talk about things like “O(N)”, or “ribosome”, or “a^n +b^n = c^n”. We can also talk about entirely imaginary worlds, such as f.ex. the world where Mario, the turtle-crushing plumber, lives. And we can do this without having any “physical context” for the interaction, too.
All that is beside the point, however. In the rest of your post, you bring up a lot of evidence in support of your model of human development. That’s great, but your original claim was that any type of intelligence at all will require a physical body in order to develop; and nothing you’ve said so far is relevant to this claim. True, human intelligence is the only kind we know of so far, but then, at one point birds and insects were the only self-propelled flyers in existence—and that’s not the case anymore.
Furthermore, you also claimed that no simulation, no matter how realistic, will serve to replace the physical world for the purposes of human development, and I’m still not convinced that this is true, either. As I’d said before, we humans do not have perfect senses; if physical coordinates of real objects were snapped to a 0.01mm grid, no human child would ever notice. And in fact, there are plenty of humans who grow up and develop language just fine without the ability to see colors, or to move some of their limbs in order to point at things.
Just to drive the point home: even if I granted all of your arguments regarding humans, you would still need to demonstrate that human intelligence is the only possible kind of intelligence; that growing up in a human body is the only possible way to develop human intelligence; and that no simulation could in principle suffice, and the body must be physical. These are all very strong claims, and so far you have provided no evidence for any of them.
Let me refer you to Computation and Human Experience, by Philip E. Agre, and to Understanding Computers and Cognition, by Terry Winograd and Fernando Flores.
Can you summarize the salient parts ?
But wait; whether all possible minds must rely on such a mechanism is the entire question at hand! Humans implement this feature in some particular way? Fine; but this thread started by discussing what AIs and robots must do to implement the same feature. If implementation-specific details in humans don’t tell us anything interesting about implementation constraints in other minds, especially artificial minds which we are in theory free to place anywhere in mind design space, then the entire topic is almost completely irrelevant to an AI discussion (except possibly as an example of “well, here is one way you could do it”).
Er, what? I thought I was a member of a European culture, but I don’t think this is how I use the word “there”. If I point to some direction while facing somewhere, and say “there”, I mean… “in the direction I am pointing”.
The only situation when I’d use “there” in the way you describe is if I were describing some scenario involving myself located somewhere other than my current location, such that absolute directions in the story/scenario would not be the same as absolute directions in my current location.
If this is accurate, then why on earth would we map this word in this language to the English “there”? It clearly does not remotely resemble how we use the word “there”, so this seems to be a case of poor translation rather than an example of cultural differences.
Yeah, actually, this research I was aware of. As I recall, the Native Americans in question had some difficulty understanding the Westerners’ concepts of speaker-relative indexicals. But note: if we can have such different concepts of indexicality, despite sharing the same pointing digits and whatnot… it seems premature, at best, to suggest that said hardware plays such a key role in our concept formation, much less in the possibility of having such concepts at all.
Ultimately, the interesting aspect of this entire discussion (imo, of course) is what these human-specific implementation details can tell us about other parts of mind design space. I remain skeptical that the answer is anything other than “not much”. (Incidentally, if you know of papers/books that address this aspect specifically, I would be interested.)
Ok, but is this the correct conclusion ? It’s pretty obvious that a SHRDLU-style simulation is not sufficient to achieve natural language understanding, but can you generalize that to saying that no conceivable simulation is sufficient ? As far as I can tell, you would make such a generalization because,
While this is true, it is also true that our human senses cannot perceive the reality around us with infinite fidelity. A child who is still learning his native tongue can’t tell a rock that is 5cm in diameter from a rock that’s 5.000001cm in diameter. This would lead me to believe that your simulation does not need 7 significant figures of precision in order to produce a language-speaking mind.
In fact, a colorblind child can’t tell a red-colored ball from a green-colored ball, and yet colorblind adults can speak a variety of languages, so it’s possible that your simulation could be monochrome and still achieve the desired result.
I agree that Searle believes in magic, but “intentionality” is not magic (see: almost anything Dennett has written).
This sounds interesting. Could you expand on this?
A list of references can be found in an earlier post in this thread.
Welcome!
Yeah. This, and the “existential angst” thing, seem to be common problems on LW, and I’ve never been sure why. I think that keeping yourself busy doing practical stuff prevents it from becoming an issue.
That’s fascinating! What research has been done on this? I would totally be interested in reading more about it.
Jurgen Streeck’s book Gesturecraft: The manu-facture of meaning is a good summary of Streeck’s cross-linguistic research on the interaction of gesture and speech in meaning creation. The book is pre-theoretical, for the most part, but Streeck does make an important claim that the biological covariation in a speaker or hearer across the somatosensory modes of gesture, vision, audition, and speech does the work of abstraction, which is an unsolved problem in my book.
Streeck’s claim happens to converge with Eric Kandel’s hypothesis that abstraction happens when neurological activity covaries across different somatosensory modes. After all, the only things that CAN covary across, say, musical tone changes in the ear and dance moves in the arms, legs, trunk, and head, are abstract relations. Temporal synchronicity and sequence, say.
Another interesting book is Cognition in the Wild by Edwin Hutchins. Hutchins goes rather too far in the direction of externalizing cognition from the participants in the act of knowing, but he does make it clear that cultures build tools into the environment that offload thinking function and effort, to the general benefit of all concerned. Those tools get included by their users in the manufacture of online meaning, to the point that the online meaning can’t be reconstructed from the words alone.
The whole field of conversation analysis goes into the micro-organization of interactive utterances from a linguistic point of view rather than a cognitive perspective. The focus is on the social and communicative functions of empirically attested language structures as demonstrated by the speakers themselves to one another. Anything written by John Heritage in that vein is worth reading, IMO.
EDIT: Revised, consolidated, and expanded bibliography on interactive construction of meaning:
LINGUISTICS
Philosophy in the Flesh, by George Lakoff and Mark Johnson
Women, Fire and Dangerous Things, by George Lakoff
The Singing Neanderthals, by Steven Mithen
CONVERSATION ANALYSIS & GESTURE RESEARCH
Handbook of Conversation Analysis, by Jack Sidnell & Tanya Stivers
Gesturecraft: The Manu-facture of Meaning, by Jurgen Streeck
Pointing: Where Language, Culture, and Cognition Meet, by Sotaro Kita
Gesture: Visible Action as Utterance, by Adam Kendon
Hearing Gesture: How Our Hands Help Us Think, by Susan Goldin-Meadow
Hand and Mind: What Gestures Reveal about Thought, by David McNeill
COGNITIVE PSYCHOLOGY
Symbols and Embodiment, edited by Manuel de Vega, Arthur M Glenberg, & Arthur C Graesser
Cognition in the Wild, by Edwin Hutchins
Thanks! Neat.