For my paper "How Feasible is the Rapid Development of Artificial Superintelligence?", I looked at some of the existing literature on human expertise to develop a model of exactly what it is that human intelligence consists of.

As a very rough distinction, somewhat analogous to Type 1 and Type 2 reasoning, we can divide human expertise into two components: pattern recognition and mental simulation. An excerpt:
There exists a preliminary understanding, if not of the details of human decision-making, then at least of its general outline. The picture that emerges from this research is that expertise is about developing the correct mental representations (Klein 1999, Ericsson and Pool 2016).
A mental representation is a very general concept, roughly corresponding to any mental structure forming the content of something that the brain is thinking about (Ericsson and Pool 2016).
Domain-specific mental representations are important because they allow experts to know what something means; know what to expect; know what good performance should feel like; know how to achieve the good performance; know the right goals for a given situation; know the steps necessary for achieving those goals; mentally simulate how something might happen; learn more detailed mental representations for improving their skills (Klein 1999, Ericsson and Pool 2016).
Although good decision-making is often thought of as a careful deliberation of all the possible options, that kind of thinking tends to be typical of novices (Klein 1999). A novice has to try to carefully reason their way to an answer, and will often do poorly regardless, because they do not know which things are relevant to take into account and which are not. An expert does not need to deliberate in this way: they are experienced enough to instantly know what to do.
A specific model of expertise is the recognition-primed decision-making model (Klein 1999). First, a decision-maker sees some situation, such as a fire for a firefighter or a design problem for an architect. The situation may then be recognized as familiar, such as a typical garage fire. Recognizing a familiar situation means understanding what goals make sense and what should be focused on, which cues to pay attention to, what to expect next and when a violation of expectations shows that something is amiss, and knowing what the typical ways of responding are. Ideally, the expert will instantly know what to do.
The expectations arising from mental representations also give rise to intuition. As one example, Klein (1999) describes the case of a firefighter lieutenant responding to a kitchen fire in an ordinary one-story residential house. The lieutenant’s crew sprayed water on the fire, but contrary to expectations, the water seemed to have little impact. Something about the situation seemed wrong to the lieutenant, who ordered his crew out of the house. As soon as they had left the house, the floor where they had been standing collapsed. If the firefighters had not pulled out, they would have fallen down to the fire raging in the basement. The lieutenant, not knowing what had caused him to give the order to withdraw, initially attributed the decision to some form of extra-sensory perception.
In a later interview, the lieutenant explained that he did not suspect that the building had a basement, nor that the seat of the fire was under the floor that he and his crew were standing on. However, several of his expectations of a typical kitchen fire were violated by the situation. The lieutenant was wondering why the fire did not react to water as expected, the room was much hotter than he would have expected out of a small kitchen fire, and while a heat that hot should have made a great deal of noise, it was very quiet. The mismatch between the expected pattern and the actual situation led to an intuitive feeling of not knowing what was going on, leading to the decision to regroup. This is intuition: an automatic comparison of the situation against existing mental representations of similar situations, guiding decision-making in ways whose reasons are not always consciously available.
In an unfamiliar situation, the expert may need to construct a mental simulation of what is going on, how things might have developed to this point, and what effect different actions would have. Had the floor mentioned in the previous example not collapsed, given time the firefighter lieutenant might have been able to put the pieces together and construct a narrative of a fire starting from the basement to explain the discrepancies. For a future-oriented example, a firefighter thinking about how to rescue someone from a difficult spot might mentally simulate where different rescue harnesses might be attached on the person, and whether that would exert dangerous amounts of force on them.
Mental representations are necessary for a good simulation, as they let the expert know what things to take into account, what things could plausibly be tried, and what effects they would have. In the example, the firefighter’s knowledge allows him to predict that specific ways of attaching the rescue harness would have dangerous consequences, while others are safe.
Similar to the firefighter’s intuition, GPT-2 has the ability to make predictions about what’s most likely to “happen next”. But there are also several differences.
Most notably, GPT-2’s only goal is exactly that: to predict what’s going to happen next. This is a much more limited task than the one faced by (for example) a human firefighter, who needs not just to predict how a fire might proceed, but also to decide how best to respond to it and which actions to take.
Let’s take a concrete example and look at it in more detail; from Klein 1999:
The initial report is of flames in the basement of a four-story apartment building: a one-alarm fire. The [firefighter] commander arrives quickly and does not see anything. There are no signs of smoke anywhere. He finds the door to the basement, around the side of the building, enters, and sees flames spreading up the laundry chute. That’s simple: a vertical fire that will spread straight up. Since there are no external signs of smoke, it must just be starting.
The way to fight a vertical fire is to get above it and spray water down, so he sends one crew up to the first floor and another to the second floor. Both report that the fire has gotten past them. The commander goes outside and walks around to the front of the building. Now he can see smoke coming out from under the eaves of the roof.
It is obvious what has happened: the fire has gone straight up to the fourth floor, has hit the ceiling there, and is pushing smoke down the hall. Since there was no smoke when he arrived just a minute earlier, this must have just happened. It is obvious to him how to proceed now that the chance to put out the fire quickly is gone. He needs to switch to search and rescue, to get everyone out of the building, and he calls in a second alarm. The side staircase near the laundry chute had been the focus of activity before. Now the attention shifts to the front stairway as the evacuation route.
Klein also gives a picture of the general algorithm for recognition-primed decision-making. Breaking down the story, we might say that something like the following happened (a rough code caricature follows the list):
1. The commander sees no smoke outside, then flames spreading up the laundry chute. These are the cues that allow him to recognize this as a vertical fire that is just starting.
2. The commander’s mental representation of vertical fires includes plausible goals, expectancies of what is going to happen, and actions that could further the goals. A plausible goal for this situation: put the fire out quickly, before it has a chance to spread. An action that would further it: send people to spray water on the fire from above. A rapid mental simulation suggests that this should work, so he gives the order.
3. The crews report that the fire has gotten past them. This violates the expectancy that the fire should be in the basement only; to diagnose this anomaly, the commander goes outside to gather more data. When he sees the smoke coming up from the roof, this allows him to construct a story of what has happened.
4. The situation is now different, calling up a new mental representation: that of a fire that has spread from the basement to the top floor. Plausible goals in this situation: get everyone out of the building. Actions to take here: call in reinforcements, get people to the front stairway to carry out an evacuation.
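To make the structure of that loop explicit, here is a rough, self-contained caricature in Python. It is entirely my own rendering: the cue strings, dictionaries, and matching rules are invented for illustration, and Klein describes a cognitive process, not an algorithm.

```python
# Toy caricature of recognition-primed decision-making, using the
# vertical-fire story. Everything here (cues, dictionaries, matching rules)
# is invented for illustration.

# Stored mental representations: a recognized situation comes bundled with
# plausible goals, expectancies, and typical actions.
REPRESENTATIONS = {
    "vertical fire, just starting": {
        "goal": "put the fire out before it spreads",
        "expectancy": "the fire stays in the basement",
        "action": "send crews above the fire to spray water down",
    },
    "fire has spread to the top floor": {
        "goal": "get everyone out of the building",
        "expectancy": "smoke fills the upper floors",
        "action": "call a second alarm and evacuate via the front stairway",
    },
}

def recognize(cues):
    """Pattern recognition: map observed cues to a familiar situation."""
    if "flames in laundry chute" in cues and "no smoke outside" in cues:
        return "vertical fire, just starting"
    if "smoke under the eaves" in cues:
        return "fire has spread to the top floor"
    return None  # unfamiliar: would require constructing a story from scratch

def decide(cues):
    situation = recognize(cues)
    rep = REPRESENTATIONS[situation]
    # (An expert would also quickly mentally simulate the action before
    # committing to it; that step is omitted here.)
    return situation, rep["goal"], rep["action"]

# Steps 1-2: initial recognition and action.
print(decide(["no smoke outside", "flames in laundry chute"]))
# Steps 3-4: the crews' report violates the expectancy that the fire stays
# in the basement, so the commander gathers new cues and re-recognizes.
print(decide(["smoke under the eaves", "crews report the fire got past them"]))
```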
As at least one major difference, GPT-2 never does the thing where it expects that something will happen, and then takes actions to re-evaluate the situation if the prediction goes wrong. If it predicts “the word after ‘maximize’ is going to be ‘paperclip’” with 90% confidence, finding out that it’s actually followed by ‘human values’ doesn’t cause it to...
Actually, I don’t need to complete that sentence, because “seeing that it was mistaken” isn’t actually a thing that happens to GPT-2 in the first place. It does get feedback on its predictions during its training phase, but once it has been trained, it never again compares its prediction with the actual result. You just give it a prompt and it tries to predict the rest; that’s it. If you give it one prompt, have it predict the rest, and then give it a revised prompt with the correct completion, it has no idea that you are doing this. It just sees one prompt and then another. This makes it incapable of noticing that its expectations were violated, gathering more information in response, and then constructing a story of what happened and what kind of a situation it’s actually in.
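To make the contrast concrete, here is roughly what “using a trained GPT-2” amounts to: a minimal sketch with the Hugging Face transformers library (assuming transformers and torch are installed; the exact sampling settings are incidental).

```python
# Minimal sketch of inference with a trained GPT-2. The point is structural:
# generation just extends the prompt token by token. There is no step where
# the model's expectation is compared against what actually happened, and
# nothing produced here feeds back into its weights.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The crew sprayed water on the kitchen fire, but"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

output_ids = model.generate(
    input_ids,
    max_new_tokens=40,
    do_sample=True,
    top_k=50,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token by default
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```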
You could probably apply it to something like “predict what a human firefighter would do in this situation” (imitation learning), but as anyone can verify by playing AI Dungeon (which now uses GPT-3, not GPT-2), its predictions get very nonsensical very quickly. It doesn’t really do the kind of causal reasoning that would use mental simulation to produce novel responses, as in the following example from Klein:
A [firefighter] lieutenant is called out to rescue a woman who either fell or jumped off a highway overpass. She is drunk or on drugs and is probably trying to kill herself. Instead of falling to her death, she lands on the metal supports of a highway sign and is dangling there when the rescue team arrives.
The lieutenant recognizes the danger of the situation. The woman is semiconscious and lying bent over one of the metal struts. At any moment, she could fall to her death on the pavement below. If he orders any of his team out to help her, they will be endangered because there is no way to get a good brace against the struts, so he issues an order not to climb out to secure her.
Two of his crew ignore his order and climb out anyway. One holds onto her shoulders and the other to her legs.
A hook-and-ladder truck arrives. The lieutenant doesn’t need their help in making the rescue, so tells them to drive down to the highway below and block traffic in case the woman does fall. He does not want to chance that the young woman will fall on a moving car.
Now the question is how to pull the woman to safety.
First, the lieutenant considers using a rescue harness, the standard way of raising victims. It snaps onto a person’s shoulders and thighs. In imagining its use, he realizes that it requires the person to be in a sitting position or face up. He thinks about how they would shift her to sit up and realizes that she might slide off the support.
Second, he considers attaching the rescue harness from the back. However, he imagines that by lifting the woman, they would create a large pressure on her back, almost bending her double. He does not want to risk hurting her.
Third, the lieutenant considers using a rescue strap – another way to secure victims, but making use of a strap rather than a snap-on harness. However, it creates the same problems as the rescue harness, requiring that she be sitting up or that it be attached from behind. He rejects this too.
Now he comes up with a novel idea: using a ladder belt – a strong belt that firefighters buckle on over their coats when they climb up ladders to rescue people. When they get to the top, they can snap an attachment on the belt to the top rung of the ladder. If they lose their footing during the rescue, they are still attached to the ladder so they won’t plunge to their death.
The lieutenant’s idea is to get a ladder belt, slide it under the woman, buckle it from behind (it needs only one buckle), tie a rope to the snap, and lift her up to the overpass. He thinks it through again and likes the idea, so he orders one of his crew to fetch the ladder belt and rope, and they tie it onto her.
In the meantime, the hook-and-ladder truck has moved to the highway below the overpass, and the truck’s crew members raise the ladder. The firefighter on the platform at the top of the ladder is directly under the woman shouting, “I’ve got her. I’ve got her.” The lieutenant ignores him and orders his men to lift her up.
At this time, he makes an unwanted discovery: ladder belts are built for sturdy firefighters, to be worn over their coats. This is a slender woman wearing a thin sweater. In addition, she is essentially unconscious. When they lift her up, they realize the problem. As the lieutenant put it, “She slithered through the belt like a slippery strand of spaghetti.”
Fortunately, the hook-and-ladder man is right below her. He catches her and makes the rescue. There is a happy ending.
Now the lieutenant and his crew go back to their station to figure out what had gone wrong. They try the rescue harness and the rescue strap and find that the lieutenant’s instincts were right: neither is usable.
Eventually they discover how they should have made the rescue. They should have used the rope they had tied to the ladder belt. They could have tied it to the woman and lifted her up. With all the technology available to them, they had forgotten that you can use a rope to pull someone up.
Consider the lieutenant’s first idea. GPT-2 might have been able to notice that, statistically, firefighters typically use rescue harnesses in situations like this. But it doesn’t do any mental simulation to see what the predicted outcome of using that harness would be. If there had been enough previous situations where a harness was unusable, and enough cues to indicate to GPT-2 that this was one of those situations, then it could accurately predict that the rescuers would do something different. But if this is a novel situation (as most situations are), then it needs to actually do causal reasoning and notice that the woman would slide off the support. (This is similar to your “check for badness” thing, except it happens via mental simulation rather than just association.)
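As a toy illustration of the difference (my own framing, not Klein’s or any real system’s), compare “output the statistically typical action” with “simulate each candidate and reject the ones whose predicted outcome is unacceptable”. All the names and numbers below are invented; the “causal model” is just a hand-written table standing in for the lieutenant’s knowledge of bodies, straps, and struts.

```python
# Toy contrast: frequency-based action choice vs. serial mental simulation.

CANDIDATES = ["rescue harness (front)", "rescue harness (back)",
              "rescue strap", "ladder belt"]

ACTION_FREQUENCY = {  # what rescuers "usually" do in vaguely similar cases
    "rescue harness (front)": 0.7,
    "rescue strap": 0.2,
    "rescue harness (back)": 0.06,
    "ladder belt": 0.04,
}

PREDICTED_OUTCOME = {  # stand-in for mentally simulating each option
    "rescue harness (front)": "must sit her up; she may slide off the strut",
    "rescue harness (back)": "lifting bends her double; risks injuring her",
    "rescue strap": "same problems as the harness",
    "ladder belt": "can be slipped under her and buckled; seems workable",
}

def pattern_matcher():
    # Association only: pick the most typical action, with no lookahead.
    return max(ACTION_FREQUENCY, key=ACTION_FREQUENCY.get)

def acceptable(outcome):
    return not any(bad in outcome for bad in ("slide", "risk", "problem"))

def simulating_expert():
    # Consider the options one at a time, rejecting each whose simulated
    # outcome is unacceptable -- roughly the lieutenant's serial process.
    for action in CANDIDATES:
        if acceptable(PREDICTED_OUTCOME[action]):
            return action
    return None

print(pattern_matcher())    # -> "rescue harness (front)"
print(simulating_expert())  # -> "ladder belt" (which, in the event, also failed)
```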
Orthonormal gives us a real-life example of what happens when an AI that uses pattern-matching, but does not do causal reasoning, tries to play StarCraft:
The overhyped part is that AlphaStar doesn’t really do the “strategy” part of real-time strategy. Each race has a few solid builds that it executes at GM level, and the unit control is fantastic, but the replays don’t look creative or even especially reactive to opponent strategies.
That’s because there’s no representation of causal thinking—“if I did X then they could do Y, so I’d better do X’ instead”. Instead there are many agents evolving together, and if there’s an agent evolving to try Y then the agents doing X will be replaced with agents that do X’. But to explore as much as humans do of the game tree of viable strategies, this approach could take an amount of computing resources that not even today’s DeepMind could afford.
(This lack of causal reasoning especially shows up in building placement, where the consequences of locating any one building here or there are minor, but the consequences of your overall SimCity are major for how your units and your opponents’ units would fare if they attacked you. In one comical case, AlphaStar had surrounded the units it was building with its own factories so that they couldn’t get out to reach the rest of the map. Rather than lifting the buildings to let the units out, which is possible for Terran, it destroyed one building and then immediately began rebuilding it before it could move the units out!)
AlphaStar notices that the units are trapped, which it associates with “must destroy the thing that is trapping them”. Then it notices that it is missing a factory, and its associations tell it that in this situation it should have one more factory, and it should be located right where the destroyed factory should be, so...
In contrast, a human might have considered destroying the factory, but then noticed that this leads to a situation where there is one factory too little; and then realized that the building can just be lifted out of the way.
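To caricature that difference in code (this has nothing to do with AlphaStar’s actual architecture; the states, actions, and transition table below are invented), pure association reacts only to the current state, while even one step of simulated lookahead reveals that lifting the building dominates destroying it:

```python
# Toy caricature: association-triggered actions vs. one-step causal lookahead.

def associated_action(state):
    # Pattern-matching only: trapped units -> destroy the blocker;
    # missing factory -> rebuild it. No look at where this leads.
    if state == "units trapped":
        return "destroy blocking factory"
    if state == "factory missing":
        return "rebuild factory in the same spot"
    return "continue build order"

TRANSITIONS = {  # hand-written one-step causal model of the situation
    ("units trapped", "lift blocking factory"): "units free, factory intact",
    ("units trapped", "destroy blocking factory"): "factory missing",
    ("factory missing", "rebuild factory in the same spot"): "units trapped",
}

def simulate_step(state, action):
    return TRANSITIONS.get((state, action), state)

print(associated_action("units trapped"))  # -> "destroy blocking factory"
for action in ("lift blocking factory", "destroy blocking factory"):
    print(action, "->", simulate_step("units trapped", action))
# Following the associations yields a destroy/rebuild/trapped-again cycle;
# one simulated step shows that lifting keeps both the units and the factory.
```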
Here is Klein’s illustration of how a generic mental simulation seems to work; he also has illustrations of the more specific variants of explaining the past (e.g. the firefighter commander constructing a story of how the basement fire had spread) and projecting to the future (e.g. the firefighter lieutenant trying to figure out how to rescue the woman). Here he is explaining part of the figures:
Consider this example. Some need arises for building a mental simulation; let us say a coworker has suddenly started acting rudely toward you. The simulation has to let you infer what the original situation was that led to the events you are observing. You assemble the action sequence: the set of transitions that make up the simulation. Perhaps you recall an incident that same morning when you were chatting with some other people in your office and said something that made them laugh. Perhaps you also recall that earlier that morning, your coworker had confided some embarrassing secret to you. So you construct a sequence in which your coworker trusts you with a confidence, then regrets it immediately afterward and feels a little awkward around you, then sees you possibly entertaining some other people with the secret, and then feels that it is going to be unbearable to live with you in the same setting. Now you can even remember that after you made the other people laugh, you looked up and saw the coworker giving you a look that made you feel uneasy. This set of states and transitions is the action sequence, the mental simulation that explains the rude behavior.
The next step is to evaluate the action sequence at a surface level. Is it coherent (Do the steps follow from each other)? Yes, it is. Does it apply (Does the sequence account for the rudeness)? Yes, it does. How complete is it (Does it leave out any important factors, such as the excellent performance evaluation you have just received)? Yes, there are some more pieces that might belong to the puzzle. But in general, the mental simulation passes the internal evaluation. It is an acceptable explanation. That does not mean it is correct.
Sometimes the mental simulation will not pass the internal evaluation, and that also helps you make sense of things. [The following example] illustrates this with a story reported in a newspaper. [...]
The IRA Terrorist. A well-respected lawyer has agreed to defend a man accused of committing an act of terrorism: planting a bomb for the IRA. The lawyer, asked why he would take the case, answers that he interviewed the accused man, who was shaking and literally overcome with panic. He was surprised to see the man fall apart like that. He tried to imagine the IRA’s recruiting such a person for a dangerous mission and found that he could not. He cannot conjure up a scenario in which the IRA would give a terrorism assignment to a man like this, so his conclusion is that the man is innocent.
This lawyer could not generate an action sequence that passed his internal evaluation – specifically, the requirement that the transition between steps be plausible. His failure to assemble a plausible sequence of steps led him to a different explanation than the prosecutors had formed. That’s why you see a long, curving arc in figure 5.4: the failure to assemble the mental simulation was the basis of the conclusion.
There are also times when you use mental simulation to try to increase your understanding of situations like these. You are trying to build up better models. When you run the action sequence in your mind, you may notice parts that still seem vague. Maybe you can figure out how to set up a better action sequence, or maybe there are some more details about the present state that you should gather. Going back to the example of your coworker, your explanation has not included the fact that you received such a good performance evaluation. What was your coworker’s performance evaluation? Possibly the coworker felt you had gotten recognition for work that someone else had done. Perhaps you can get a general sense of the situation by talking to your boss. That might give you some more data points for building your explanations.
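As a toy rendering of the surface-level evaluation Klein describes (my own sketch, not his; the steps, facts, and plausibility numbers are invented), an action sequence can be checked for coherence, applicability, and completeness roughly like this:

```python
# Toy evaluation of a candidate action sequence: each step carries a judged
# plausibility for the transition into it.

def coherent(steps, threshold=0.3):
    """Coherence: every transition must be individually plausible. (The IRA
    example fails here: 'the IRA recruits a man who falls apart under
    pressure' would get a plausibility near zero.)"""
    return all(p >= threshold for _, p in steps)

def accounts_for(steps, observation):
    """Applicability: does the story end in what you actually observed?"""
    return steps[-1][0] == observation

def gaps(steps, known_facts):
    """Completeness: known facts the story never touches."""
    covered = {step for step, _ in steps}
    return [f for f in known_facts if f not in covered]

rude_coworker = [
    ("coworker confides an embarrassing secret to you", 0.9),
    ("coworker regrets it and feels awkward around you", 0.8),
    ("coworker sees you making other people laugh", 0.9),
    ("coworker assumes you shared the secret", 0.6),
    ("coworker is rude to you", 0.9),
]

print(coherent(rude_coworker))                                 # True
print(accounts_for(rude_coworker, "coworker is rude to you"))  # True
print(gaps(rude_coworker, ["you just received an excellent evaluation"]))
# -> incomplete but acceptable: an explanation can pass and still be wrong
```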
(If you look at explanations of how GPT’s Transformer architecture works [1, 2], you can see that it doesn’t do anything like this.)
So, doublechecking my comprehension:

In my OP, my claim was basically “you probably can get human-level output out of something GPT-like by giving it longer-term rewards/punishments, and having it continuously learn” (i.e. give it an actual incentive to figure out how to fight fires in novel situations, which current GPT doesn’t have).
I realize that leaves a lot of fuzziness in “well, is it really GPT if it has a different architecture that continuously learns and has long-term rewards?”. My guess was that it’d be fairly different from GPT architecturally, but that it wouldn’t depend on architectural insights we haven’t already made; it’d just be work to integrate existing insights.
Is your claim “this is insufficient – you still need working memory and the ability to model scenarios, and currently we don’t know how to do that, and there are good reasons to think that throwing lots of data and better reward structures at our existing algorithms won’t be enough to cause this to develop automatically via Neural Net Magic?”
So at this point I’m pretty uncertain of what neural nets can or cannot learn to do. But I am at least confident in saying that GPT isn’t going to learn the kinds of abilities that would be required for actually fighting fires, as it is trained and tested on a fundamentally static task, as opposed to one that requires adapting your behavior to a situation as it develops. For evaluating progress on those abilities, projects like AlphaStar look like more relevant candidates.
I don’t feel confident in saying whether some combination of existing algorithms and training methods could produce a system that approached the human level on dynamic tasks. Most people seem to agree that we haven’t gotten neural nets to learn to do good causal reasoning yet, so my understanding of the expert consensus is that current techniques seem inadequate… but then the previous expert consensus would probably also have judged neural nets to be inadequate for doing many of the tasks that they’ve now mastered.
Thanks, this is great. May have more thoughts after thinking it over a bit.