So I keep seeing takes about how to tell if LLMs are “really exhibiting goal-directed behavior” like a human or whether they are instead “just predicting the next token”. And, to me at least, this feels like a confused sort of question that misunderstands what humans are doing when they exhibit goal-directed behavior.
Concrete example. Let’s say we notice that Jim has just pushed the turn signal lever on the side of his steering wheel. Why did Jim do this?
The goal-directed-behavior story is as follows:
Jim pushed the turn signal lever because he wanted to alert surrounding drivers that he was moving right by one lane.
Jim wanted to alert drivers that he was moving one lane right because he wanted to move his car one lane to the right.
Jim wanted to move his car one lane to the right in order to accomplish the goal of taking the next freeway offramp.
Jim wanted to take the next freeway offramp because that was part of the most efficient route from his home to his workplace.
Jim wanted to go to his workplace because his workplace pays him money.
Jim wants money because money can be exchanged for goods and services.
Jim wants goods and services because they get him things he terminally values, like mates and food.
But there’s an alternative story:
When in the context of “I am a middle-class adult”, the thing to do is “have a job”. Years ago, this context triggered Jim to perform the action “get a job”, and now he’s in the context of “having a job”.
When in the context of “having a job”, “showing up for work” is the expected behavior.
Earlier this morning, Jim had the context “it is a workday” and “I have a job”, which triggered Jim to begin the sequence of actions associated with the behavior “commuting to work”.
Jim is currently approaching the exit for his work. With the context of “commuting to work”, this means the expected behavior is “get in the exit lane”, and now he’s in the context “switching one lane to the right”.
In the context of “switching one lane to the right”, one of the early actions is “turn on the right turn signal by pushing the turn signal lever”. And that is what Jim is doing right now.
I think this latter framework captures some parts of human behavior that the goal-directed-behavior framework misses out on. For example, let’s say the following happens:
Jim is going to see his good friend Bob on a Saturday morning.
Jim gets on the freeway, which is in fact the same freeway he takes to work every weekday morning.
Jim gets into the exit lane for his work, even though Bob’s house is still many exits away.
Jim finds himself pulling onto the street his workplace is on.
Jim mutters “whoops, autopilot” under his breath, pulls a U-turn at the next light, and gets back on the freeway towards Bob’s house.
This sequence of actions is pretty nonsensical from a goal-directed-behavior perspective, but is perfectly sensible if Jim’s behavior here is driven by contextual heuristics like “when it’s morning and I’m next to my work’s freeway offramp, I get off the freeway”.
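To make the contrast concrete, here is a minimal toy sketch of the two stories. Everything in it is a hypothetical illustration rather than a claim about how brains or LLMs actually work: the `Context` fields, the rule list, and the action names are all made up for this example. The contextual policy fires on surface features of the situation, while the goal-directed policy checks whether the exit actually serves the current destination.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Context:
    # All fields are illustrative assumptions, not part of the original post.
    time_of_day: str   # e.g. "morning"
    location: str      # e.g. "near_work_offramp"
    destination: str   # where Jim actually intends to end up


# A rule pairs a trigger (a predicate on the context) with an action name.
Rule = Tuple[Callable[[Context], bool], str]

RULES: List[Rule] = [
    # "When it's morning and I'm next to my work's freeway offramp,
    # I get off the freeway." Note that the destination is never consulted.
    (
        lambda c: c.time_of_day == "morning" and c.location == "near_work_offramp",
        "take_offramp",
    ),
]


def contextual_policy(ctx: Context) -> str:
    """Return the action of the first rule whose trigger matches."""
    for trigger, action in RULES:
        if trigger(ctx):
            return action
    return "stay_on_freeway"


def goal_directed_policy(ctx: Context) -> str:
    """Take the work offramp only if work is the actual destination."""
    if ctx.location == "near_work_offramp" and ctx.destination == "work":
        return "take_offramp"
    return "stay_on_freeway"


weekday = Context("morning", "near_work_offramp", "work")
saturday = Context("morning", "near_work_offramp", "bobs_house")
```

On the weekday context the two policies agree, so you can't tell them apart by watching Jim commute. They only come apart on the Saturday context, where `contextual_policy` still takes the offramp and `goal_directed_policy` does not.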
Note that I’m not saying “humans never exhibit goal-directed behavior”.
Instead, I’m saying that “take a goal, and come up with a plan to achieve that goal, and execute that plan” is, itself, just one of the many contextually-activated behaviors humans exhibit.
I see no particular reason that an LLM couldn’t learn to figure out when it’s in a context like “the current context appears to be in the execute-the-next-step-of-the-plan stage of such-and-such goal-directed-behavior task”, and produce the appropriate output token for that context.
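As a rough sketch of that last point, "execute the next step of the plan" can be written as just one more entry in a list of context-triggered behaviors. This is a toy illustration under made-up assumptions: the context is a plain dictionary, and the keys `plan`, `step`, `time`, and `near`, along with the behavior names, are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Context = Dict[str, object]
Behavior = Callable[[Context], Optional[str]]


def follow_plan(ctx: Context) -> Optional[str]:
    """Fires when the context looks like 'midway through executing a plan'."""
    plan = ctx.get("plan")
    step = ctx.get("step", 0)
    if isinstance(plan, list) and isinstance(step, int) and 0 <= step < len(plan):
        return plan[step]
    return None  # context doesn't match, so this behavior stays silent


def morning_commute_habit(ctx: Context) -> Optional[str]:
    """Fires on surface features only, with no goal involved."""
    if ctx.get("time") == "morning" and ctx.get("near") == "work_offramp":
        return "take_offramp"
    return None


# Plan-following sits in the same list as the habit: it is one
# contextually-activated behavior among several, not a separate mechanism.
BEHAVIORS: List[Behavior] = [follow_plan, morning_commute_habit]


def act(ctx: Context) -> str:
    """Return the output of the first behavior whose context matches."""
    for behavior in BEHAVIORS:
        action = behavior(ctx)
        if action is not None:
            return action
    return "keep_driving"
```

From the outside, `act` looks goal-directed whenever the context contains a plan and habit-driven otherwise, even though the dispatch mechanism is the same in both cases. The priority order in `BEHAVIORS` is an arbitrary choice here; the autopilot story suggests that in people the habit sometimes wins.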