...and this is because GPT-n and human brains making snap judgements are both doing the same sort of thing.
I could very easily be wrong about that! But it does suggest some testable hypotheses, in the form of “find some process which generates a somewhat predictable sequence, train both a human and a transformer to predict that sequence, and see if they make the same types of errors or completely different types of errors”.
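One rough way to score the comparison described above, as a sketch rather than a worked-out experimental design: collect both predictors' answers on the same items, then check how much their error sets overlap and how often they give the same wrong answer. The function names and the toy data here are made up for illustration.

```python
def error_profile(truth, predictions):
    """Map each item index the predictor got wrong to the wrong answer it gave."""
    return {i: p for i, (t, p) in enumerate(zip(truth, predictions)) if p != t}


def compare_errors(truth, preds_a, preds_b):
    """Summarise how similar two predictors' mistakes are on the same items."""
    errs_a = error_profile(truth, preds_a)
    errs_b = error_profile(truth, preds_b)
    both = errs_a.keys() & errs_b.keys()    # items both predictors got wrong
    either = errs_a.keys() | errs_b.keys()  # items at least one got wrong
    same_wrong = sum(1 for i in both if errs_a[i] == errs_b[i])
    return {
        # Overlap of the two error sets (None if nobody made any errors).
        "error_overlap": len(both) / len(either) if either else None,
        # Of the shared errors, the fraction with an identical wrong answer.
        "same_wrong_answer": same_wrong / len(both) if both else None,
    }


# Toy data, purely illustrative: not results from any real experiment.
truth = ["1", "2", "3", "4"]
human_guesses = ["1", "9", "3", "7"]
model_guesses = ["1", "9", "8", "4"]
summary = compare_errors(truth, human_guesses, model_guesses)
```

High overlap with matching wrong answers would be weak evidence for "same types of errors"; it says nothing by itself about the mechanism that produced them.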
Suppose for concreteness, on a specific problem (e.g. Python interpreter transcript prediction), GPT-3 makes mistakes that look like humans-making-snap-judgement mistakes, and then GPT-4 gets the answer right all the time. Or, suppose GPT-5 starts playing chess like a non-drunk grandmaster.
Would that result imply that the kind of cognition performed by GPT-3 is fundamentally, qualitatively different from that performed by GPT-4? Similarly for GPT-4 → GPT-5.
It seems more likely to me that each model performs some kind of non-human-like cognition at a higher level of performance (though possibly each iteration of the model is qualitatively different from previous versions). And I’m not sure that any experiment which only interprets and compares output errors, without investigating the underlying mechanisms which produced them (e.g. through mechanistic interpretability), would convince me otherwise. But it’s an interesting idea, and I think experiments like this could definitely tell us something.
(Also, thanks for clarifying and expanding on your original comment!)
Suppose for concreteness, on a specific problem (e.g. Python interpreter transcript prediction), GPT-3 makes mistakes that look like humans-making-snap-judgement mistakes, and then GPT-4 gets the answer right all the time. Or, suppose GPT-5 starts playing chess like a non-drunk grandmaster.
Would that result imply that the kind of cognition performed by GPT-3 is fundamentally, qualitatively different from that performed by GPT-4? Similarly for GPT-4 → GPT-5.
In the case of the Python interpreter transcript prediction task, I think if GPT-4 gets the answer right all the time that would indeed imply that GPT-4 is doing something qualitatively different than GPT-3. I don’t think it’s actually possible to get anywhere near 100% accuracy on that task without either having access to, or being, a Python interpreter.
Likewise, in the chess example, I expect that if GPT-5 is better at chess than GPT-4, that will look like “an inattentive and drunk super-grandmaster, with absolutely incredible intuition about the relative strength of board-states, but difficulty with stuff like combinations (but possibly with the ability to steer the game-state away from the board states it has trouble with, if it knows it has trouble in those sorts of situations)”. If it makes the sorts of moves that human grandmasters play when they are playing deliberately, and the resulting play is about as strong as those grandmasters, I think that would show a qualitatively new capability.
Also, my model isn’t “GPT’s cognition is human-like”. It is “GPT is doing the same sort of thing humans do when they make intuitive snap judgements”. In many cases it is doing that thing far far better than any human can. If GPT-5 comes out, and it can natively do tasks like debugging a new complex system by developing and using a gears-level model of that system, I think that would falsify my model.
Also also it’s important to remember that “GPT-5 won’t be able to do that sort of thing natively” does not mean “and therefore there is no way for it to do that sort of thing, given that it has access to tools”. One obvious way for GPT-4 to succeed at the “predict the output of running Python code” task is to give it the ability to execute Python code and read the output. The system of “GPT-4 + Python interpreter” does indeed perform a fundamentally, qualitatively different type of cognition than “GPT-4 alone”. But “it requires a fundamentally different type of cognition” does not actually mean “the task is not achievable by known means”.
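To make the “model + interpreter” distinction concrete, here is a minimal sketch. It assumes the model is just some callable that guesses output from source text; `guess_fn` is a hypothetical stand-in, not a real API. The tool path runs the code in a fresh interpreter via `subprocess` and reads back what it printed.

```python
import subprocess
import sys


def run_python(code, timeout=5.0):
    """Run `code` in a fresh Python process and return its stdout.

    Note: this is not a sandbox; only run code you trust.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout


def predict_output(code, guess_fn, use_tool):
    """Predict what `code` prints, either by guessing or by actually running it.

    `guess_fn` is a hypothetical stand-in for "the model alone".
    """
    if use_tool:
        return run_python(code)  # the combined system: just execute and read
    return guess_fn(code)        # the model alone: a snap judgement
```

With `use_tool=True` the answer is correct regardless of how good `guess_fn` is, which is the sense in which the combined system is doing a different kind of thing from the model alone.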
Also also also, I mostly care about this model because it suggests interesting things to do on the mechanistic interpretability front. Which I am currently in the process of learning how to do. My personal suspicion is that the bags of tensors are not actually inscrutable, and that looking at these kinds of mistakes would make some of the failure modes of transformers no-longer-mysterious.