This seems important to think about. I strong-upvoted!
As AIs in the current paradigm get more capable, they appear to shift some toward (2) and I expect that at the point when AIs are capable of automating virtually all cognitive work that humans can do, we’ll be much closer to (2).
I’m not sure that link supports your conclusion.
First, the paper is about AI understanding its own behavior. This paper makes me expect that a CUDA-kernel-writing AI would be able to accurately identify itself as being specialized at writing CUDA kernels, which doesn’t support the idea that it would generalize to non-CUDA tasks.
Maybe if you asked the AI “please list heuristics you use to write CUDA kernels,” it would be able to give you a pretty accurate list. This is plausibly more useful for generalizing: if the model can name these heuristics explicitly, maybe it can also apply the ones that carry over to other domains, assuming any do. But this depends on 1) the model being aware of many of the heuristics it has learned, 2) many of these heuristics generalizing across domains, and 3) the model being able to use its awareness of these heuristics to generalize successfully. None of these are clearly true to me.
Second, the paper only tested GPT-4o and Llama 3, so it doesn’t provide clear evidence that more capable AIs “shift some toward (2).” The authors actually note in the paper that future work could test this on smaller models to find out whether there are scaling laws. Has anybody done this? I wouldn’t be too surprised if small models were also able to self-report simple attributes about themselves that were instilled during training.
I’m not sure o3 does get significantly better at tasks it wasn’t trained on. Since we don’t know what was in o3’s training data, it’s hard to say for sure that it wasn’t trained on any given task.
To my knowledge, the most likely example of a task that o3 does well on without explicit training is GeoGuessr. But see this Astral Codex Ten post, quoting Daniel Kang:[1]
I think this is a bit overstated, since GeoGuessr is a relatively obscure task, and implementing an idea takes much longer than thinking of it.[2] But it’s possible that o3 was trained on GeoGuessr.
The same ACX post also mentions:
Do you have examples in mind of tasks that you don’t think o3 was trained on, but which it nonetheless performs significantly better at than GPT-4o?
Disclaimer: Daniel happens to be my employer.
Maybe not for cracked OpenAI engineers, idk