Maybe we can draw a line between the score an AI gets without using human-written problem/solution pairs in any way, and the score it gets after using them in some way (RL on example questions, training on example solutions, etc.).
In the former case, we're measuring how well the AI can do a task as difficult as the test entirely on its own. In the latter, we're measuring how well it can do such a task when humans have trained it for the task.
To be clear, I'm not trying to badmouth o3; I think it's a very impressive model. I should have written my post better.