Progress in ML looks a lot like: we had a different setup with different data and a tweaked algorithm, and we did better on this task. If you want to put an asterisk on o3 because it was trained in some specific way that's different from previous contenders, then basically every ML advance is going to carry a similar asterisk. Seems like a lot of asterisking.
Maybe we can draw a line between the score an AI gets without using human-written problem/solution pairs in any way, and the score it gets after using them in some way (RL on example questions, training on example solutions, etc.).
In the former case, we're interested in how well the AI can do a task as difficult as the test entirely on its own. In the latter case, we're interested in how well it can do a task as difficult as the test when humans have trained it for that specific task.
I really want to make it clear that I'm not trying to badmouth o3; I think it's a very impressive model. I should've written my post better.