It would mean that R1 is actually more efficient, and therefore more advanced, than o1 — which is possible but not very plausible given its simple RL approach.
I think that is very plausible. I don’t think o1, or even R1 for that matter, is anywhere near as efficient as LLMs can be. OpenAI is probably putting far more resources into reaching AGI first than into reaching it efficiently. DeepSeek V3 is already miles better than GPT-4o while being cheaper.
I think it’s more likely that o1 is similar to R1-Zero (rather than R1): that is, it may mix languages, which produces reasoning steps that humans can’t straightforwardly read. A quick inference-time fix would be an additional model call that translates the gibberish into readable English, which would also explain the increased CoT time.
I think this is extremely unlikely. Here are some questions that demonstrate why. Do you think OpenAI is using one model to first translate to English, and then another model to generate a summary? Is this conditional on showing the translated CoTs being a feature going forward? If so, do you expect OpenAI to do this for all CoTs or just the CoTs they intend to show? If the latter, wouldn’t there be a significant difference in “thinking” time between responses where the translated CoT is visible and responses where only the summary is visible?