Ignoring the AGI question (which I don’t think your post is implying), I think this depends on whether we’re counting success as having the best model or having a successful business. On the latter, from what I can tell, they only seem to be extending their lead so far.
I thought they were in trouble last year, as Anthropic had the clearly superior model for so long. Yet normal people didn’t care at all, and still barely know the words “Claude” or “Gemini”.
OpenAI are executing very well on the consumer product side of things^1, and from what I can tell that’s the side that actually matters to non-enthusiasts. Non-enthusiasts don’t seem to push the models enough to notice the difference between the SOTA ones, so a “slightly better” model isn’t enough to switch^2.
OpenAI also seem to be taking the bet (judging from Sam’s interview on Stratechery) that features such as memory will create user lock-in. Users won’t want to switch to another bot that doesn’t know them and their history very well.
I agree that the biggest risk is integration with existing tools becoming good enough that people don’t install a separate app: Microsoft will probably own the business market once they integrate well enough that workers stop manually copying private data into ChatGPT, though workers will likely still do so for the more “personal questions” about work. The biggest risk to OpenAI’s consumer focus is probably closed platforms like Apple’s pushing their own, more convenient AI, if they ever do; or Meta in WhatsApp-dominant countries. These platforms could use the built-in knowledge of you from your messages, etc., going back many years, though that’s sometimes seen as more invasive than a bot that you told the information to yourself.
^1. Consumer features they lead on: basic app quality/speed and convenience, memory, voice mode, image generation, customisability, deep research (execution and readability compared to the latest Gemini Pro), and working “everywhere”.
^2. Also see Chatbot Arena, where scores don’t seem great at distinguishing intelligence beyond homework problems, and Llama was able to “game” the rankings by praising the user more. And I would expect Arena judges to be more enthusiast-leaning than the typical consumer.