Given how strong GPT-5.4 is while likely being Sonnet sized
Judging from my own experience and what I’ve read of other people’s experiences on Reddit, GPT-5.3 and 5.4 are very similar to Opus 4.6 in coding ability, so if they’re actually Sonnet sized… (which seems pretty plausible given the API token costs)
OpenAI’s current failure to have a strong offering similar to Claude Code

I’ve seen many posts/comments with people saying that they actually prefer Codex to Claude Code (comparable to, or maybe even more often than, the reverse). See this thread for some examples. Quoting one below:

I have been using Codex AND Claude side by side for the same project*, with the same prompts.
Codex has been consistently better on almost every level.
* (an open source framework for 2D games in Godot 4.6 GDScript, mostly using AI to review existing code)
I’ve also seen those comments, but I’m worried that a bunch of them might be bots. AI-powered astroturfing already seems to be a thing in the political landscape, and OpenAI specifically seems to be behind some of it; I wouldn’t put it past them to pay for fake reviews like this.
For what it’s worth I’ve personally found that Claude Code with Opus 4.6 and Codex with GPT 5.4 are very similar products. I haven’t done a very deep dive, but I’ve used them side by side for a few projects. Certainly the difference between them feels much smaller than the difference from models that are a few months older.
Yeah, that’s what I hear from people I actually know. I think people may have been under the mistaken impression that I think Claude is significantly better than Codex at coding? I never said that.
(Wei Dai was suggesting that current sentiment for Claude Code and Codex seemed to be comparable, in response to Vladimir Nesov mentioning “OpenAI’s current failure to have a strong offering similar to Claude Code.”)
Yeah, I was replying narrowly to the point about Reddit comment threads. Perhaps I should have made that explicit.
Yeah, I guess I was under that impression, since if Claude is similar to Codex at coding, while being a larger, more expensive model (which seems likely based on API costs and Nesov’s analysis of their training hardware), then Anthropic has no clear lead (or would be behind if not for Mythos). So I thought your claim of their lead was partly based on an impression (similar to Nesov’s) that Claude is significantly better than Codex at coding.
(And I think a lot of people were probably under this misimpression at some point, including me, due to seeing a lot of talk about Claude Code on X around December, much more than about Codex, which in retrospect I have to attribute to a successful Anthropic marketing campaign.)
Also relevant that in recent months, Anthropic gave huge subscription subsidies to gain hype and mindshare ($5000 per month worth of tokens for a $200 subscription according to one report and other Reddit analysis I’ve seen), and probably also to temporarily paper over their higher inference costs relative to GPT (for similar coding ability). So I think you may have in part fallen for a highly successful marketing campaign, but on Anthropic’s part, not OpenAI’s.
(I think OpenAI also gave and is giving large subsidies, but not as large as Anthropic’s because their inference costs/pricing are lower to begin with.)
Fwiw, I don’t think that $5,000 figure is meaningful. The quote is saying (iiuc) that someone using Claude Code could use the equivalent of $5,000 in API credits if they used their full usage allocation for each time period. But a very high fraction of users aren’t using the maximum possible tokens on these plans. When people were taking advantage of the plan pricing in 2025 to get max token use, Anthropic adjusted the allowances, and presumably they’re now comfortable with overall Claude Code token spend on the different plans. The $5,000 figure may be from before that adjustment, and even at the time it only applied to the extreme right tail of token users, afaik.
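To make the disagreement concrete, here is a sketch of how an "equivalent API value" figure like that one is typically derived: multiply the per-window token cap by the public API per-token price, then scale to a month. Every number below is a hypothetical placeholder chosen only to land near the quoted figure, not Anthropic's actual pricing or allowances.

```python
# Illustrative sketch of how a "$X of API credits per month" claim can be
# derived from a subscription's usage caps. All numbers are hypothetical
# placeholders, NOT actual Anthropic pricing or limits.

def implied_api_value(
    windows_per_month: float,    # how many usage windows fit in a month
    max_tokens_per_window: int,  # token cap per window (hypothetical)
    api_price_per_mtok: float,   # public API price per million tokens (hypothetical)
) -> float:
    """Maximum API-equivalent dollar value a subscriber could extract."""
    tokens_per_month = windows_per_month * max_tokens_per_window
    return tokens_per_month / 1_000_000 * api_price_per_mtok

# Hypothetical: 5-hour windows (~144 per 30-day month), a 5M-token cap,
# and a $7/Mtok blended input/output price.
value = implied_api_value(144, 5_000_000, 7.0)
ratio = value / 200  # versus a $200/month subscription

print(f"implied API value: ${value:,.0f}/month, {ratio:.0f}x the subscription price")
# → implied API value: $5,040/month, 25x the subscription price
```

Note that this is strictly an upper bound: it assumes maxing out every single window, which, as argued above, only the extreme right tail of users ever does, so the realized per-user subsidy is far smaller than the headline figure.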
In general I would expect trends like Claude Code/Codex adoption to be driven more organically than by marketing campaigns, since growth has been so consistent and on such a large scale. I don’t think you get revenue 10xing every year from a creative marketing strategy.
OpenAI also used to have quite tight limits for Codex a few months ago, but recently increased them.
I think it’s plausible that it’s more about gaining training data than hype or mindshare.
That seems unlikely, as it would be leaked or detected pretty easily. I.e., they have to either pay existing users to post fake reviews, in which case someone would have leaked about being paid for this, or create a bunch of new accounts which someone would have noticed and commented on.
I disagree: fake reviews are incredibly common on the internet (something like 30%, apparently). I’ve also seen at least one specific example of a Substack blogger, writing about the importance of building datacenters in their rural district, who was clearly AI-generated.
I think the realistic assumption is that many people state this because it goes against the current vibe that Claude is better. Those that prefer Claude do not feel the need to belabour the obvious.
My own experience is Opus > GPT-5.4 > Sonnet, but Claude seems a lot better at data analysis, and GPT-5.4 probably has its own areas of relative dominance.
My experience is that the discrepancy on the “Claudiness” dimension still exists, although it has shrunk. Claude also has a better personality. Which pretty much means:
Claude Code:
Better at Agentic Tool use
Better at “SWE” stuff, and getting already well-specced programs to work
Better at philosophical and judgement based stuff
Codex / OpenAI Models:
Higher raw intelligence
Better at math
Better at being precise
More clunky
No, I expect these comments to be mostly written either by subscription users or by those paying public API prices. I’ve spent a significant amount of time with both products, and would recommend Codex with GPT 5.4 if I were limited to spending only $20/month, especially since there are regular rate limit resets. Claiming that these reviews are faked without providing strong evidence seems disingenuous to me; the harnesses really are not where they were in December.
I didn’t claim they were faked without strong evidence, I said I was worried that a bunch of them might be.
Anyhow, according to a brief Claude search, fake reviews are incredibly common: e.g., https://capitaloneshopping.com/research/fake-review-statistics/ says that an average of 30% of reviews are fake. So yeah, numerous companies must be indulging in this practice, and better AI makes it easier.
See also https://x.com/TheMidasProj/status/2041614395583664225 which seems to be OpenAI-linked, and https://doublespeed.ai/.
I think it’s a totally reasonable hypothesis to entertain, which is why I try to ask people I actually know about things like this (who tell me Codex is a bit better in some ways, Claude in others, overall similar) rather than trusting anonymous internet comments.
Platform / reputation lock-in is going to be a substantial factor, here, especially as AI grows in prominence and people start to emotionally or tribally ‘identify’ with brands. While I have many complaints about OpenAI, canning 4o and the marketing approach it represented was, in retrospect, a significant sacrifice in pursuit of the common good.
I’m not a heavy user of AI coding, but I’d expect that Codex and Gemini would do okay on the software engineering / RSE tests that Claude’s been put through, based on my experience testing them against hard engineering problems and their benchmark performance. A substantial share of Anthropic’s ‘vibes’ advantage right now comes from the fact that they’ve been more effective in building the kind of infrastructure that people want for these kinds of tasks, rather than anything directly tied to their LLM’s abilities. For example, I set up Claude for autoresearch the other day to test it out, and doing so was a very quick, very seamless experience with lots of online references.