Claude 4 feels pretty weak compared to what, a year ago, I'd have expected Claude 4 to be. It makes little progress on most benchmarks, many of which have tricks in them to exaggerate performance. Gemini 2.5 Pro feels a bit stronger, but not by much. (It feels stronger because they didn't call it Gemini 3, not because it's particularly stronger than Claude.)
Current methods have definitely hit a wall, yet AGI simultaneously feels pretty close. Strange timeline to be in. I predict progress will come as a jump after the next breakthrough.
There is a ~2000x scaleup between 2022 and ~2028 (since demonstration of ChatGPT started driving scaling at more serious levels of funding), from 2e25 FLOPs models to ~5e28 FLOPs models (at which point it dramatically slows down). Current frontier models are trained on 2024 compute (~100K H100s), which enables 3e26 FLOPs models (or possibly 6e26 FLOPs in FP8). This is only a third of the way from the original Mar 2023 GPT-4 on logarithmic scale.
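A quick sanity check of that "third of the way" figure, using the order-of-magnitude FLOP estimates quoted above (the specific numbers are the comment's assumptions, not measured values):

```python
import math

# FLOP estimates from the comment above (assumed, order-of-magnitude):
gpt4_flops = 2e25      # original Mar 2023 GPT-4
current_flops = 3e26   # frontier models on 2024 compute (~100K H100s)
end_flops = 5e28       # ~2028 models, where the scaleup dramatically slows

# Fraction of the total scaleup covered so far, measured on a log scale
progress = math.log(current_flops / gpt4_flops) / math.log(end_flops / gpt4_flops)
print(f"{progress:.2f}")  # ~0.35, i.e. about a third of the way
```

The log scale matters here: linearly, 3e26 out of 5e28 would look like under 1% progress, but each constant multiple of compute is expected to buy roughly comparable capability gains, so the logarithmic fraction is the meaningful one.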
So perhaps current progress subjectively falls short of some expectations, but it's not the end of the road in the short term. Being slow is distinct from slowing down ("hitting a wall").
I feel the same about both, but I will say Gemini seems better at not gassing me up when I ask for feedback. On the other hand, it is the only model I've used that fundamentally did not understand a question I asked. It has done that twice now: once on the previous version and once on the most recent.