I thought I’d had enough of xAI likely being 3 months behind the frontier, and now we get this… I tried to find out anything I could about Meta’s model and had Claude Opus 4.6 conclude that Meta’s model is also 3-4 months behind. There is also the issue of Meta having manipulated some benchmarks to present Llama 4 as more capable. On top of that, in Meta’s claimed results on ARC-AGI-2 and SWE-bench Verified, the rivals’ models allegedly show different scores than on the actual ARC-AGI-2 and SWE-bench Verified leaderboards, likely because of a different elicitation method. How do I lobby for a law change requiring EVERY new American model to be thoroughly evaluated by the entire Big Three?