Actually, things that are effectively prediction markets—options, futures and other “derivative” contracts—are entirely mainstream for larger businesses (huge amounts of money are involved). It is quite easy and common to bet on the price of oil by purchasing an option to buy it at some future time, for example.
The only things that aren’t mainstream are the ones labeled “prediction markets,” and that’s because they focus on questions people are curious about rather than things that a lot of money rides on (like oil prices or interest rates).
I believe that Marcus’ point is that there are classes of problems that tend to be hard for LLMs (biological reasoning, physical reasoning, social reasoning, practical reasoning, object and individual tracking, non sequiturs). The argument is that problems in these classes will continue to be hard. [1]
But I think there’s a larger issue. A lot of the discussion involves hostility to a given critic of AI “moving the goal posts”. As described, Model X(1) is introduced, a critic notices limitation L(1), Model X(2) addresses it, and the critic says they’re unconvinced and notes limitation L(2), and so on. The critic of these critics says this approach is unfair, a bad argument, etc.
However, what the “moving the goal posts” objection misses, in my opinion, is the context of the claim being made when someone says X(n) is generally intelligent. This claim isn’t about giving the creator of a model credit or an award. The claim is about whether a thing has a flexibility akin to that of a human being (especially the flexible, robust goal-seeking ability of a human, an ability that could make a thing dangerous), and we don’t actually have a clear, exact formulation of what the flexible intelligence of a human consists of. The Turing Test might not be the best AGI test, but it’s posed in an open-ended fashion precisely because there’s no codified set of “prove you’re like a human” questions.
Which is to say, Gary Marcus aside, if models keep advancing and if people keep finding new capacities that each model lacks, it will be perfectly reasonable to put the situation as “it’s not AGI yet”, as long as those capacities are clearly significant capacities of human intelligence. There wouldn’t even need to be a set pattern to the capacities critics cite. Again, it’s not about argument fairness etc.; it’s that this sort of thing is all we have, for now, as a test of AGI.
[1] https://garymarcus.substack.com/p/what-does-it-mean-when-an-ai-fails