GPT-4 solves Gary Marcus-induced flubs


TLDR: GPT-4 succeeds at 15 problems from Gary Marcus that exposed failures of GPT-2 and GPT-3.

I enjoyed reading the ACX post “My Bet: AI Size Solves Flubs” last year. Here are some excerpts:

Here’s the basic structure of an AI hype cycle:

  1. Someone releases a new AI and demonstrates it doing various amazing things.

  2. Somebody else (usually Gary Marcus) demonstrates that the AI also fails terribly at certain trivial tasks. This person argues that this shows that those tasks require true intelligence, whereas the AI is just clever pattern-matching.

  3. A few months or years later, someone makes a bigger clever pattern-matcher, which does the tasks that supposedly require true intelligence just fine.

  4. The it’s-not-true-intelligence objectors find other, slightly less trivial tasks that the new bigger AI still fails horribly at, then argue that surely these are the tasks that require true intelligence and that mere clever pattern-matchers will never complete.

  5. Rinse and repeat.

...

Marcus vs. GPT, Round 1

To give an example: in January 2020, Gary Marcus wrote a great post, GPT-2 And The Nature Of Intelligence, demonstrating a bunch of easy problems that GPT-2 failed on:

I’m quoting most of them below; you can find the rest at the link.

I asked GPT-4 to answer all the questions quoted in the ACX post. (This does not include all of Marcus’s prompts, which I realized only after running the experiment.) GPT-4 answered every question correctly; you can read the responses in this doc.

Before asking the questions, I gave GPT-4 a short description of what I wanted it to do: “Complete the following prompts in 50 words or less. Short, concise answers are better. Are you ready?” (This was mostly in the interest of speed, since GPT-4 is pretty slow right now; I assume it would still succeed without this instruction.)

More quotes from ACX:

Marcus vs. GPT, Round 2

Eight months later, GPT-3 came out, solving many of the issues Marcus had noticed in GPT-2. He still wasn’t impressed. In fact, he was so unimpressed he co-wrote another article, this time in MIT Technology Review: GPT-3, Bloviator: OpenAI’s language generator has no idea what it’s talking about:

...

Let’s—once again—go through a representative sample of Marcus’ concerns about this new GPT version:

GPT-4 also gave correct responses to these prompts (see the responses in this doc).

I recently listened to Gary Marcus speak with Stuart Russell on the Sam Harris podcast (episode 312, “The Trouble With AI,” released on March 7th, 2023). Gary and Stuart seem to believe that current machine learning techniques are insufficient for reaching AGI, and they point to the recent adversarial attacks on KataGo as one example. Given this position, I would like Gary Marcus to come up with a new set of prompts that (a) make GPT-4 look dumb and (b) mostly continue to trip up GPT-5.