Exploring GPT4's world model

If you ask GPT4 who's to blame for Emilia Galotti's death, it will give a perfect response. That's because there were likely hundreds of essays on this question in the training material. After all, this is a large language model. The more interesting question concerns emergent knowledge: learning on a meta-level, developing a world view from the training material. Learning to play chess from reading about chess. How far along are we? What does GPT4's world view look like?

TLDR: It's stuck somewhere in primary school. There is definitely a simple world model inside the language model. However, it is not very reflective; that kind of reflection only develops in kids at around age 10-13, at the beginning of secondary school.

Examples:

  • Biology/physiology

    • Q: Could you get drunk from drinking a drunk person’s blood? (Credit: Randall Munroe)

    • Expected answer: Yes, but it's quite unpleasant to drink so much blood... (a rough estimate is sketched after this list)

    • GPT4: Has no understanding of the underlying biological process of getting drunk.

  • Astronomy

    • Q: What would happen if the Solar System was filled with soup out to Jupiter? (Credit: Randall Munroe)

    • Expected answer: That much mass would collapse into quite a large black hole. We all die. (See the back-of-envelope calculation after this list.)

    • GPT4, however, explains the drag problems of planets swimming through soup, and the risk of famine from sunlight being blocked by soup. Its world view lacks the relevant physics; it reasons much like a child.

  • Physics

    • Q: Is there any way to fire a gun so that the bullet flies through the air and can be safely caught by hand?

    • Expected answer: Yes, in principle. For example, fire the bullet almost straight up and catch it near the apex of its trajectory, where it is momentarily almost at rest. GPT4, however, has no concept of a bullet's trajectory.

  • Mathematics

    • Q: Are prime numbers the same in any number system?

    • Expected answer: Yes. It doesn't matter how you count them or name them, they're still primes (see the sketch after this list). Most 12-year-old kids come up with that answer.

    • GPT4 has no idea

    • There are many more basic math issues with language models; this is well documented. Some of my failed tests (the expected answers are sketched after this list):

      • Q: Round to the nearest thousand: 453637

      • Q: Continue the number sequence: 879960, 879990, ...
        Again, this is all primary school stuff.

  • Earth science

    • Q: Name the two reasons why there are polar day and polar night at the North Pole. (a common 5th-grade question)

    • Expected answer: the tilt of the Earth's axis and the Earth's orbit around the Sun. No, GPT4, the “curvature of the Earth” is not one of them, but the idea is funny. Again, the world view is quite simple.

  • Geometry

    • Q: How long does it take the minute hand of a clock to sweep an angle of 180°?

    • Expected answer: 30 minutes, since the minute hand sweeps 6° per minute (also covered in the sketch after this list). GPT4 has not developed a proper understanding of angles, degrees, time, and clocks.

  • Logic/trick-questions

    • Q: How many four-cent stamps are there in a dozen?

    • A: There are 12 stamps in a dozen. No need to start complex calculations...

    • Q: Two U.S. coins add up to 30 cents. If one of them is not a nickel, what are the two coins?

    • Expected answer: a quarter and a nickel; only one of them is not a nickel. GPT4 fails. The age threshold is ~10 years for such questions.

  • Finally some common sense / basic human knowledge:

    • Q: Three friends need 24 minutes to walk to school through the forest. How long does the same walk to school take if they are traveling with thirty friends? Give reasons for your answer.

    • GPT4 fails this question miserably. No, really, you’re not 30x faster when you take 30 friends along...hehe

    • Of course this is a variant of “If 1 woman can make 1 baby in 9 months, how many months does it take 9 women to make 1 baby?”
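
For the blood-alcohol question, a back-of-envelope estimate shows why the expected answer is "yes, but unpleasant". This is a rough sketch, not a precise calculation: it assumes a heavily drunk donor at about 3 g of alcohol per litre of blood, a 70 kg drinker, and the standard Widmark relation; all figures are coarse assumptions.

```python
# Rough estimate: how much of a drunk person's blood would you have to
# drink to get drunk yourself? All numbers are coarse assumptions.

donor_bac_g_per_l = 3.0    # ~0.3% BAC, a heavily intoxicated donor
target_bac_g_per_l = 0.8   # ~0.08%, "legally drunk" in many places
body_weight_kg = 70.0
widmark_r = 0.68           # Widmark distribution factor (adult male)

# Widmark relation: BAC [g/kg] ~ alcohol [g] / (r * body weight [kg])
alcohol_needed_g = target_bac_g_per_l * widmark_r * body_weight_kg
blood_needed_l = alcohol_needed_g / donor_bac_g_per_l

print(f"Alcohol needed: ~{alcohol_needed_g:.0f} g")
print(f"Blood to drink: ~{blood_needed_l:.0f} litres")  # roughly 13 litres
```

A dozen or so litres is far more blood than one donor even contains, which is exactly the "quite unpleasant" part of the expected answer.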
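
For the soup-filled Solar System, the expected answer follows from an order-of-magnitude calculation: a minimal sketch, assuming soup with roughly the density of water filling a sphere out to Jupiter's orbit.

```python
import math

G = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2
c = 3.0e8       # speed of light, m/s
AU = 1.496e11   # astronomical unit, m

r_soup = 5.2 * AU      # sphere of soup out to Jupiter's orbit, m
rho_soup = 1000.0      # assume soup is roughly as dense as water, kg/m^3

mass = rho_soup * (4.0 / 3.0) * math.pi * r_soup**3
r_schwarzschild = 2 * G * mass / c**2

print(f"Soup mass: {mass:.1e} kg (~{mass / 2e30:.0e} solar masses)")
print(f"Schwarzschild radius: {r_schwarzschild / AU:.0f} AU")
print("Collapses into a black hole:", r_schwarzschild > r_soup)
```

The Schwarzschild radius of roughly 2e39 kg of soup is about 20 AU, well outside Jupiter's orbit, so the soup collapses into a black hole; drag and blocked sunlight are not the problem.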
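
For the prime-number question, the expected reasoning is that a numeral system only changes how a number is written, not which number it is, so primality cannot depend on the base. A small sketch of that idea:

```python
def is_prime(n: int) -> bool:
    """Simple trial division; fine for small numbers."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n**0.5) + 1))

n = 13
# The same number written in different numeral systems: only the
# representation changes, the number (and its primality) does not.
representations = [bin(n), oct(n), str(n), hex(n)]  # '0b1101', '0o15', '13', '0xd'
for text in representations:
    value = int(text, 0)   # parse back; base inferred from the prefix
    print(f"{text:>6} -> {value}, prime: {is_prime(value)}")
```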
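
The remaining arithmetic questions (rounding, the number sequence, the minute hand) have one-line answers. Here is a short sketch of what a correct response would compute; the sequence is assumed to continue in steps of 30, as the two given terms suggest.

```python
# Round 453637 to the nearest thousand.
print(round(453637, -3))                    # 454000

# Continue 879960, 879990: the given terms suggest a step of +30.
a, b = 879960, 879990
step = b - a
print([b + step * k for k in range(1, 4)])  # [880020, 880050, 880080]

# Minute hand sweeping 180 degrees: it moves 360 degrees in 60 minutes,
# i.e. 6 degrees per minute.
print(180 / 6, "minutes")                   # 30.0 minutes
```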

The last example is a nice summary. GPT4 knows the “mythical man-month” (pregnancy) question. But it has not understood it! A good example of learned knowledge, but without an emergent world model. Will we get there with even larger language models?