Coding, math, whatever. Can LLMs predict the outcomes of physical experiments?
Suppose I pour 8 oz (226.8 g) of boiling water into a ceramic coffee mug that weighs 1.25 lb (0.57 kg). The ambient air is still and 20 degrees Celsius. The cup starts at room temperature. Give me an equation for the temperature of the water in Celsius over time. The only free variable in the equation should be the number of seconds t since the water was poured. Focus on accuracy during the first 5 minutes.
Does that seem hard? I think it’s hard. The relevant physical phenomena include at least:
Conduction of heat between the water, the mug, the air, and the table.
Conduction of heat inside each of those things.
Convection (fluid movement) inside the water and the air.
Evaporative cooling as water molecules become vapor.
Movement of water vapor in the air.
Radiation. (Like all matter, the mug and water emit temperature-dependent infrared radiation.)
Surface tension, thermal expansion/contraction, re-absorption of air into the water as it cools, probably more.
And many details aren’t specified in the prompt. Is the mug made of porcelain or stoneware? What is the mug’s shape? What is the table made of? How humid is the air? How am I reducing the spatially varying water temperature to a single number?
So this isn’t a problem with a “correct” answer that you can find by thinking. Reality is too complicated. Instead, answering this question requires “taste”: guessing which factors are most important, making assumptions about missing details, etc.
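To make "guessing which factors are most important" concrete, here is a minimal sketch of one possible simplification: a two-node lumped model in which the water and the mug each get a single temperature and exchange heat with each other and with the air. This is not the method used by any of the models or by the original experiment. The masses and temperatures come from the prompt; the ceramic specific heat and all three conductances (`K_WATER_MUG`, `K_WATER_AIR`, `K_MUG_AIR`) are illustrative guesses, and the function names are hypothetical.

```python
# Two-node lumped model: water and mug each have one temperature.
# Values marked "assumed" are illustrative guesses, not measurements.

WATER_MASS_G = 226.8   # from the prompt
MUG_MASS_G = 570.0     # from the prompt
T_AIR_C = 20.0         # from the prompt
C_WATER = 4.186        # J/(g*K), specific heat of liquid water
C_CERAMIC = 0.9        # J/(g*K), assumed; varies by ceramic type
K_WATER_MUG = 8.0      # W/K, assumed water-to-mug conductance
K_WATER_AIR = 0.6      # W/K, assumed; lumps surface evaporation and convection
K_MUG_AIR = 0.5        # W/K, assumed mug-to-air (and table) conductance


def simulate(seconds, dt=0.1):
    """Return the water temperature (deg C) at each whole second, via forward Euler."""
    heat_cap_water = WATER_MASS_G * C_WATER  # J/K
    heat_cap_mug = MUG_MASS_G * C_CERAMIC    # J/K
    t_water, t_mug = 100.0, T_AIR_C
    temps = [t_water]
    steps_per_second = round(1 / dt)
    for _ in range(seconds):
        for _ in range(steps_per_second):
            q_wm = K_WATER_MUG * (t_water - t_mug)    # W, water -> mug
            q_wa = K_WATER_AIR * (t_water - T_AIR_C)  # W, water -> air
            q_ma = K_MUG_AIR * (t_mug - T_AIR_C)      # W, mug -> air
            t_water += dt * -(q_wm + q_wa) / heat_cap_water
            t_mug += dt * (q_wm - q_ma) / heat_cap_mug
        temps.append(t_water)
    return temps


def mixed_temperature():
    """Temperature if the water and the whole mug equilibrate with no loss to the air."""
    cw = WATER_MASS_G * C_WATER
    cm = MUG_MASS_G * C_CERAMIC
    return (cw * 100.0 + cm * T_AIR_C) / (cw + cm)


if __name__ == "__main__":
    temps = simulate(300)
    for second in (0, 30, 60, 120, 300):
        print(second, round(temps[second], 1))
    print("no-loss mixing limit:", round(mixed_temperature(), 1))
```

Even this toy version shows why the problem rewards judgment: under these assumptions the mug's heat capacity is more than half the water's, so how quickly and how completely the mug warms up dominates the first few minutes, and every one of the guessed constants shifts the curve.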
I liked dynomight’s “temperature over time of boiling water poured into a ceramic coffee mug” as a low-budget DIY test of research taste, so it goes into the list above. Opus 4.6 did best and cost $0.61:
More detail: