Has the Symbol Grounding Problem just gone away?

A few years ago, the symbol grounding problem was widely considered a significant challenge in AI discussions. Unlike those who considered it fundamental, I believed it would likely be solved as a side effect of capability improvements, without requiring specific breakthroughs or attention. However, like many, I didn't anticipate the extent to which GPT-4 would demonstrate this. If I had asserted at the time that such capabilities could be achieved with a text-only training set, it would have sounded like a parody of my own position.
Had you asked me then how a model like GPT-4 would acquire its capabilities, I would have suggested a process more akin to how children learn: starting with a predictive model of physical reality, establishing the concept of an object, learning object permanence, and only then acquiring simple words as labels for previously learned concepts.
Despite its capabilities, GPT-4 still seems to lack robust physical intuition, which shows up in data science tasks and in mathematical reasoning about the 3D world. Will we see a model trained from scratch in the way described above? For instance, the Meta AI model appears to grasp the concept of an object in 2D. Suppose this understanding were fully extended to 3D and the result designated as our foundational model. Could such a model then be trained on language, particularly language that relates to the physical world, and develop physical intuition as good as a human's?

Grounding overhang and interpretability implications:

Would such a model be much better at mathematical and programming tasks for given model resources? Assuming the foundational model is much smaller than GPT-4, it seems reasonable that it could gain similar or greater mathematical and programming skills while remaining smaller, even when trained on enough language to be capable at those tasks.

This could also help with interpretability: the foundational model could not meaningfully be thought to lie or deceive, as it is just modelling objects in 3D. Deception and theory-of-mind abilities could then be observed as they emerged during subsequent language training.