You mean something like using libraries of 3D models, maybe some kind of generative grammar of how to arrange them, and renderer feedback for what’s actually visible to produce (geometry description, realistic rendering) pairs to train visuospatial understanding on?
You mean something like using libraries of 3D models, maybe some kind of generative grammar of how to arrange them, and renderer feedback for what’s actually visible to produce (geometry description, realistic rendering) pairs to train visuospatial understanding on?