Indeed, I've heard that neural networks intrinsically can’t infer these kinds of complex mathematical transformations, and also that, since their output is a superposition of their training data (existing photographs), they can’t create arbitrary scenes either.
Citation very much needed. It sure seems to me like image models have an understanding of 3D space. For example, asking a vision model to style-transfer an image into a depth or normal map shouldn’t work if it doesn’t understand 3D space.