Why are you so sure DALL-E knows what an image is or what it is doing? Why do you think it knows there is a person looking at its output rather than an electric field pulsing in a non-random way?
I don’t think it does (and didn’t say this!)
My question is how it manages to produce, almost all the time, such convincing 3D images without knowing about the 3D world and everything else normally required to create a realistic 3D image. You can't do it simply by stitching together loosely suitable photographs from its training data and tweaking the results.
I don't deny that you can catch it out by asking for weird things very different from its training data (though it often makes a good attempt). However, that doesn't explain how it does so well at creating images that are broadly within the range of its training data, yet different enough from the photographs it has seen that even a skilled human with Photoshop couldn't produce them by editing those photographs.
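For what it's worth, here's a minimal sketch of diffusion-style sampling (the family of methods DALL-E 2 is reported to use), just to make concrete why the output isn't a collage of stored photos: the image is synthesized by repeatedly denoising pure random noise with a learned noise predictor. `eps_model` here is a hypothetical pretrained network, and the real system's noise schedule, text conditioning, and architecture are far more elaborate.

```python
import torch

def sample(eps_model, shape=(1, 3, 64, 64), timesteps=1000, device="cpu"):
    """Toy DDPM-style ancestral sampler (a sketch, not DALL-E's actual code)."""
    betas = torch.linspace(1e-4, 0.02, timesteps, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape, device=device)  # start from pure noise, not a stored photo
    for t in reversed(range(timesteps)):
        # eps_model is assumed to predict the noise present in x at step t
        eps = eps_model(x, torch.tensor([t], device=device))
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise  # one small denoising step
    return x  # the image emerges from iterative refinement of noise
```

Nothing in that loop looks up or pastes training photographs; whatever regularities the network has absorbed about lighting, perspective, and so on are baked into its weights, which is exactly what makes the question of what it "knows" interesting.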