I work on the same floor as someone working on Google’s AI camera features. They said that the raw data they work with every time you press the shutter is a series of absolutely horrible-looking photos that nobody would ever want to look at. They then combine those into a single good-looking shot using a variety of classical and ML techniques.
My guess is this is true for most cameras and camera apps—they always destroy some information in order to give you a picture you like at first sight.
I think the objection is mostly around predictability? Combining several images into one more accurate image isn’t the issue; it’s logic that does different things based on a more detailed model of the world. Things like recognizing common objects (the moon, etc.) and using what you know they should look like as a prior when interpreting fuzzy images, picking which one of several sequential images has the best smile from each person in a group, or the automatic facial unstretching here.
Yeah, that. Simple stacking is one thing; it “concentrates” real information, so that the final image contains more “reality per bit”. It’s not worse than image compression, or choosing a focal point or aperture or exposure. Making things up is different, and making things up in a way that destroys real information is different yet again.
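To make the “concentrates real information” point concrete, here’s a toy sketch of simple frame stacking: averaging N noisy exposures of the same scene cancels out zero-mean sensor noise, improving SNR by roughly sqrt(N). The pixel values and noise model here are purely illustrative.

```python
import random

def stack(frames):
    """Average a list of equal-length pixel lists (noisy exposures)."""
    n = len(frames)
    return [sum(px) / n for px in zip(*frames)]

random.seed(0)
true_pixels = [0.5] * 1000                      # the "real" scene
# Simulate 16 exposures with zero-mean Gaussian sensor noise.
frames = [[p + random.gauss(0, 0.2) for p in true_pixels] for _ in range(16)]

def mean_abs_err(pixels):
    return sum(abs(a - b) for a, b in zip(pixels, true_pixels)) / len(true_pixels)

# The stacked frame is much closer to the true scene than any single frame:
# with 16 frames, the error shrinks by roughly a factor of sqrt(16) = 4.
err_single = mean_abs_err(frames[0])
err_stacked = mean_abs_err(stack(frames))
```

No pixel was invented here—every output value is a combination of measured values, which is exactly why stacking is unobjectionable in a way that pasting in a reference moon is not.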
… and frankly picking the best smiles individually from a stack, or substituting your moon for mine, rises to the point of trying to rewrite my memories, and doing it without a specific user request is NOT OK (TM).
It’s also worth mentioning that the reason they do a lot of that stuff is that the camera hardware is fundamentally very bad, with really noisy sensors and distorted optics. I know you can’t really do better in a tiny camera, but there’s an almost fraudulent attempt to hide the hardware limitations from the consumer.