Look at ImageNet (https://image-net.org/index.php) tags and find the percentage of them that are kitten pictures. The International Data Corporation (IDC) estimates there are around 6.8 zettabytes of storage globally (https://www.idc.com/getdoc.jsp?containerId=prUS46303920). Now we just need the fraction of total storage dedicated to consumer images. Maybe 2%?
I’d guess something like (0.1% kitten pictures) x (2% consumer images) x (6.8 zettabytes) ≈ 136,000 terabytes of kitten images.
Oh no! You missed out on stating this as 136 PETabytes!
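For anyone who wants to check the multiplication, here is a minimal sketch of the top-down estimate. The 6.8 ZB, 2% and 0.1% inputs come from the answer above; using decimal (SI) units, where 1 TB = 10^12 bytes, is my assumption, and `top_down_estimate` is just a hypothetical helper name. Fractions keep the arithmetic exact so the result isn't muddied by float rounding.

```python
from fractions import Fraction

# Decimal (SI) storage units; assumption: the IDC figure uses these.
TB = 10**12
PB = 10**15
ZB = 10**21


def top_down_estimate(global_storage_bytes, consumer_image_share, kitten_share):
    """Fermi estimate: total storage x share that is consumer images x share of those that are kittens."""
    return global_storage_bytes * consumer_image_share * kitten_share


kitten_bytes = top_down_estimate(
    Fraction(68, 10) * ZB,  # IDC: ~6.8 ZB of storage worldwide
    Fraction(2, 100),       # guess: 2% of storage is consumer images
    Fraction(1, 1000),      # guess: 0.1% of those are kittens (from ImageNet tags)
)
print(f"{float(kitten_bytes / TB):,.0f} TB = {float(kitten_bytes / PB):,.0f} PB")
```

With these inputs the product works out to 136,000 TB, i.e. 136 PB.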
I arrived at a similar order of magnitude via a different path. I suspect that almost all cute kitten pictures are stored in just a few (1-5?) locations each. Assuming a power law, maybe 1% of the world population keeps 100-ish kitten pictures (almost all of which are cute by default), and maybe another 5% keep an average of 10-ish pictures. I’d estimate each picture to be on the order of a megabyte.
That yields around 36 petabytes.
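A rough sketch of this bottom-up path, under assumptions the comment doesn't state: a world population of about 8 billion, 3 copies per picture (the middle of the "1-5 locations" guess), and 1 MB = 10^6 bytes. `bottom_up_estimate` is a hypothetical helper; each tier is a (share of population, pictures per person) pair.

```python
from fractions import Fraction

MB = 10**6
PB = 10**15


def bottom_up_estimate(population, tiers, bytes_per_picture, copies):
    """Sum pictures over (population share, pictures per person) tiers, then scale by picture size and replication."""
    pictures = sum(population * share * per_person for share, per_person in tiers)
    return pictures * bytes_per_picture * copies


kitten_bytes = bottom_up_estimate(
    population=8_000_000_000,  # assumption: ~8 billion people
    tiers=[
        (Fraction(1, 100), 100),  # 1% of people keep ~100 kitten pictures
        (Fraction(5, 100), 10),   # another 5% keep ~10
    ],
    bytes_per_picture=MB,         # ~1 MB per picture
    copies=3,                     # assumption: midpoint of "1-5 locations"
)
print(f"{float(kitten_bytes / PB):.0f} PB")
```

That is 12 billion pictures at 1 MB each, times 3 copies, or 36 PB. Dropping the replication factor brings it down to about 12 PB, which is still in the same ballpark as the top-down figure.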
ImageNet was constructed to match the WordNet hierarchy, and is not representative of the distribution of images stored online. I’d guess that cat pics are overrepresented by somewhere between 10x and 10,000x.
I’d also be shocked if consumer images are even 0.1% of all data stored; there’s a huge volume of other heavier datasets out there.
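To see how much these two objections move the top-down number, here is a small sensitivity sketch. Assumptions: the consumer-image share is capped at the 0.1% suggested here instead of 2%, and ImageNet's 0.1% kitten share is divided by the guessed overrepresentation factor. `adjusted_estimate` is a hypothetical helper; everything else reuses the figures quoted in the thread.

```python
from fractions import Fraction

TB = 10**12
ZB = 10**21

GLOBAL_STORAGE = Fraction(68, 10) * ZB      # IDC: ~6.8 ZB
CONSUMER_SHARE_CAP = Fraction(1, 1000)      # reply: consumer images at most ~0.1% of storage
IMAGENET_KITTEN_SHARE = Fraction(1, 1000)   # original guess read off ImageNet tags


def adjusted_estimate(overrepresentation):
    """Top-down estimate after deflating ImageNet's kitten share by an overrepresentation factor."""
    kitten_share = IMAGENET_KITTEN_SHARE / overrepresentation
    return GLOBAL_STORAGE * CONSUMER_SHARE_CAP * kitten_share


for factor in (10, 100, 1_000, 10_000):
    print(f"{factor:>6,}x overrepresented: at most {float(adjusted_estimate(factor) / TB):,.2f} TB")
```

Under both adjustments the upper bound falls from hundreds of petabytes to somewhere between about 680 TB (10x) and 0.68 TB (10,000x), so the two objections together shift the estimate by several orders of magnitude.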