rather than, say, optimizing for CPU cycles or memory consumption
As I already pointed out, we already do. And it turns out that you need to optimize even more for CPU/memory, past the kilobytes of samples which are already flabby and unnecessary from the point of view of KC. And more. And more. Go right past ‘megabyte’ without even stopping: still way too small, way too compute/memory-hungry. And a whole bunch more beyond that. Then you hit the Hutter Prize size, and that’s still too optimized for sample-efficiency, so we need to keep going. Yes, blow through ‘gigabyte’, and then more, more, and some more; eventually, a few orders of magnitude of sample-inefficiency later, you begin to hit projects like GPT-3, which are finally getting somewhere, having traded away enough sample-efficiency (hundreds of gigabytes of data) to bring the compute requirements down into the merely mortal realm.
A dataset like that gives us the entire Universe, ie. Earth and a vast amount of stuff we probably don’t care about.
You can locate the Earth in relatively few bits of information. Off the top of my head: the observable universe is only ~45 billion lightyears in radius, so how many bits could an index into it possibly take? ~36 bits to encode distance from the origin in lightyears out of 45b, and maybe another ~75 bits to encode the two angles finely enough to resolve a lightyear-sized cell at that distance: call it ~110 bits for such a crude encoding, giving an upper bound. You need to locate the Earth in time as well? Another ~33 bits or so to pin down which year out of ~4.5b years. If you can do KC at all, another ~145 bits or so shouldn’t be a big deal...
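For what it’s worth, a quick back-of-the-envelope check of those bit counts (a sketch in Python; the radius, the Earth’s age, and the 1-lightyear / 1-year resolutions are just my illustrative assumptions, not anything more principled):

```python
import math

# Assumed constants and resolutions for this rough upper bound:
R_LY = 45e9          # radius of the observable universe, in lightyears
AGE_YEARS = 4.5e9    # age of the Earth, in years

radial_bits = math.log2(R_LY)            # ~35 bits: distance from origin in 1-ly steps
angular_res = 1 / R_LY                   # radians needed to resolve ~1 ly at that distance
angle_bits = 2 * math.log2(math.pi / angular_res)  # ~74 bits: two angles at that resolution
time_bits = math.log2(AGE_YEARS)         # ~32 bits: which year out of ~4.5b

total = radial_bits + angle_bits + time_bits
print(f"radial: {radial_bits:.0f}, angular: {angle_bits:.0f}, "
      f"time: {time_bits:.0f}, total: {total:.0f} bits")
```

Run as written, that prints a total of roughly 140 bits, which is the sense in which the index is a rounding error next to any serious KC-style program.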