To me the time/cost issue with the simboxes you proposed is in the data you need to train the AGIs from within the sim to prevent information leakage. Unlike with current training, we can’t just give it the whole internet, as that contains loads of information about humans, how ML works, the fact that it is in a sim, etc., which would be very dangerous. Instead, we would need to recapitulate the entire data-generating process within the sim, which is what would be expensive.
I’m not quite sure what you mean by data-generating process, but the training cost is no different for a tightly constrained run vs an unconstrained run. An unconstrained run would involve something like the current human developmental process, where after, say, 5 years of basic sensory/motor grounding experience the agent learns language and then trains on the internet. A constrained run is exactly the same but set in a much earlier historical time, long before the internet. The construction of a sim world recreating the historical era is low cost compared to the AGI training costs.
Naively, the only way to do this would be to actually simulate a bunch of agents interacting with the sim world for a long time, which would take at minimum years of simulated experience for human-level data efficiency, and much, much longer for current DL.
I’m expecting AGI will require the equivalent of, say, 20 years of experience, which we can compress about 100x through parallelization rather than serial speedup, basically just like in current DL systems. Consider VPT, for example, which reaches expert human level in Minecraft after training on the equivalent of 10 years of human Minecraft experience.
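For a rough sense of scale, here is that arithmetic as a back-of-envelope sketch (the figures are just the ones above, not a real training estimate):

```python
# Back-of-envelope: 20 subjective years of experience, compressed ~100x
# by parallelizing over many agents sharing one model (no serial speedup).
YEARS_OF_EXPERIENCE = 20
PARALLEL_COMPRESSION = 100

wall_clock_years = YEARS_OF_EXPERIENCE / PARALLEL_COMPRESSION
print(f"wall-clock training time: ~{wall_clock_years * 365:.0f} days")  # ~73 days
```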
It is possible, I guess, to amortise this work and create one ‘master-sim’ so that we can try various AGI designs which all share the same dataset, and this would be good experimentally to isolate the impact of architecture/objective vs dataset. But under the reward-proxy learning approach, a large factor in the success of alignment depends on the dataset, which would be very expensive to recreate in sim without information transfer from our reality (well-constructed open-world RPGs already largely do this modulo obvious easter eggs, and they aren’t even trying very hard).
I’m not really sure what you mean by ‘dataset’ here, as there isn’t really a dataset other than the agent’s lifetime experiences in the world, procedurally generated by the sim. Like I said in the article, the simplest early simboxes don’t need to be much more complex than Minecraft, but obviously it gets more interesting when you have a richer, more detailed fantasy world with its own history and books, magic system, etc. None of this is difficult to create now, and it is only getting easier and cheaper. The safety constraint is not zero information transfer at all, as that wouldn’t even permit a sim; the constraint is to filter out modern knowledge or anything that is out of character for the sim world’s coherency.
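To make that constraint concrete, here is a hypothetical sketch of the kind of coherence filter this implies; the topic list, names, and simple substring matching are purely illustrative assumptions, not an actual implementation:

```python
# Hypothetical coherence filter for content entering the sim world.
# Everything here (topics, names, matching logic) is illustrative only.
ANACHRONISTIC_TOPICS = {
    "machine learning", "computer", "internet",
    "simulation", "neural network",
}

def is_in_character(text: str) -> bool:
    """Reject content that references modern knowledge or otherwise
    breaks the sim world's internal coherency."""
    lowered = text.lower()
    return not any(topic in lowered for topic in ANACHRONISTIC_TOPICS)

assert is_in_character("A treatise on the kingdom's magic system")
assert not is_in_character("An essay on how the internet works")
```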
We want to use multiple worlds and scenarios to gain diversity and robustness, but again that isn’t so difficult or costly.
The ideal baseline cost of simboxing is only O(N+1) vs O(N) without it
I don’t understand this. Presumably we will want to run a lot of training runs in the sim since we will probably need to iterate a considerable number of times to actually succeed in training a safe AGI.
Completely forget LLMs; just temporarily erase them from your mind for a moment. There is an obvious path to AGI (DeepMind’s path) which consists of reverse engineering the brain, testing new architectures in ever more complex sim environments: starting with Atari, now moving on to Minecraft, recapitulating video games’ march of Moore’s law progress. This path is already naturally using simboxes and is thus safe. So in this framework, let’s say it requires N training experiments to nail AGI (where each experiment trains a single shared model for around human-level age, but parallelizing over a hundred to a thousand agents, as is done today). Then using simboxes is just a matter of never training the AGI in an unsafe world until the final training run, once the design is perfected. The cost is then ideally just one additional training run.
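A toy illustration of the resulting O(N+1) vs O(N) overhead (the values of N are just assumptions for the sake of the example):

```python
# If reaching a working AGI design takes N training experiments anyway,
# running all of them in safe sim worlds and adding one final
# unconstrained run costs (N+1)/N of the unsafe baseline.
for n in (10, 100, 1000):  # assumed values of N
    overhead = (n + 1) / n - 1
    print(f"N={n}: extra cost = {overhead:.1%}")
# N=10: 10.0%   N=100: 1.0%   N=1000: 0.1%
```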
So the only way that the additional cost of safe simboxing could be worse/larger than one additional training run is if there is some significant disadvantage to training in purely historical/fantasy sim worlds vs sci-fi/modern sim worlds.
But we have good reasons to believe there shouldn’t be any such disadvantage: the architecture of the human brain certainly hasn’t changed much in the last few thousand years, intelligence is very general, etc.
Having AGI agents doing ML research in a reality which is close enough to our own that its insights transfer to our reality defeats the whole point of having a sim, which is preventing information leakage about our reality!
No, agents are not doing ML research in the simboxes; I said agents (or rather architectures) that are determined to be reasonably safe/altruistic can ‘graduate’ to reality and help iterate.
There is some evidence that ML is in the same power-law scaling regime as biological brains, as well as a lot of algorithms from statistics, but I don’t think the evidence is conclusively against the possibility of a radically better paradigm which perhaps both we and evolution haven’t found.
I mostly agree with you about the foom and scaling regimes. However, I do believe there is various work in learning theory suggesting bounds on scaling laws (I just haven’t read that literature recently). For example, there are some scenarios (depending on the statistical assumptions you place on efficient circuit/data distributions) where standard linear SGD (which implicitly assumes normal distributions) is asymptotically suboptimal compared to alternatives like exponentiated/multiplicative GD, if the normal assumption is wrong and the circuit distribution is actually log-normal. There was also a nice paper recently which categorized the taxonomy and hierarchy of all known learning algorithms that approximate ideal Bayesian learning (I need to re-find it).
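To make the SGD vs exponentiated-GD contrast concrete, here is a minimal sketch of the two update rules (my toy setup, not from any specific paper; the learning rate and values are arbitrary):

```python
import numpy as np

def sgd_step(w, g, lr=0.01):
    # Additive update: the natural rule if weights are normally distributed.
    return w - lr * g

def exponentiated_gd_step(w, g, lr=0.01):
    # Multiplicative update: each weight scales by exp(-lr * g), the
    # natural rule if weight magnitudes are log-normally distributed.
    return w * np.exp(-lr * g)

w = np.array([1.0, 0.5, 2.0])    # toy weights
g = np.array([0.2, -0.1, 0.05])  # toy gradient
print(sgd_step(w, g))            # [0.998  0.501  1.9995]
print(exponentiated_gd_step(w, g))
```

Which update rule wins asymptotically depends on which prior actually matches the target circuit distribution, which is the crux of the suboptimality claim above.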