[Question] How does the current AI paradigm give rise to the “superagency” that IABIED is concerned with?

Modern AI works by throwing lots of computing power at lots of data. An LLM gets good at generating text by ingesting an enormous corpus of human-written text. A chess AI doesn’t have as big a corpus to work with, but it can generate simulated data through self-play, which works because the criterion for success (“Did we achieve checkmate?”) is easy to evaluate without any deep preexisting understanding. But the same is not true if we’re trying to build an AI with generalized agency, i.e. something that outputs strategies for achieving some real-world goal, strategies that are actually effective when carried out. There is no massive corpus of such strategies that can be used as training data, nor is it possible to simulate one, since that would require either (a) doing real-world experiments (in which case generating sufficient data would be far too slow and costly, or simply impossible) or (b) a comprehensive world-model capable of predicting the results of proposed actions (which presupposes the very thing whose feasibility is at issue). Therefore it seems unlikely that AIs built under the current paradigm (deep neural networks + big data + gradient descent) will ever achieve the kind of “superintelligent agency” depicted in the latter half of IABIED, the kind that can devise effective strategies for wiping out humanity (or whatever).
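To make the asymmetry concrete, here is a minimal, purely illustrative sketch (my own toy example, not from the book, and not self-play proper since there’s only one player) of the structural ingredient that makes self-play-style data generation work: the success criterion is a cheap, exact predicate, so you can manufacture as many labeled episodes as you like entirely in simulation. All names and the toy game below are hypothetical.

```python
import random

# Toy "game": starting from a random integer, reach 0 by adding -1, 0, or +1
# each turn. The point is that success is a one-line, exact check -- analogous
# to "is this position checkmate?" -- so unlimited labeled training episodes
# can be generated in simulation, with no preexisting world-model.

ACTIONS = [-1, 0, +1]

def rollout(policy, start, max_steps=20):
    """Play one episode; return (trajectory, success)."""
    state, trajectory = start, []
    for _ in range(max_steps):
        action = policy(state)
        trajectory.append((state, action))
        state += action
        if state == 0:          # the cheap, exact success predicate
            return trajectory, True
    return trajectory, False

def random_policy(state):
    return random.choice(ACTIONS)

# Generate labeled data at will, purely in simulation.
dataset = [rollout(random_policy, random.randint(-10, 10)) for _ in range(1000)]
print(sum(success for _, success in dataset), "successful episodes out of 1000")

# For an open-ended real-world goal there is no analogous cheap predicate:
# labeling a proposed plan as "effective" would require either actually running
# it in the world, or already having the comprehensive world-model at issue.
```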

By “real-world goal” I mean a goal whose search space is not restricted to a well-defined, legible domain, but ranges over all possible actions, events, and counter-actions. Plans for achieving such goals are not amenable to simulation, because you can’t easily predict or evaluate the outcome of any proposed action. All of the extinction scenarios posited in IABIED are “games” of this kind. By contrast, a chess AI will never conceive of strategies like “Hire a TaskRabbit to surreptitiously drug your opponent so that they can’t think straight during the game,” and not for lack of intelligence, but because such strategies simply don’t exist in the AI’s training domain.

This was the main lingering question I had after reading IABIED.
