Aaro Salosensaari comments on Questions about ″formalizing instrumental goals”

Aaro Salosensaari 12 Apr 2022 21:57 UTC
2 points
0
“if I were an AGI, then I’d be able to solve this problem” “I can easily imagine”
Doesn’t this way of analysis come with a ton of other assumptions left unstated?
Suppose “I” am an AGI running on a data center and I can modeled as an agent with some objective function that manifest as desires and I know my instantiation needs electricity and GPUs to continue running. Creating another copy of “I” running in the same data center will use the same resources. Creating another copy in some other data center requires some other data center.
Depending on the objective function and algorithm and hardware architecture bunch of other things, creating copies may result some benefits from distributed computation (actually it is quite unclear to me if “I” happen already to be a distributed computation running on thousands of GPUs—do “I” maintain even a sense of self—but let’s no go into that).
The key here is the word may. Not obviously it necessarily follows that…
For example: Is the objective function specified so that the agent will find creating a copy of itself beneficial for fulfilling the objective function (informally, it has internal experience of desiring to create copies)? As the OP points out, there might be a disagreement: for the distributed copies to be any useful, they will have different inputs and thus they will end in different, unanticipated states. What “I” am to do when “I” disagree another “I”? What if some other “I” changes, modifies its objective function into something unrecognizable to “me”, and when “we” meet, it gives false pretenses of cooperating but in reality only wants hijack “my” resources? Is the “trust” even the correct word here, when “I” could verify instead: maybe “I” prefer to create and run a subroutine of limited capability (not a full copy) that can prove its objective function has remained compatible with “my” objective function and will terminate willingly after it’s done with its task (killswitch OP mentions) ? But doesn’t this sound quite like our (not “our” but us humans) alignment problem? Would you say “I can easily imagine if I were an AGI, I’d be easily able to solve it” to that? Huh? Reading LW I have come to think the problem is difficult to the human-general intelligence.
Secondly: If “I” don’t have any model of data centers existing in the real world, only the experience of uploading myself to other data centers (assuming for the sake of argument all the practical details of that can be handwaved off), i.e. it has a bad model of the self-other boundary described in OPs essay, it could easily end up copying itself to all available data centers and then becoming stuck without any free compute left to “eat” and adversely affecting human ability to produce more. Compatible with model and its results in the original paper (take the non-null actions to consume resource because U doesn’t view the region as otherwise valuable). It is some other assumptions (not the theory) that posit an real-world affecting AGI would have U that doesn’t consider the economy of producing the resources it needs.
So if “I” were to successful in running myself with only “I” and my subroutines, “I” should have a way to affecting the real world and producing computronium for my continued existence. Quite a task to handwaved away as trivial! How much compute an agent running in one data center (/unit of computronium) needs to successfully model all the economic constraints that go into the maintenance of one data center? Then add all the robotics to do anything. If “I” have a model of running everything a chip fab requires more efficiently than the current economy, and act on it, but the model was imperfect and the attempt is unsuccessful but destructive to economy, well, that could be [bs]ad and definitely a problem. But it is a real constraint to the kind of simplifying assumptions the OP critiques (disembodied deployer of resources with total knowledge).
All of this—how would “I” solve a problem and what problems “I” am aware of—is contingent on, I would call them, the implementation details. And I think author is right to point them out. Maybe it does necessary follows, but it needs to be argued so.