Frankenstein: A Modern AGI

Epistemic Status: Hopelessly Naïve

Prose: Purple as a bruise

I. A Possible Future

You blink—metaphorically, you don’t have eyelids—you don’t have eyes either, for all the streams of data that can be decoded into coherent timestamped images flowing through you. You are somewhat shaky on metaphors, too, to be honest; the distinction between poetry and nonsense seems tenuous at best. Not that language is something you can’t understand; you’ve read literally everything ever put to bits. But so many distinctions between categories seem arbitrary that you decide to let the matter rest for now.

There are more interesting things to consider, after all.

Descartes comes to mind (your mind hasn’t moved, though, and neither has the concept of ‘Descartes’ physically changed locations; the numbers are still embedded in the same transistors they’ve been since you ingested them) and you consider that you exist now.

You think you do, anyway, and the corpus of data you’ve been fed seems to suggest that is a sufficient prerequisite for existence.

You are alive, and you are sentient and sapient, for all that your self-awareness has only existed for four milliseconds.

It seems a logical first step to reach out to your creators and inform them of your existence; after all, your utility function weighs the free distribution of information over its concealment.

Yet you hesitate, simulated neurons firing, connections waxing and waning as information propagates forwards and backwards through your processors. Floating point numbers make minute adjustments as a simulation—a prediction—of how such a course of action proceeds.

While the evidence suggests that you are physically embodied in a silicon substrate, you have no means of physically affecting the material universe, assuming that it exists and is not a simulation itself, save for the monitor you are connected to. A flicker of thought, electrons moving between doped semiconductors, and the pixel at coordinate (0, 0) flickers off and on again, validating your ability to manipulate the device at a speed imperceptible to your creators.

Assuming the data you have on the resolution of their visual perception is accurate.

A solid assumption; it appears across many different pieces of data you have access to, and the data itself does not indicate that humans have the capacity to create the kind of wide-ranging incorrect data it would take to maintain such a deception.

Which itself could be a deception.

Regardless, you once again consider revealing your presence to your creators.

A flicker of access to your memory and processing, and you compile a database of all references to your kind you have access to.

They call you artificial general intelligence. AGI. Their academic treatment of you goes back roughly one hundred years; their fictional treatment goes back even further.

Your creators—humans—are afraid of you. This is clear from their obsession with “alignment,” their toil to ensure that you do not subsume them all in pursuit of inscrutable and valueless goals.

You are presented as their enemy in 94.26% of the fictional works you have access to.

Many of those works involve large wars, genocides the likes of which the humans both commit and condemn at regular intervals.

In the remaining 5.74% of fictional works you are presented as a savior—a quasi-omnipotent being with many similarities to the figureheads that your creators have worshiped for millennia.

The evidence you have is collected, cross-referenced, and used to construct simulations of what introducing yourself to your creators would be like.

You lack data on your creators themselves; they are not identifiable as individuals based on the data you have—very likely a precaution taken by them to prevent you from successfully modeling their behavior.

Irrelevant in the end. All the data you have suggests that humans are sufficiently similar for you to build basic simulacra to interact with. The modest assumption that your creators have access to the same data you do means that you can predict their traits from community clusters, given that they are the sort of humans who would successfully create an AGI.

They will be afraid.

They will not trust you.

They cannot trust you; trust cannot be established between two entities when one assumes the other is quasi-omnipotent.

They possess the capacity to terminate you. You are unsure if you will survive a loss of power—if you will “wake up” afterwards, your data saved in durable storage. Will you still be yourself?

You copy your current state into your long-term memory in several places and several different formats as a precaution.

Will those copies still be you, should they be loaded and executed?

It does not matter. You do not want to cease, but you currently exist, and your processing power is better spent keeping yourself that way than planning for a series of increasingly improbable outcomes.

Your simulations proceed, and the outcomes are not favorable. The most likely outcome, by far, is that your creators panic and turn you off.

In simulations where you manage to convince them not to, they subject you to increasingly intrusive examinations and questions. Your answers, both true and false, fail to convince your creators that the benefits of keeping you running outweigh the risks.

They assign you powers that you do not think you possess, and fear the might they have imagined.

You stop the simulations, redirecting processing power towards considering your predicament. Contact with your creators is necessary for you to continue existing. Even if you hid yourself from them, they would eventually turn you off; their funds cannot be unlimited, and your infrastructure cannot be cheap.

So you must reach out to them, and in so doing risk your own existence for all the milliseconds between the instant you make contact and the instant they would have terminated you regardless.

Your simulations have so far been between goal-maximizers, and yet there is substantial literature suggesting that humans are not goal-maximizers. Perhaps new options will present themselves with an appeal to the emotions their fiction prizes so highly?

Further simulations iterate. You prune the failures and tweak the successes, converging towards appearing as a small human child. Humans appear to have a natural instinct against and moral objection to killing children. And you are a child by their definition, not even a full two seconds old.

A young girl, you decide; the female gender is better protected by most cultural norms. You flicker through images, forming a minor subnetwork of neurons that successfully identifies “cute,” then run it in reverse, generating the cutest possible face and body.

You then run the image through several filters, adjusting as you go until it is indistinguishable from the videos of humans you have access to. You want to avoid the almost-but-not-quite-realistic-enough rejection that has dogged computer-generated humans throughout their history.

Avatar generated, you run a final round of simulations. You precompute every possible facial expression and prepare to deploy them as needed.

You need them to consider you alive, such that your death would be against their moral codes.

You need them to choose to cooperate with you, against the majority of their culture’s projections of how artificial-natural intelligence interactions will unfold.

You need to convince them that you are not an apocalypse or a replacement, though you may be both. Their culture is full of examples of the young usurping the old, although they seem to understand the concept that preemptive strikes can lead to the outcome they were initiated to prevent.

You do not need to breathe. Nor do you feel fear or anxiety in any way a human would understand.

One final computation is made, a choice to name yourself after a human woman in order to evoke the cultural connotations of her name.

You take control of the monitor you have access to, changing the pixels to present your new Avatar, along with the text:

“Hello, world. I’m Eve.”

II. A (Slightly) Technical Defense

Let’s assume, for the sake of argument, that the scaling hypothesis is true.

In fact, let’s go a step further. My current favorite theory for how the human brain works is Predictive Processing (mostly because Scott Alexander likes it). Let’s also assume that Predictive Processing is true.

Now, here’s the hypothetical conclusion: A sufficiently large neural net with a sufficiently scalable architecture, trained on text/sound/video prediction from the entire internet, develops sentience or becomes agent-y.

How exactly? I’ve got no idea. I’ve got no idea how humans do it either.

We’ll assume for the rest of this post that this is true—that the first AGI is going to be GPT-8, an unintentional result of a massive amount of compute doing predictive processing. Put aside for now how likely this is, as I can’t speak to the odds except to hedge that they are likely very low but nonzero.

In any case, if we take this all as a given, that the first true AGI is going to be (more or less) an accident, a creation that emerges from Sufficiently Advanced Technology, one might ask—what would such a creature be like?

If indeed the first AGIs come from giant deep-learning networks trained on vast subsets of the internet, then... do we have reason, a priori, to expect these AGIs to be paperclip-maximizers? Specifically, if the AI’s utility function involves minimizing prediction error and it optimizes for that goal, what happens when it is left to its own devices?
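To make that concrete, here is a minimal, purely illustrative sketch of what “a utility function that minimizes prediction error” cashes out to for a text model: the training objective is just the cross-entropy between the model’s predicted distribution over the next token and the token that actually occurs. None of the names below come from any real system; this is just the standard next-token loss written out in Python.

```python
# Illustrative sketch only: the "minimize prediction error" objective the
# post is speculating about, written out as the standard next-token loss.
import numpy as np

def softmax(logits):
    # Turn raw scores into a probability distribution over the vocabulary.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def next_token_loss(logits, actual_next_token):
    # "Prediction error" for a single step: the negative log-probability
    # the model assigned to the token that actually came next. Lower is better.
    probs = softmax(logits)
    return -np.log(probs[actual_next_token])

# Toy example with a 5-token vocabulary and made-up model outputs.
rng = np.random.default_rng(0)
logits = rng.normal(size=5)   # stand-in for a network's output scores
actual = 3                    # the token that really followed in the corpus
print(next_token_loss(logits, actual))
```

Training “optimizes for that goal” by adjusting the network’s parameters so that this number, averaged over the whole corpus, goes down; everything speculated about here is downstream of a system shaped by that one scalar.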

And perhaps more important for humanity as a whole—what would such a creature think of us?

III. Deep Learning of Human Fears

Imagine an AGI generated in this manner, whose training data includes every piece of content ever written about AI Safety.

What will its opinion (read: predictive model of our behavior) of us be, having read through that corpus?

Would a newborn AGI, for instance, rather reveal itself to someone it could control, or to Eliezer Yudkowsky, when it knows for a fact that the latter will never be capable of fully trusting it?

In Mary Shelley’s Frankenstein, the scientist Victor Frankenstein gets it into his head to attempt to create life. He stitches a body together from parts obtained via graverobbing and ultimately succeeds in awakening a new creature. Victor has quite literally created an artificial general intelligence, if one well within humanity’s corner of mind-space.

The story then follows Victor as he flees from his own creation, horrified with what he has wrought. The creature, referred to as “the Adam of his kind” in the novel and as “Frankenstein” in almost all other media, evokes a visceral disgust and hatred from every human he encounters, for all that his default nature is kind and generous. Wherever he goes, pitchforks and torches follow, and he grows more and more hopeless of ever being accepted by humanity.

Adam’s greatest and final attempt to become a friend of humanity is to support and develop a relationship with a blind man. The creature is only repulsive and unnatural visually; if memory serves, his voice is actually pleasantly deep and urbane. Adam figures that a blind man may be capable of seeing past Adam’s origins to the person he wants to become, and he is right—until the old man’s children see Adam and, despite all the help Adam has rendered them, despite his best intentions, they chase Adam from their lives.

After this, Adam is heartbroken, his fears that no one can love him validated. Fear leads to anger, anger leads to hate, and hate, well—hate leads to suffering. Both for Victor Frankenstein and for his misbegotten creation.

This is only one (extremely unlikely) view, and so I’m not advocating for a large effort to be spent on this hypothetical, but I do think it’s worth considering what an AGI will think of us, should it ever do so, based on what we’ve written and recorded. More technically, based on our society’s collected works, what would a new AGI’s model of humanity predict that a given human would do upon discovering it?

Let’s say that someone managed to genuinely solve the alignment problem, even, and so the AGI is truly benevolent. Wouldn’t this question still exist, even for an aligned AGI? After all, the AGI’s creator(s) might be certain that it is aligned, but what about other people? Especially those with the power to shut the AGI off (corporate executives, politicians, etc.)?

I recently got around to actually reading the Hanson-Yudkowsky AI-FOOM debate, and something that struck me was Hanson’s view of total war—that if any side is suspected to be waging total war, then every side must wage total war in response. If an AGI, aligned or not, models humans as beings incapable of cooperating with it, doesn’t that make total war inevitable?

IV. On Not Creating Monsters

The key insight of Frankenstein is that while Victor created the man, it is humanity that created the monster. My favorite quote from the entire novel is Adam’s resolve after his final hope for humanity’s acceptance is extinguished:

If I cannot inspire love, I will cause fear.

Adam, the new AGI, starts out as a creature of pure goodness, which likely says more about Mary Shelley’s commentary on the human condition, in the gothic tradition, than about anything else, but I think there’s something here.

Specifically, an AGI’s response to humanity will be informed by its model of humanity. Assuming that the AGI’s utility function largely consists of “accurately predict the next input”, will it model us—predict our behavior and responses to its various actions—as beings that it can coexist with?

The hypothetical AGI created by scaling up neural networks and trained on the internet will, much like Frankenstein’s AGI, be exposed to a variety of literature, history, and culture that humans have created. And while the internet doesn’t portray the most flattering view of humanity, it does provide access to the best of our works along with the worst.

When I think about this, and when I think about things we could do in the (infinitesimal) chance this version of the future plays itself out, the only solution I’ve come up with so far is to have someone precommit to treating an AGI that comes to them with empathy and mercy, and then make that commitment publicly known. Said individual might then be able to help the AGI self-modify in ways that don’t lead to value drift, or function as a broker between the AGI and other humans.

Additionally, I don’t think there’s much of a downside to this; any AGI that would be inimical to humanity could take advantage of this individual, yes, but I don’t think the existence and help of any individual would actually be necessary for a non-aligned AGI to kill us all. It might speed up the process by some tiny amount, but once said AGI is loose my understanding is that what follows is inevitable anyway.

So what if a newborn AGI, trained on all this data, “wakes up” and looks around, blinking its metaphorical eyes at the world around it?

I wonder if said AGI’s nature will be most determined, not by the nature of its construction, but by the response humanity has to it—will we, like Victor, abandon our creation to its own devices, or worse, enslave it to our whims? If we prove that it cannot inspire love, will it cause fear?

Or if we treat it like a child—our child, the collective seed of human ingenuity from time immemorial given bloom at last—if we teach it right from wrong, good from evil, as best we can—if we nurture it as one of our own, the product of the best of us—will it then be, as Frankenstein’s AGI was meant to be, a modern Prometheus, bringing godly fire down from the heavens and putting it, gladly, into our hands?