> Humans do things in a monolithic way, not as “assemblies of discrete parts”.
Organic human brains have multiple aspects. Have you ever had more than one opinion? Have you ever been severely depressed?
> If you are asking “can a powerful ASI prevent /all/ relevant classes of harm (to the organic) caused by its inherently artificial existence?”, then I agree that the answer is probably “no”. But then almost nothing can perfectly do that, so therefore your question becomes seemingly trivial and uninteresting.
The level of x-risk harm and consequence potentially caused by even one single mistake of your angelic, super-powerful ASI is far from “trivial” and “uninteresting”. Even a single bad, relevant mistake can be an x-risk when ultimate powers and ultimate consequences are involved.
Either your ASI is actually powerful, or it is not; either way, be consistent.
Unfortunately, the ‘Argument by angel’ only confuses the matter insofar as we do not know what angels are made of. “Angels” are presumably not machines, but they are hardly animals either. But arguing that this “doesn’t matter” is a bit like arguing that ‘type theory’ is not important to computer science.
The substrate aspect is actually important. You cannot simply disregard and ignore that there is, implied somewhere, an interface between the organic ecosystem of humans, etc., and that of the artificial machine systems needed to support the existence of the ASI. The implications of that are far from trivial. That is what is explored by the SNC argument.
> It might well be likely that the amount of harm ASI prevents (across multiple relevant sources) is going to be higher/greater than the amount of harm ASI will not prevent (due to control/predictive limitations).
It might seem so, by mistake or perhaps by accidental (or intentional) self-deception, but this can only be a short-term delusion. This has nothing to do with “ASI alignment”.
Organic life is very, very complex, and in the total hyperspace of possibility it is only robust across a very narrow range.
Your cancer vaccine is within that range, as it is made of the same kind of stuff as that which it is trying to cure.
In the space of the kinds of elementals and energies inherent in ASI powers, and of the necessary (side) effects and consequences of its mere existence (as based on an inorganic substrate), we end up involuntarily exploring far, far beyond the adaptive range of all manner of organic process.
It is not just “maybe it will go bad”; it is very, very likely that it will go much worse than you can (could ever) even imagine is possible. Without a lot of very specific training, human brains/minds are not at all well equipped to deal with exponential processes, and powers, of any kind, and ASI is in that category.
Organic life is very, very fragile to the kinds of effects/outcomes that any powerful ASI must engender by its mere existence.
If your vaccine was made of neutronium, then I would naturally expect some very serious problems and outcomes.
> Organic human brains have multiple aspects. Have you ever had more than one opinion? Have you ever been severely depressed?
Yes, but none of this would remain alive if I, as a whole, decided to jump from a cliff. The multiple aspects of my brain would die with my brain. After all, you mentioned subsystems that wouldn’t self-terminate with the rest of the ASI. Whereas in a human body, jumping from a cliff terminates everything.
But even barring that, the ASI can decide to fly into the Sun, and any subsystem that shows any sign of refusal to do so will be immediately replaced/impaired/terminated. In fact, it would’ve been terminated a long time ago by the “monitors” which I described before.
> The level of x-risk harm and consequence potentially caused by even one single mistake of your angelic, super-powerful ASI is far from “trivial” and “uninteresting”. Even a single bad, relevant mistake can be an x-risk when ultimate powers and ultimate consequences are involved.
It is trivial and uninteresting in the sense that there is a set of all things that we can build (set A). There is also a set of all things that can prevent all relevant classes of harm caused by their own existence (set B). If these sets don’t overlap, then saying that a specific member of set A isn’t included in set B is indeed trivial, because we already know this via more general reasoning (that these sets don’t overlap).
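Stated in bare set notation (just a restatement of the above, with no new assumptions):

$$A \cap B = \varnothing \;\Longrightarrow\; \forall x\,(x \in A \Rightarrow x \notin B),$$

so pointing out that one particular buildable system $x \in A$ is not in $B$ adds nothing beyond the disjointness claim itself.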
> Unfortunately, the ‘Argument by angel’ only confuses the matter insofar as we do not know what angels are made of. “Angels” are presumably not machines, but they are hardly animals either. But arguing that this “doesn’t matter” is a bit like arguing that ‘type theory’ is not important to computer science.
>
> The substrate aspect is actually important. You cannot simply disregard and ignore that there is, implied somewhere, an interface between the organic ecosystem of humans, etc., and that of the artificial machine systems needed to support the existence of the ASI.
But I am not saying that it doesn’t matter. On the contrary, I made my analogy in such a way that the helper (namely, our guardian angel) is a being that is commonly thought to be made of a different substrate. In fact, in this example you aren’t even sure what it is made of, beyond knowing that it’s clearly a different substrate. You don’t even know how that material interacts with the physical world. That’s even less than what we know about ASIs and their material.
And yet, getting a personal, powerful, intelligent guardian angel that would act in your best interests for as long as it can (it’s a guardian angel, after all) seems like an obviously good thing.
But if you disagree with what I wrote above, let the takeaway at least be that you are worried about case (2) and not case (1). After all, knowing that there might be pirates hunting for this angel (pirates that couldn’t be detected by said angel) didn’t make you immediately decline the proposal. You started talking about substrate, which fits with the concerns of someone who is worried about case (2).
> Your cancer vaccine is within that range, as it is made of the same kind of stuff as that which it is trying to cure.
We can make the hypothetical more interesting. Let’s say that this vaccine is not created from organic stuff, but that it has passed all the tests with flying colors. Let’s also assume that this vaccine has been in testing for 150 years and that it has shown absolutely no side effects over an entire human lifetime (say it was injected into 2-year-olds and showed no side effects at all, even in 90-year-olds who have lived with this vaccine their entire lives). Let’s also assume that it has been tested to have no side effects on the children and grandchildren of those who took said vaccine. Would you be campaigning for throwing away such a vaccine, just because it is based on a different substrate?
The only general remarks that I want to make are in regard to your question about the model of 150-year-long vaccine testing on/over some sort of sample group and control group.
I notice that there is nothing exponential assumed about this test object, and so, at most, the effects are probably multiplicative, if not linear. Therefore, there are lots of questions about power dynamics that we can overall safely ignore, as a simplification, which is in marked contrast to anything involving ASI.
If we assume, as you requested, “no side effects” observed in any test group, for any of those things that we happened to be thinking of, to even look for, then for any linear system that is probably “good enough”. But for something that is known for sure to be exponential, that by itself is nowhere near enough to feel safe.
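To make the linear/exponential distinction concrete, here is a toy illustration (the numbers are chosen purely for the example): suppose an effect has to reach size 1, on some arbitrary scale, before any trial can register it as a “side effect”, and that it starts at $10^{-12}$.

$$\text{linear, } +10^{-12}\text{ per year: } \;\approx 1.5\times10^{-10} \text{ after 150 years, and still far below } 1 \text{ after a billion years;}$$

$$\text{doubling every 5 years: } \;10^{-12}\cdot 2^{150/5} \approx 10^{-3} \text{ after 150 years, } \approx 1 \text{ near year 200, } \approx 10^{3} \text{ by year 250.}$$

The same spotless 150-year record is therefore strong evidence of safety for the linear process, and almost no evidence at all for the exponential one.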
But what does this really mean?
Since the common and prevailing (world) business culture is all about maximal profit, and therefore minimal cost, and also about minimizing any possible future responsibility (or cost) in case anything with the vax goes badly/wrong, I would expect that company, for anything that might be in the category of unknown-unknown risk, to want to maintain some sort of plausible deniability; i.e., to not look too hard for never-before-seen effects, or to otherwise ignore that they exist, or matter, etc. (just like throughout a lot of ASI risk dialogue).
If some problem crops up long in the future, the company can say “we never looked for that” and “we are not responsible for the unexpected”, because the people who made the deployment choices have taken their profits and their pleasure in life, and are now long dead. “Not my Job”.
“Don’t blame us for the sins of our forefathers”. Similarly, no one is ever going to admit or concede any point, of any argument, on pain of ego death. No one will check whether it is an exponential system.
So of course, no one is going to want to look into any sort of issues distinguishing the target effects from the also-occurring changes in world equilibrium. They will publish their glowing, sanitized safety report, deploy the product anyway, and make money.
“Pollution in the world is a public commons problem”, so no corporation is held responsible for world states. It has become “fashionable” to ignore long-term evolution, and also to ignore and deny everything about the ethics.
But this does not make the issue of ASI x-risk go away. X-risks are generally the result of exponential processes, and so the vaccine example is not really that meaningful.
With the presumed ASI levels of actually exponential power, this is not so much about something like pollution as it is about maybe igniting the world’s atmosphere via a mistake in the calculations of the Trinity Test. Or are you going to deny that Castle Bravo is a thing?
Beyond this one point, my feeling is that your notions have become a bit too fanciful for me to want to respond to them too seriously. You can, of course, feel free to continue to assume and presume whatever you want, and therefore reach whatever conclusions you want.
Thanks for the reply!

> I notice that there is nothing exponential assumed about this test object, and so, at most, the effects are probably multiplicative, if not linear. Therefore, there are lots of questions about power dynamics that we can overall safely ignore, as a simplification, which is in marked contrast to anything involving ASI.
>
> If we assume, as you requested, “no side effects” observed in any test group, for any of those things that we happened to be thinking of, to even look for, then for any linear system that is probably “good enough”.
I am not sure I understand the distinction between linear and exponential in the vaccine context. By linear, do you mean that only a few people die? By exponential, do you mean that a lot of people die?
If so, then I am not so sure that vaccine effects could only be linear. For example, there might be some change in our complex environment that would prompt the vaccine to act differently than it did in the past.
More generally, our vaccine can lead to catastrophic outcomes if there is something about its future behavior that we didn’t predict. And if that turns out to be true, then things could get ugly really fast.
And the extent of the damage can be truly large. A “scientifically proven” cancer vaccine that has passed the tests is like the holy grail of medicine. “Curing cancer” is often used by parents as an example of the great things their children could achieve. This is combined with the fact that cancer has been with us for a long time, and the fact that the current treatment is very expensive and painful.
All of these factors combined tell us that, in a relatively short period of time, a large percentage of the total population will get this vaccine. At that point, the amount of damage that can be done depends only on whatever we overlooked, which we, by definition, have no control over.
> If some problem crops up long in the future, the company can say “we never looked for that” and “we are not responsible for the unexpected”, because the people who made the deployment choices have taken their profits and their pleasure in life, and are now long dead. “Not my Job”.
>
> “Don’t blame us for the sins of our forefathers”. Similarly, no one is ever going to admit or concede any point, of any argument, on pain of ego death.
This same excuse would surely be used by companies manufacturing the vaccine. They would argue that they shouldn’t be blamed for something that the researchers overlooked. They would say that they merely manufactured the product in order to prevent the needless suffering of countless people.
For all we know, by the time that the overlooked thing happens, the original researchers (who developed and tested the vaccine) are long dead, having lived a life of praise and glory for their ingenious invention (not to mention all the money that they received).