This is a topic I’d like to see discussed more on current margins, especially for the “Get What We Measure” scenario. But I think this particular post didn’t really engage with what I’d consider the central parts of a model under which Getting What We Measure leaves humanity extinct. Even if I condition on each AI being far from takeover by itself, each AI running only a small part of the world, AIs having competing goals and not coordinating well, and AIs going into production at different times… that all seems to have near-zero relevance to what I’d consider the central Getting What We Measure extinction story.
The story I’d consider central goes roughly like:
Species don’t usually go extinct because they’re intentionally hunted to extinction; they go extinct because some much more powerful species (e.g., humans) is doing big things nearby which shift the environment enough that the weaker species dies. That extends to human extinction from AI. (This post opens with a similar idea.)
In the Getting What We Measure scenario, humans gradually lose de-facto control/power (but retain the surface-level appearance of some control), until they have zero de-facto ability to prevent their own extinction as AI goes about doing other things.
Picking on one particular line from the post (just as one example where the post diverges from the “central story” above without justification):
If a company run by an AI will use too much oxygen (such as occurs in WFLL), there will be a public outcry, lawsuits, political pressure, and plenty of time to regulate.
This only applies very early on in a Getting What We Measure scenario, before humanity loses de-facto control over media and governance. Later on, matters of oxygen consumption are basically-entirely decided by negotiation between AIs with their own disparate goals, and what humans have to say on the matter is only relevant insofar as any of the AIs care at all what the humans have to say.
Great to read that you agree threat models should be discussed more; that’s in fact also the biggest point of this post. I hope this strangely neglected area can be prioritized by researchers and funders.
First, I would say both deliberate hunting down and extinction as a side effect have happened. The smallpox virus is one life form that we actively didn’t like, decided to eradicate, and then hunted down successfully. I would argue that human genocides are also examples of this. I agree, though, that extinction as a side effect has been even more common, especially for animal species. If we had a resource conflict with an animal species that was powerful enough to actually resist a bit, we would probably start to purposefully hunt it down (for example, if orangutans attacked a logger base camp, the human response would be to shoot them). So I’d argue that the closer AI (or an AI-led team) gets to our capability to resist, the more likely a deliberate conflict becomes. If ASI blows us out of the water directly, I agree that extinction as a side effect is more likely. But currently, I think gradually increasing AI capabilities, and therefore a deliberate conflict, are more likely.
I agree that us not realizing that an AI-led team almost has takeover capability is a scenario that could lead to an existential event. If we realize soon enough that this could happen, we can simply ban the use case. If we realize it just in time, there’s maximum conflict, and we win (this could be a traditional conflict, but it could also be a giant hacking fight, a (social) media fight, or something else). If we realize it just too late, it’s still maximum conflict, but we lose. If we realize it much too late, perhaps there’s not even a conflict anymore (or there are isolated, hopelessly doomed human pockets of resistance that can be quickly defeated). Perhaps the last case corresponds to the WFLL scenarios?
Since there’s already, according to a preliminary analysis of a recent Existential Risk Observatory survey, ~20% public awareness of AI existential risk, and I think we’re still relatively far from AGI, let alone from applying AGI in powerful positions, I’m pretty positive that we will realize we’re doing something stupid and ban the dangerous use case well before it happens. A hopeful example is the talks between the US and China about not letting AI control nuclear weapons. This is exactly why I think threat model consensus and raising awareness are crucial.
I still don’t see WFLL as likely. But a great example could change my mind, and I’d be grateful if someone could provide one.
I think you are incorrect on banning dangerous use cases, though I am open to your thoughts. The most obvious dangerous use case right now, for example, is AI algorithmic polarization via social media. As a society we are reacting, but it doesn’t seem like we are doing so in a particularly effectual way.
Another way to see this current destruction of the commons is via automated spam and the decline in search engine quality, which is already happening and reduces utility to humans. This is only in the “bit” universe, but it certainly affects us in the “atoms” universe, and as AI gains “atom”-universe effects, I can see similar pollution being very negative for us.
Banning seems hard, even for obviously bad use cases like deepfakes, though reality might prove me wrong (happily!) there.
Thanks for engaging kindly. I’m more positive than you are about us being able to ban use cases, especially if existential risk awareness (and awareness of this particular threat model) is high. Currently, we don’t ban many AI use cases (such as social media algorithms), since they don’t threaten our existence as a species. A lot of people are of course criticizing what social media does to our society, but since we decide not to ban it, I conclude that in the end, we think its existence is net positive. But there are pocket exceptions: smartphones have recently been banned in Dutch secondary education during lecture hours, for example. To me, this shows that we can ban use cases if we want to. Since human extinction is far more serious than, say, reduced focus for school children, and we can ban for the latter reason, I conclude that we should be able to ban for the former reason, too. But threat model awareness is needed first (and we’ll get there).