Great to read that you agree threat models should be discussed more; that's in fact also the biggest point of this post. I hope this strangely neglected area can be prioritized by researchers and funders.
First, I would say both deliberate hunting down and extinction as a side effect have happened. The smallpox virus is one life form that we actively didn't like and decided to eradicate, and then hunted down successfully. I would argue that human genocides are also examples of this. I agree, though, that extinction as a side effect has been even more common, especially for animal species. If we had a resource conflict with an animal species and it were powerful enough to actually resist a bit, we would probably start to purposefully hunt it down (for example, if orangutans attacked a logger base camp, the human response would be to shoot them). So I'd argue that the closer AI (or an AI-led team) gets to our capability to resist, the more likely a deliberate conflict becomes. If ASI blows us out of the water directly, I agree that extinction as a side effect is more likely. But currently, I think gradually increasing AI capabilities, and therefore a deliberate conflict, are more likely.
I agree that us not realizing that an AI-led team almost has takeover capability would be a scenario that could lead to an existential event. If we realize soon enough that this could happen, we can simply ban the use case. If we realize it just in time, there's maximum conflict, and we win (it could be a traditional conflict, but it could also just be a giant hacking fight, a (social) media fight, or something else). If we realize it just too late, it's still maximum conflict, but we lose. If we realize it much too late, perhaps there's not even a conflict anymore (or there are isolated, hopelessly doomed human pockets of resistance that can be quickly defeated). Perhaps the last case corresponds to the WFLL scenarios?
Since there's already, according to a preliminary analysis of a recent Existential Risk Observatory survey, ~20% public awareness of AI xrisk, and I think we're still relatively far from AGI, let alone from applying AGI in powerful positions, I'm pretty positive that we will realize we're doing something stupid and ban the dangerous use case well before it happens. A hopeful example is the talks between the US and China about not letting AI control nuclear weapons. This is exactly the reason, though, why I think threat model consensus and raising awareness are crucial.
I still don’t see WFLL as likely. But a great example could change my mind. I’d be grateful if someone could provide that.
I think you are incorrect on the dangerous use case, though I am open to your thoughts. The most obvious dangerous case right now, for example, is AI algorithmic polarization via social media. As a society we are reacting, but it doesn't seem like it is in a particularly effectual way.
Another way to see this current destruction of the commons is via automated spam and search engine quality decline, which is already happening and reduces utility to humans. This is only in the "bit" universe, but it certainly affects us in the atoms universe, and as AI gains "atom"-universe effects, I can see similar pollution being very negative for us.
Banning seems hard, even for obviously bad use cases like deepfakes, though reality might prove me wrong (happily!) there.
Thanks for engaging kindly. I'm more positive than you are about us being able to ban use cases, especially if existential risk awareness (and awareness of this particular threat model) is high. Currently, we don't ban many AI use cases (such as social media algorithms), since they don't threaten our existence as a species. A lot of people are of course criticizing what social media does to our society, but since we decide not to ban it, I conclude that in the end, we think its existence is net positive. But there are pocket exceptions: smartphones have recently been banned in Dutch secondary education during lecture hours, for example. To me, this is an example showing that we can ban use cases if we want to. Since human extinction is way more serious than e.g. less focus for school children, and we can ban for the latter reason, I conclude that we should be able to ban for the former reason, too. But threat model awareness is needed first (we'll get there).