Unstated assumptions: ASI will be achieved by a sudden jump, not incremental improvement. Corrigibility won’t work. ASI will be agentive.
I think it’s not that extreme. More like “The various non-agenty AIs won’t be enough to make aligning the agenty ones substantially easier” and “Alignment failures won’t become obvious and scary at stages prior to N before they happen at stage N, where N is the first stage that we have to get right or else.” (Analogy: We got humans to the moon safely on the first try, but this was because we had various tests beforehand to iron out the kinks, including ones that in fact blew up catastrophically. The assumption is that there won’t be good opportunities to test things out beforehand. Though I guess you could say that’s not an assumption, it’s the claim itself.) As for corrigibility… I mean it might work, but the claim is that we shouldn’t expect it to work on the first try.
Fortunately, it doesn’t have to, so long as the agenty ones aren’t the most powerful.
Fortunately, it doesn’t have to. You just need to get it working in AIs that aren’t superintelligent.
There are other models than the discontinuous/fast takeoff model under which alignment of the first advanced AI is critical, e.g. a continuous/slow but homogenous takeoff.