On point 35, “Any system of sufficiently intelligent agents can probably behave as a single agent, even if you imagine you’re playing them against each other”:
This claim is somewhat surprising to me given that you’re expecting powerful ML systems to remain very hard for humans to interpret.
I guess the assumption is that superintelligent ML models/systems may not remain uninterpretable to each other, especially not given the strong incentives to advance interpretability in specific domains/contexts (benefits from cooperation or from making early commitments in commitment races).
Still, if a problem is hard enough, then the fact that strong incentives exist to solve it doesn’t mean it will likely be solved. Having thought a bit about possible avenues to make credible commitments, it feels non-obvious to me whether superintelligent systems will be able to divide up the lightcone, etc. If anyone has more thoughts on the topic, I’d be very interested.
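To make the “interpretable to each other” intuition a bit more concrete, here is a toy sketch in the spirit of program-equilibrium-style arguments (this is my illustration, not something from the post; the setup and function names are purely hypothetical): in a one-shot prisoner’s dilemma, an agent whose decision procedure can be read by its counterpart can credibly commit to “cooperate iff you run this same procedure”, which is one possible route by which mutually transparent agents could end up acting like a single agent.

```python
# Toy sketch of the program-equilibrium intuition (illustrative only).
# If agents can inspect each other's decision procedures, a commitment like
# "cooperate iff the other party runs this same procedure" becomes verifiable.

import inspect


def cliquebot(own_source: str, other_source: str) -> str:
    """Cooperate only if the counterparty provably runs the same policy."""
    return "C" if other_source == own_source else "D"


def defectbot(own_source: str, other_source: str) -> str:
    """Ignores the other agent's code and always defects."""
    return "D"


def play(policy_a, policy_b):
    """One-shot game where each policy gets to read the other's source."""
    src_a, src_b = inspect.getsource(policy_a), inspect.getsource(policy_b)
    return policy_a(src_a, src_b), policy_b(src_b, src_a)


print(play(cliquebot, cliquebot))  # ('C', 'C'): mutual transparency sustains cooperation
print(play(cliquebot, defectbot))  # ('D', 'D'): the commitment protects against exploitation
```

Of course, real systems would not be literal source-code copies of one another, which is part of why the “divide up the lightcone” step still seems non-obvious to me.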
I think this mostly covers the relevant intuitions:
I guess the assumption is that superintelligent ML models/systems may not remain uninterpretable to each other, especially not given the strong incentives to advance interpretability in specific domains/contexts (benefits from cooperation or from making early commitments in commitment races).
It’s the kind of ‘obvious’ strategy that I think sufficiently ‘smart’ people would use already.