On point 35, “Any system of sufficiently intelligent agents can probably behave as a single agent, even if you imagine you’re playing them against each other”:
This claim is somewhat surprising to me given that you’re expecting powerful ML systems to remain very hard for humans to interpret.
I guess the assumption is that superintelligent ML models/systems may not remain uninterpretable to each other, especially not given the strong incentives to advance interpretability in specific domains/contexts (benefits from cooperation or from making early commitments in commitment races).
Still, if a problem is hard enough, then the fact that strong incentives exist to solve it doesn’t mean it will likely be solved. Having thought a bit about possible avenues to make credible commitments, it feels non-obvious to me whether superintelligent systems will be able to divide up the lightcone, etc. If anyone has more thoughts on the topic, I’d be very interested.
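To make the “interpretable to each other” intuition a bit more concrete, here is a toy sketch in the spirit of program-equilibrium-style arguments (this is my illustration, not something from the post; the setup and function names are purely hypothetical): in a one-shot prisoner’s dilemma, an agent whose decision procedure can be read by its counterpart can credibly commit to “cooperate iff you run this same procedure”, which is one possible route by which mutually transparent agents could end up acting like a single agent.

```python
# Toy sketch of the program-equilibrium intuition (illustrative only).
# If agents can inspect each other's decision procedures, a commitment like
# "cooperate iff the other party runs this same procedure" becomes verifiable.

import inspect


def cliquebot(own_source: str, other_source: str) -> str:
    """Cooperate only if the counterparty provably runs the same policy."""
    return "C" if other_source == own_source else "D"


def defectbot(own_source: str, other_source: str) -> str:
    """Ignores the other agent's code and always defects."""
    return "D"


def play(policy_a, policy_b):
    """One-shot game where each policy gets to read the other's source."""
    src_a, src_b = inspect.getsource(policy_a), inspect.getsource(policy_b)
    return policy_a(src_a, src_b), policy_b(src_b, src_a)


print(play(cliquebot, cliquebot))  # ('C', 'C'): mutual transparency sustains cooperation
print(play(cliquebot, defectbot))  # ('D', 'D'): the commitment protects against exploitation
```

Of course, real systems would not be literal source-code copies of one another, which is part of why the “divide up the lightcone” step still seems non-obvious to me.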
I think this mostly covers the relevant intuitions:
I guess the assumption is that superintelligent ML models/systems may not remain uninterpretable to each other, especially not given the strong incentives to advance interpretability in specific domains/contexts (benefits from cooperation or from making early commitments in commitment races).
It’s the kind of ‘obvious’ strategy that I think sufficiently ‘smart’ people would use already.