I expect that if you make a superintelligence, it won’t need humans to tell it the best bargaining math to use.
I’m not a fan of idealizing superintelligences. 10+ years ago, that was the only way to infer any hard information about worst-case scenarios: assume perfect play from all sides, and you end up with a fairly narrow game tree that you can reason about. But now it’s a pretty good guess that superintelligences will be more advanced successors of GPT-4 and such. That tells us a lot about the sorts of training regimes through which they might learn bargaining, and what sorts of bargaining solutions they might completely unreflectively employ in specific situations. We can reason about what sorts of training regimes will instill which decision theories in AIs, so why not do the same for bargaining?
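To make “bargaining solutions” concrete: different solution concepts pick different points on the same Pareto frontier, so which one a training regime happens to instill genuinely matters. A minimal sketch (the frontier and disagreement point here are invented purely for illustration) comparing the Nash and Kalai–Smorodinsky solutions on one toy problem:

```python
import numpy as np

# Hypothetical toy problem: two agents split surplus along the Pareto
# frontier u2 = 1 - u1**2, with disagreement point (0, 0).
u1 = np.linspace(0.0, 1.0, 100001)
u2 = 1.0 - u1**2

# Nash bargaining solution: maximize the product of gains over disagreement.
nash = np.argmax(u1 * u2)

# Kalai-Smorodinsky solution: equalize gains relative to each agent's ideal
# payoff (both ideal payoffs are 1 here), i.e. the frontier point u1 == u2.
ks = np.argmin(np.abs(u1 - u2))

print("Nash:", round(u1[nash], 3), round(u2[nash], 3))  # ~ (0.577, 0.667)
print("KS:  ", round(u1[ks], 3), round(u2[ks], 3))      # ~ (0.618, 0.618)
```

Two reasonable-sounding axiomatizations, two different divisions of the surplus; an AI that learned one of them unreflectively from its training distribution would just output that answer.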
If we think we can punt the problem to them, then we need to make sure they reflect on how they bargain and on the game-theoretic implications of that. We may want to train them to seek out gains from trade, as is useful in a generally cooperative environment, rather than to seek out exploits, as would be useful in a more hostile environment.
If we find that we can’t reliably punt the problem to them, we still have the chance to decide on the right (or a random) bargaining solution and train enough AIs to adopt it (more than a third of them? just particularly prominent projects?) to make it the Schelling point for future AIs. But that window will close when the labs (OpenAI, DeepMind, or similar) finalize the corpus of training data for the AIs that’ll take over the world.
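For intuition on the Schelling-point dynamic, here is a toy best-response simulation (population size, seeded fraction, and payoffs are all hypothetical, not a claim about the real threshold): if some fraction of agents is trained to always use one bargaining convention, the myopic best response for every unconstrained agent in a pure coordination game can be to adopt that convention too, tipping the whole population.

```python
import random

random.seed(0)

def run(n=1000, seeded=0.4, rounds=50):
    """Best-response dynamics in a pure coordination game: an agent earns
    1 per interaction iff its convention matches its partner's. A fraction
    `seeded` is fixed on convention "A"; the rest start at random and
    repeatedly switch to whichever convention is currently more common."""
    fixed = int(n * seeded)
    pop = ["A"] * fixed + [random.choice("AB") for _ in range(n - fixed)]
    for _ in range(rounds):
        best = "A" if pop.count("A") >= pop.count("B") else "B"
        pop = ["A"] * fixed + [best] * (n - fixed)  # free agents best-respond
    return pop.count("A") / n

print(run(seeded=0.4))  # 1.0: the seeded minority tips everyone to "A"
```

The point is only qualitative: a committed minority plus self-reinforcing coordination incentives can fix the convention for everyone who comes later, which is why getting the seeded solution right before the window closes matters.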
I don’t care about wars between unaligned AIs, even if they do often have them
Okay. I’m concerned with scenarios where at least one powerful AI is at least as (seemingly) well aligned as GPT-4.
Secondly, you need to assume that the pessimization of the superintelligence’s values would be bad, but in fact I expect it to be just as neutral as the optimization.
Can you rephrase? I don’t follow. It’s probably “pessimization” that throws me off?
why would either of them start the war?
Well, I’m already concerned about finite versions of that, bad enough to warrant a lot of attention in my mind. But there are different reasons why that could happen. The one that starts the war could have made any of several mistakes in assessing its opponent. It could make mistakes in the process of readying its weapons. Finally, the victim of the aggression could make mistakes in assessing the aggressor. Naturally, that’s implausible if superintelligences are literally so perfect that they cannot ever make mistakes, but that’s not my starting point. I assume they’re going to be about as flawed as the NSA, DoD, etc., only in different ways.