Fwiw, I disagree that stages 1 and 3 of your quick headline are solved; I think there are enough unanswered questions left that we can’t be certain whether or not a multipolar model could hold.
For 1, I agree with the convergence claims, but the speed of that convergence is in question. There are fundamental reasons to believe that we get hierarchical agents (e.g. this from physics, shard theory). If you have a hierarchical collective agent, a good question is how it comes to maximise and become a full consequentialist, which it will for optimality reasons. I think one of the main ways it smooths out the kinks in its programming is by running into prediction errors and updating on them, so the question becomes how fast it runs into those prediction errors. Yet to attain prediction errors you need some sort of online learning to update your beliefs. But the energy cost of that online learning scales pretty badly if you’re doing it the way classic life does, only with a really large NN. Basically, there’s a chance that if you hard-scale a network to very high computational power, the energy cost of updating it grows a lot, so if you want the most bang for your buck you get something more like Comprehensive AI Services: a distributed system of more specific learners forming a larger learner.
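To make the energy intuition concrete, here’s a toy back-of-envelope sketch (all numbers and scaling assumptions are mine and purely illustrative, not claims about real hardware): if update cost is roughly linear in the parameters touched, a monolith pays for all of them on every prediction error, while a CAIS-like system of specialists only pays for the few modules relevant to that error.

```python
# Toy back-of-envelope: energy cost of online updates for one monolithic
# network vs. a distributed system of specialists (a CAIS-like setup).
# Assumptions (mine, not established facts): each gradient step touches
# every parameter of whichever module is updating, cost is linear in
# parameters touched, and only a few specialists are relevant per step.

def monolith_cost(n_params: float, steps: int, joules_per_param: float = 1e-9) -> float:
    """Every online update touches all n_params of the single big net."""
    return steps * n_params * joules_per_param

def distributed_cost(n_params: float, steps: int, n_modules: int,
                     active_per_step: int, joules_per_param: float = 1e-9) -> float:
    """Same total parameter count, split into n_modules specialists;
    only active_per_step of them update on any given prediction error."""
    module_size = n_params / n_modules
    return steps * active_per_step * module_size * joules_per_param

if __name__ == "__main__":
    N, T = 1e12, 1_000_000  # a trillion parameters, a million online updates
    print(f"monolith:    {monolith_cost(N, T):.2e} J")
    print(f"distributed: {distributed_cost(N, T, n_modules=1000, active_per_step=3):.2e} J")
    # The monolith pays ~N per step; the distributed system pays ~3N/1000,
    # which is the (toy) sense in which specialisation buys cheap updating.
```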
Then you can ask what the difference is between that distributed AI and human collective intelligence. There are arguments that the AIs will just form a super-blob through different forms of trade, yet how is that different from what human collective intelligence already is? (Looking at this right now!)
Are there forms of collective intelligence that can scale with distributed AI and that can capture AI systems as part of their optimality (e.g. group selection due to inherent existing advantages)? I think so, and I think that really strong forms of collective decision-making potentially give you a lot of intelligence. We can then imagine a simple verification contract: an AI gets access to a collective intelligence if it behaves in a certain way. This is worth it for the AI because the collective is a much easier route to power, and in exchange it agrees to play by certain rules. I don’t see why this wouldn’t work, and I would love for someone to tell me that it doesn’t!
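Here’s roughly what I mean, as a toy sketch; the `audit` function, the rule flag, and the payoff numbers are all hypothetical stand-ins to make the incentive structure concrete, and the audit being reliable is exactly the part I’d want someone to attack:

```python
# Toy sketch of the verification-contract idea above. Everything here is
# hypothetical: a stand-in mechanism to show the incentives, not a design.

from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    follows_rules: bool          # does the agent actually comply?
    solo_power: float = 1.0      # what it can get on its own

COLLECTIVE_POWER = 10.0          # access to the collective is worth much more

def audit(agent: Agent) -> bool:
    """Stand-in for whatever verification the collective can actually do.
    Here it's perfect; in reality this is the hard open problem."""
    return agent.follows_rules

def payoff(agent: Agent) -> float:
    """Join-and-comply beats defecting as long as the collective's
    verified benefits exceed what the agent can grab alone."""
    if audit(agent):
        return COLLECTIVE_POWER   # admitted, bound by the rules
    return agent.solo_power       # caught defecting, excluded

for a in [Agent("cooperator", True), Agent("defector", False)]:
    print(a.name, payoff(a))
# cooperator 10.0 / defector 1.0 -- compliance is incentive-compatible
# *if* the audit works and the collective really is the best route to power.
```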
For 3, why can’t RSI be a collective process, given the above arguments about collective versus individual learning? If RSI is a bit like classic science, there might also be thresholds at which the scaling slows down. I feel this is one of the less talked-about points in discussions of superintelligence: what is the underlying difficulty of RSI at higher levels? From an outside-view plus black-swan perspective it seems very arrogant to assume linear difficulty scaling.
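As a toy illustration of why the difficulty-scaling assumption matters (the functional forms below are made up for illustration, not predictions):

```python
# Toy model of RSI under different difficulty-scaling assumptions. The point
# is only that the shape of difficulty(level) controls whether progress
# explodes, grows steadily, or stalls -- which feeds directly into
# unipolar vs. multipolar outcomes.

def rsi_trajectory(difficulty, steps: int = 30, capability: float = 1.0):
    """Each tick, the system spends its capability on self-improvement;
    the gain is capability / difficulty(capability)."""
    out = [capability]
    for _ in range(steps):
        capability += capability / difficulty(capability)
        out.append(capability)
    return out

linear      = rsi_trajectory(lambda c: c)        # constant absolute gains
superlinear = rsi_trajectory(lambda c: c ** 2)   # each level much harder -> stall
sublinear   = rsi_trajectory(lambda c: c ** 0.5) # each level easier -> foom

for name, traj in [("linear", linear), ("superlinear", superlinear),
                   ("sublinear", sublinear)]:
    print(f"{name:12s} final capability after 30 steps: {traj[-1]:.1f}")
```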
Some other questions: What types of knowledge discovery will be needed? What experiments? Where will new bits of information come from? How will they distribute into the collective memory of the RSI process?
All of these things determine the unipolarity or multipolarity of an RSI process. So we can’t be sure how it will happen, and there’s probably also path dependence based on the best alternatives available at the initial conditions.