I think you haven’t identified an important crux. I outlined it somewhat in https://www.lesswrong.com/posts/Tnd8xuZukPtAu5X34/you-need-to-solve-the-hard-parts-of-the-alignment-problem. Something like: “running one ~human-level AI researcher at ~human speed can be done safely without understanding minds, agency, or how that particular AI system works; but when you run a lot of these systems at many times human speed, then unless you understand the whole system and why it’s directed at something you’ve defined, it won’t be. There are many dynamics that aren’t important or catastrophically dangerous in one human-level system, but that will make the whole system go off the rails when it’s a giant system made of many small ones.”
Thanks for the comment.
I think it’s possible that dangerous emergent dynamics could arise from multiple interacting AIs, but I’m not too worried about that problem, because I don’t think you can increase an AI’s capabilities much simply by running multiple copies of it. You can do more work that way, but I don’t think you can get qualitatively much better work.
OpenAI created GPT-4 by training a brand-new model, not by running multiple copies of GPT-3 together. Similarly, although human corporations can achieve more than a single person can, I don’t consider them superintelligent. I’d say GPT-4 is more capable, and more dangerous, than 10 copies of GPT-3.
I think the evidence points more toward emergent properties arising within a single model, so I’m more worried about bigger models than about problems that would come from running many of them. If we could solve a task using multiple AIs rather than one highly capable AI, that would probably be safer; I think that’s part of the idea behind iterated amplification and distillation.
There’s value in running multiple AIs. For example, OpenAI used multiple AIs to summarize books recursively. But even if we don’t run multiple AI models, a single AI running at high speed would also be highly valuable: you can already paste a long text into GPT-4 today and it will summarize it in under a minute.
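For concreteness, here’s a minimal sketch of the recursive-summarization pattern mentioned above (a rough illustration, not OpenAI’s actual pipeline): split the text into chunks, summarize each chunk with a separate model call, then summarize the summaries. The summarize_chunk placeholder, the chunk size, and the use of plain character counts are all assumptions for the sake of the sketch.

```python
# Sketch of recursive summarization: split the text into chunks, summarize each
# chunk with a model call, then recursively summarize the concatenated summaries.
# Assumes each summary is shorter than its input, so the recursion terminates.

def summarize_chunk(text: str) -> str:
    """Placeholder for a single model call that returns a short summary of `text`."""
    raise NotImplementedError("plug in your model API of choice here")

def recursive_summarize(text: str, chunk_size: int = 4000) -> str:
    # Base case: the text already fits in a single model call.
    if len(text) <= chunk_size:
        return summarize_chunk(text)
    # Split into fixed-size chunks, summarize each one, then summarize the summaries.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    summaries = [summarize_chunk(chunk) for chunk in chunks]
    return recursive_summarize("\n".join(summaries), chunk_size)
```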
This is a problem with both the Superalignment team’s approach and Conjecture’s. If all you need is one human-level researcher, you can simply hire someone!
If you run something significantly more than one human-level researcher at 1x human speed, you need to explain why the whole system is aligned. “It’s made out of parts that don’t really want to kill you, or that wouldn’t succeed at killing you” says nothing about the goals the whole system might start pursuing.
One human-level system can maybe find an important zero-day vulnerability once every couple of months. If there are thousands of these systems working much faster, they can read GitHub, find thousands of zero-day vulnerabilities, and hack literally everything. If a single system really wanted to do something like that, it probably simply couldn’t, especially while humans are watching.
If a giant system that you don’t fully oversee manually, don’t understand, and don’t control starts wanting something, it can get what it wants; and there’s no reason the whole thing will be optimising for anything in the direction of what its small parts would have been fuzzily optimising for if left to their own devices.
Why would this cluster of human-level minds start cooperating with each other?