I wonder if there’s a really strong outside-view argument that it will be homogeneous:
While there are many ways to design flying machines (balloons, zeppelins, rockets, jets, monoplanes, biplanes, helicopters, …), at any given era and for any particular domain (say, passenger transport, or air superiority) the designs used tend to be pretty similar. (In WW1 almost all the planes were biplanes, and almost all used slow but light cloth-on-frame construction. In WW2 almost all the planes were monoplanes with aluminum or other metal skins and more powerful prop engines; the Me109, Spitfire, and Zero were different, but in the grand scheme of things very similar.) Moreover, this seems to be the norm throughout history, in stark contrast with science fiction, where the spaceships, vehicles, etc. of one faction are often wildly different from those of another. Historically, if we want to find cases of wildly different designs competing with each other, we usually need to look to “first contact” scenarios in which, e.g., European armies colonize faraway lands. Perhaps it’s just really rare for two dramatically different designs to be almost equally matched in competition, and insofar as they aren’t almost equally matched, people quickly realize this and retire the inferior design.
I guess an important question is: if AIs are homogeneous to the same extent that, e.g., military fighter planes are, is that sufficient homogeneity to yield your conclusions 1-4? I think so. I think they’ll probably have the same architecture and training environment, with only minor details differing (e.g. the Chinese GPT-N might have access to more Chinese data, might have 1.5x the parameter count, might be trained for 0.5x as long). Of course these details will feel like a big deal in competition, just as the Me109, Spitfire, and Zero had various advantages and disadvantages over each other, but for purposes of coordination, alignment correlation, etc. they are minor.
One counterexample is the Manhattan Project: they developed two different designs simultaneously because they weren’t sure which would work better. From Wikipedia: “Two types of atomic bombs were developed concurrently during the war: a relatively simple gun-type fission weapon and a more complex implosion-type nuclear weapon.”
https://en.wikipedia.org/wiki/Manhattan_Project#:~:text=The%20Manhattan%20Project%20was%20a,Tube%20Alloys%20project)%20and%20Canada.
Key point for those who don’t click through (one that I didn’t realize at first): both types turned out to work and were in fact used. The gun-type “Little Boy” was dropped on Hiroshima, and the implosion-type “Fat Man” was dropped on Nagasaki.
I think this depends a ton on your reference class. If you compare AI with military fighter planes: very homogeneous. If you compare AI with all vehicles: very heterogeneous.
Maybe the outside view can be used to say that all AIs designed for a similar purpose will be homogeneous, implying that we only get heterogeneity in a CAIS scenario, where there are many different specialised designs. But I think the outside view also favors a CAIS scenario over a monolithic AI scenario (though that’s not necessarily decisive).
Yes, but I think we can say something a bit stronger than that: AIs competing with each other will be homogeneous. Here’s my current model at least: Let’s say the competition for control of the future involves N skills: persuasion, science, engineering, etc. Even if we suppose that it’s most efficient to design separate AIs for each skill, rather than a smaller number of AIs that have multiple skills each, insofar as there are factions competing for control of the future, they’ll have an AI for each of the skills. They wouldn’t want to leave one of the skills out, or how are they going to compete? So each faction will consist of a group of AIs working together that collectively have all the relevant skills. And each of the AIs will be designed to be good at the skill it’s assigned, so (via the principle you articulated) each AI will be similar to the other-faction AIs it directly competes with, and the factions as a whole will be pretty similar too, since they’ll be collections of similar AIs. (Compare to militaries: not only were fighter planes similar, and trucks similar, and battleships similar, but the armed forces of Japan, the USA, the USSR, etc. were similar as a whole. Contrast this with, e.g., the conquistadors vs. the Aztecs, or in sci-fi the Protoss vs. the Zerg.)
I think this is only right if we assume that we’ve solved alignment. Otherwise you might not be able to train a specialised AI that is loyal to your faction.
Here’s how I imagine Evan’s conclusions failing in a very CAIS-like world:
1. Maybe we can align models that do supervised learning, but can’t align RL, so we’ll have humans+GPT-N competing against a rogue RL-agent that someone created. (And people initially trained both of these because GPT-N makes for a better chatbot, while the RL agent seemed better at making money-maximizing decisions at companies.)
2. A mesa-optimiser arising in GPT-N may be very dissimilar to a money-maximising RL-agent, but they may still end up in conflict. Neither can add an analogue of the other to its own team, because it doesn’t know how to align such a system.
3. If we use lots of different methods for training lots of different specialised models, any one of them can produce a warning shot (which would ideally make us suspect all other models). Also, they won’t really understand or be able to coordinate with the other systems.
4. It’s not as important whether the first advanced AI system is aligned, since there will be lots of different systems of different types. If everyone is training unaligned chatbots, you still care about aligning everyone’s personal assistants.
Thanks! I’m not sure I’m following everything you said, but I like the ideas. Just to be clear, I wasn’t imagining the AIs on the team of a faction to all be aligned necessarily. In fact I was imagining that maybe most (or even all) of them would be narrow AIs / tool AIs for which the concept of alignment doesn’t really apply. Like AlphaFold2. Also, I think the relevant variable for homogeneity isn’t whether we’ve solved alignment—maybe it’s whether the people making AI think they’ve solved alignment. If the Chinese and US militaries think AI risk isn’t a big deal, and build AGI generals to prosecute the cyberwar, they’ll probably use similar designs, even if actually the generals are secretly planning treacherous turns.
Ah, yeah, for the purposes of my previous comment I count this as being aligned. If we only have tool AIs (or otherwise alignable AIs), I agree that Evan’s conclusion 2 follows (while the other ones aren’t relevant).
So for homogeneity-of-factions, I was specifically trying to say that alignment is necessary to have multiple non-tool AIs in the same faction, because at some point, something must align them all to the faction’s goals.
However, I’m now noticing that this requirement is weaker than what we usually mean by alignment. For our purposes, we want to be able to align AIs to human values. However, for the purpose of building a faction, it’s enough if there exists an AI that can align other AIs to its values, which may be much easier.
Concretely, my best guess is that you need inner alignment, since failure of inner alignment probably produces random goals, which means that multiple inner-misaligned AIs are unlikely to share goals. However, outer alignment is much easier for easily-measurable values than for human values, so I can imagine a world where we fail outer alignment, unthinkingly create an AI that only cares about something easy (e.g. maximizing money), and then that AI can easily create other AIs that want to help it (with maximizing money).
I disagree with this. I don’t expect a failure of inner alignment to produce random goals, but rather to systematically produce goals that are simpler/faster proxies of what we actually want. That is to say, while I expect the goals to look random to us, I don’t expect them to differ much between training runs, since in my opinion the outcome depends more on the training process’s inductive biases than on inherent randomness in the training process.
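As a loose toy analogy for the point about inductive biases versus randomness (a sketch only: the one-data-point regression, the step size, and the name `train` are all assumptions made for illustration, and nothing here models inner alignment or goals), consider an underdetermined fitting problem where infinitely many parameter settings fit the data equally well. Gradient descent from a small random initialisation still lands on nearly the same solution for every seed, because the optimiser’s bias toward small-norm solutions, not the seed, does most of the selecting.

```python
import random

def train(seed, steps=2000, lr=0.05, init_scale=0.01):
    """Gradient descent on the loss (w . x - y)^2 / 2 for one data point.

    The problem is underdetermined: every w on the line
    w[0] + 2 * w[1] = 5 fits the data perfectly, so the data alone
    does not pick out a unique solution.
    """
    x, y = (1.0, 2.0), 5.0  # hypothetical toy data, chosen for illustration
    rng = random.Random(seed)
    # The small random initialisation is the only seed-dependent part.
    w = [rng.gauss(0.0, init_scale), rng.gauss(0.0, init_scale)]
    for _ in range(steps):
        err = w[0] * x[0] + w[1] * x[1] - y
        w[0] -= lr * err * x[0]
        w[1] -= lr * err * x[1]
    return w

# Different seeds, same training process.
solutions = [train(seed) for seed in range(5)]
for seed, w in enumerate(solutions):
    print(seed, [round(v, 3) for v in w])
```

Each run ends close to the minimum-norm solution (1, 2), even though a point such as (5, 0) or (-3, 4) would fit the data just as well. Whether real training runs behave analogously with respect to learned goals is exactly the open question under discussion here; the toy only shows that “many solutions fit” does not by itself imply “different runs find different ones.”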
This is helpful, thanks. I’m not sure I agree that for something to count as a faction, the members must be aligned with each other. I think it still counts if the members have wildly different goals but are temporarily collaborating for instrumental reasons, or even if several of the members are secretly working for the other side. For example, in WW2 there were spies on both sides, as well as many people (e.g. most ordinary soldiers) who didn’t really believe in the cause and would happily have defected if they could have gotten away with it. Yet the overall structure of the opposing forces was very similar, from the fighter aircraft designs, to the battleship designs, to the relative proportions of fighter planes and battleships, to the way they were integrated into the command structure.