Two comments on your model, both leading to the same conclusion: even if Google had an AI Ethics panel that made good recommendations, the teams that might produce AGI would not implement them:
Not being stopped
To prevent a bad end, the first aligned AGI must also prevent the creation of any future possibly-unaligned (or differently aligned) AGIs. This is implied by some formulations of AGI (e.g. convergent instrumental goals), but it’s good to state it explicitly. Let’s call this “AGI conquers the world”.
A team good enough to build an aligned AGI is likely also good enough to foresee that it might conquer the world. Once this becomes known inside a company the size of Google, enough people would strongly disapprove that the plan would leak outside the company, and the project would then be shut down or taken over by external forces.
Building an AGI that might take over the world can only work if the project is kept very secret (harder in a company like Google than in a small company or a secretive government agency), or if no one outside the project believes it can succeed. In either case, an AI ethics committee wouldn’t intervene.
Value alignment
Suppose that the AI Alignment problem is solved. The solution is public, easy to implement, and proven correct. However, the values to align to still need to be chosen; they are independent of the solution.
There will still be teams competing to build the first AGI. Each team will align its AGI with its own values. (Ignore for the moment disagreements between team members.) But what does that mean in practice—who gets to choose these values? The programmers? Middle management? The CEO? The President? The AI Ethics committee?
Any public attempt to agree on values will generate a huge, irresolvable political storm. Faced with a chance to control our future lightcone, humans will argue about democracy, religion, and the unfair firing of Dr Gebru. Meanwhile, anyone who thinks they can influence the values in their favor will have the biggest imaginable incentive to do so, including, of course, by force and by sabotaging the project.
(For sabotage, read nuclear first strike. If you think that’s unlikely, consider how e.g. the US military might react if it truly believed a Chinese company was about to develop a singleton AGI. Or how Israel might react if it believed that about Iran.)
Therefore, any team that thinks it is building an AGI and realizes the implications will do its best to stay secret, which means not revealing itself to the Google AI Ethics panel. Such a team might still implement ethics or alignment recommendations, but it would be just as likely to take them from work published outside Google.
Meanwhile, any team that doesn’t realize it is building an AGI will probably not be able to make it aligned, even with the best ethics advice. It will take the perfect solution to alignment and use it with a value like “maximize user engagement with our ads (over the future lightcone of humanity)”.