I’m sorry, you lost me, or maybe we are simply speaking past each other? I am not sure where the human comparison is coming from—the scenario I was concerned with was not an AI beating a human, but an unaligned AI beating an aligned one.
Let me rephrase my question: in the context of the AIs we are building, if there are alignment measures that slow down capabilities a lot (e.g. measures like “if you want a safe AI, stop giving it capabilities until we have solved a number of problems for which we do not even have a clear idea of what a solution would look like”),
and alignment measures that do this less (e.g. “if you are giving it more training data to make it more knowledgeable and smarter, please make it curated, don’t just dump in 4chan, but reflect on what would be really kick-ass training data from an ethical perspective”, “if you are getting more funding, please earmark 50% for safety research”, “please encourage humans to be constructive when interacting with AI, via an emotional social media campaign, as well as specific and tangible rewards for constructive interaction, e.g. through permanent performance gains”, “set up a structure where users can easily report and classify unaligned behaviour for review”, etc.),
and we are really worried that the first superintelligence will be unaligned, simply because unaligned projects overtake aligned ones,
would it make sense to make a trade-off as to which alignment measures we should drop, and if so, where would that be?
Basically, if the goal is “the first superintelligence should be aligned”, we need to work both on making it aligned and on making it the first one, and should focus on measures that ideally promote both, or are at least compatible with both, because failing on either is a complete failure. A perfectly aligned but weak AI won’t protect us. An aligned AI that arrives late might not find anything left to save; or, if the misaligned-AI scenario is bad but not as bad as many here fear (so merely dystopian), our aligned AI will still be at a profound disadvantage if it wants to change the power relation.
Which is back to why I did not sign the letter asking for a pause—I think the most responsible actors, the ones most likely to actually keep to it, are not the ones I want to win the race.