But if they do, we face the problem that most ways of successfully imitating humans don’t look like “build a human (that’s somehow superhumanly good at imitating the Internet)”. They look like “build a relatively complex and alien optimization process that is good at imitation tasks (and potentially at many other tasks)”.
I think this point could use refining. Once we get our predictor AI, we don’t say “do X”; we say “how do you predict a human would do X?” and then follow that plan. So you need to argue why the plans that an AI predicts humans would use to do X tend to be dangerous. This is clearly a very different set from the set of plans for doing X.