The values we want are a very narrow target, and we currently have no solid idea how to do alignment, so when AI does take over everything we're probably going to die, or worse if, for example, we botch alignment.
We can build altruistic AGI without learning human values at all: an AGI can instead optimize for human empowerment (our ability to fulfill all our long-term goals).[1]
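For concreteness, here is a toy sketch of the kind of objective this points at: an intrinsic reward for the AI equal to a crude empowerment proxy, namely the number of distinct states the human could reach within the next few steps. The gridworld, the helper names, and the reachable-state-count proxy are my own illustrative assumptions, not the cited paper's exact algorithm.

```python
# Toy sketch (assumptions, not the cited paper's algorithm): an "altruistic"
# intrinsic reward based on a simple empowerment proxy -- how many distinct
# cells the *human* agent could reach within k steps.

from itertools import product

GRID = 5  # 5x5 gridworld (illustrative)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]  # up/down/right/left/stay

def step(pos, action, blocked):
    """Deterministic move; bumping into a wall or a blocked cell is a no-op."""
    x, y = pos[0] + action[0], pos[1] + action[1]
    if 0 <= x < GRID and 0 <= y < GRID and (x, y) not in blocked:
        return (x, y)
    return pos

def reachable_states(human_pos, blocked, k):
    """All cells the human could occupy after some k-step action sequence."""
    states = {human_pos}
    for seq in product(ACTIONS, repeat=k):
        pos = human_pos
        for a in seq:
            pos = step(pos, a, blocked)
        states.add(pos)
    return states

def altruistic_reward(human_pos, ai_pos, k=3):
    """Intrinsic reward for the AI: the human's empowerment proxy.
    The AI's body occupies a cell, so standing in the human's way
    shrinks the human's reachable set and lowers the AI's reward."""
    return len(reachable_states(human_pos, blocked={ai_pos}, k=k))

# The AI scores higher when it keeps out of the human's way:
print(altruistic_reward(human_pos=(2, 2), ai_pos=(4, 4)))  # AI far away
print(altruistic_reward(human_pos=(2, 2), ai_pos=(2, 3)))  # AI adjacent, blocking
```

The point of the sketch is only that the reward never references what the human's goals actually are; the AI is rewarded for preserving and expanding the human's options, whatever those goals turn out to be.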
If aligning current ML models is impossible or would take 50 years, and if aligning something different could take as little as 5 years, then we need to align something else.
The foundational formal approach has already been pursued for over 20 years and has shown very few signs of progress. On the contrary, its main original founder/advocate seems to have given up, declaring doom. What makes you think it could succeed in as little as 5 years? And, updating on the success of DL, what makes you think that DL-based alignment would take 50?
Franzmeyer, Tim, Mateusz Malinowski, and João F. Henriques. "Learning Altruistic Behaviours in Reinforcement Learning without External Rewards." arXiv preprint arXiv:2107.09598 (2021).
Human empowerment is a really narrow target too.