“I don’t think it’s hard, as a question of computer science, to get an AI to prioritize the goals you intended.”
MIRI has worked on that problem for two decades, and failed to solve it. I am shocked that someone who says they are a research scientist on the Scalable Alignment team at Google DeepMind could so cavalierly and naïvely dismiss the difficulty of the alignment problem.
My inside-view perspective: MIRI failed in part because they’re wrong and philosophically confused. They made incorrect assumptions about the problem, and so of course they failed.
I did my PhD in this field and have authored dozens of posts about my beliefs, critiques, and proposals. Specifically, many posts are about my disagreements with MIRI/EY, like Inner and Outer Alignment Decompose One Hard Problem Into Two Extremely Hard Problems (voted into the top 10 of the LessWrong review for that year), Many Arguments for AI X-Risk Are Wrong, or Some of My Disagreements with List of Lethalities. You might disagree with me, but I am not naive in my experience or cavalier in coming to this conclusion.
“Adequate AI agents exist, so the problem is soluble at a good-enough level. What is lacking is presumably a perfect solution.”
I don’t think “perfect” is a good descriptor for the missing solution. The solutions we have lack (at least) two crucial features:
1. A way to get an AI to prioritize the intended goals, with high enough fidelity to work when AI is no longer extremely corrigible, as today’s AIs are (because they’re not capable enough to circumvent human methods of control).
2. A way that works far enough outside of the training set. E.g., when AI is substantially in charge of logistics, research and development, security, etc.; and is doing those things in novel ways.
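To make the second point concrete, here is a minimal toy sketch of what I mean by a solution failing outside the training set. This is my own illustration, not any particular alignment proposal: a proxy objective is fit to the intended objective on a narrow input range, tracks it well there, and comes apart once inputs move off-distribution.

```python
# Toy illustration of off-distribution divergence between a learned proxy
# and the intended objective. All names and choices here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def intended_objective(x):
    # Stand-in for the "true" goal: saturates for large x.
    return np.tanh(x)

# Training data drawn only from a narrow region, x in [-1, 1].
x_train = rng.uniform(-1.0, 1.0, size=1000)
y_train = intended_objective(x_train)

# Fit a cubic polynomial as the learned proxy for the objective.
proxy = np.poly1d(np.polyfit(x_train, y_train, deg=3))

# On-distribution: the proxy tracks the intended objective closely.
x_in = np.linspace(-1, 1, 5)
print("in-distribution max error: ",
      np.max(np.abs(proxy(x_in) - intended_objective(x_in))))

# Off-distribution: the proxy and the intended objective come apart, so an
# optimizer pushing into novel states is scored by the proxy on states the
# intended objective does not endorse.
x_out = np.linspace(4, 8, 5)
print("off-distribution max error:",
      np.max(np.abs(proxy(x_out) - intended_objective(x_out))))
```

The analogy is loose, but it gestures at the worry: today's methods are validated in the regime they were trained and evaluated in, and the claim that they keep prioritizing the intended goals when the AI is running logistics, R&D, or security in novel ways is exactly the part we don't know how to check.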