It will probably be possible, with techniques similar to current ones, to create AIs that are about as smart as my friends, about as good at working in large teams, and about as reasonable and benevolent as my friends on the time scale of years under normal conditions.
[...]
This is maybe the most contentious point in my argument, and I agree this is not at all guaranteed to be true, but I have not seen MIRI arguing that it’s overwhelmingly likely to be false.
Did you read the book? Chapter 4, “You Don’t Get What You Train For”, is all about this. I also see reasons to be skeptical, but have you really “not seen MIRI arguing that it’s overwhelmingly likely to be false”?
Yes, I’ve read the book. The book argues about superhuman intelligence, though, while point (3) is about smart human-level intelligence. If people disagree with point 3 and believe that it’s close to impossible to make even human-level AIs basically nice and not scheming, that’s a new, interesting, and surprising crux.
My vague impression of the authors’ position is approximately that:
AIs are alien and will have different goals-on-reflection than humans
They’ll become power-seeking when they become smart enough and have enough thinking time to realize that they have different goals than humans, and that this implies that they ought to take over (if they get a good opportunity). This is within the human range of smartness.
I’m not sure what the authors think about the argument that you can get the above two properties in a regime where the AI is too dumb to hide its misalignment from you, and that this gives you a great opportunity to iterate and learn from experiment. (Maybe just that the iteration will produce an AI that’s good at hiding its scheming before one that isn’t inclined to scheme at all? Or that it’ll produce one that doesn’t scheme in your test cases, but will start scheming once you give it much more time to think on its own, and you can’t afford much testing and iteration on years’ or decades’ worth of thinking.)
Aside – I think it’d be nice to have a sequence connecting the various scenes in your play.
Also, I separately think at some point it’d be helpful to have something like a “compressed version of the main takeaways of the play that would have been a helpful textbook from the intermediate future for younger Zack.”