This post evolved from a Twitter thread I wrote two weeks ago. Copying over a Twitter reply by Richard Ngo (n.b. Richard was replying to the version on Twitter, which differed in lots of ways):
Rob, I appreciate your efforts, but this is a terrible framing for trying to convey “the basics”, and obscures way more than it clarifies.
I’m worried about agents which try to achieve goals. That’s the core thing, and you’re calling it a misconception?! That’s blatantly false.
In my first Alignment Fundamentals class I too tried to convey all the nuances of my thinking about agency as “the basics”. It failed badly. One lesson: communication is harder than you think. More importantly: “actually trying” means getting feedback from your target audience.
(I replied and we had a short back-and-forth on Twitter.)
I definitely agree with Richard that the post would probably benefit from more iteration with intended users, if new people are the audience you want to target. (In particular, I doubt that the section quoted from the Aryeh interview will clarify much for new people.)
That said, I definitely think that it’s the right call to emphasize up-front that instrumental convergence is a property of problem-space rather than of agency. More generally: when there’s a common misinterpretation, which very often ends up load-bearing, then it makes sense to address that upfront; that’s not nuance, it’s central. Nuance is addressing misinterpretations which are rare or not very load-bearing. Instrumental convergence being a property of problem-spaces rather than “agents” is pretty central to a MIRI-ish view, and underlies a lot of common confusions new-ish people have about such views.
Thanks for the feedback, John! I’ve moved the Aryeh/Eliezer exchange to a footnote, and I welcome more ideas for ways to improve the piece. (Folks are also welcome to repurpose anything I wrote above to create something new and more beginner-friendly, if you think there’s a germ of a good beginner-friendly piece anywhere in the OP.)
Also, per footnote 1: “I wrote this post to summarize my own top reasons for being worried, not to try to make a maximally compelling or digestible case for others.”
The original reason I wrote this was that Dustin Moskovitz wanted something like this, as an alternative to posts like AGI Ruin:
[H]ave you tried making a layman’s explanation of the case? Do you endorse the summary? I’m aware of much longer versions of the argument, but not shorter ones!
From my POV, a lot of the confusion is around the confidence level. Historically EY makes many arguments to express his confidence, and that makes people feel snowed, like they have to inspect each one. I think it’d be better if there was more clarity about which are strongest.
I think one argument is about the number of relatively independent issues, and that’s still valid, but then you could link out to that list as a separate exercise without losing everyone.
This post is speaking for me and not necessarily for Eliezer, but I figure it may be useful anyway. (A MIRI researcher did review an earlier draft and left comments that I incorporated, at least.)
And indeed, one of the obvious ways it could be useful is if it ends up evolving into (or inspiring) a good introductory resource, though I don’t know how likely that is, or whether it could already serve as a good intro-ish resource when paired with something else.
Tagging @Richard_Ngo