I am going to interpret this as a piece of genre subversion, where the genre is “20k-word allegorical AI alignment dialogue by Eliezer Yudkowsky,” and I have to say that it worked on me. I was entirely convinced that this was just another alignment dialogue piece (albeit one with some really confusing plot points) and was somewhat puzzled as to why you were writing yet another one of those. That meant I was entirely taken aback by the plot elements in the final sections. Touché.
Doesn’t seem like a genre subversion to me; it’s just a bit clever/meta while still centrally being an allegorical AI alignment dialogue. IDK what the target audience is, though (maybe Eliezer just felt inspired to write this).
So far as I can tell, there are still a number of EAs out there who did not get the idea of “the stuff you do with gradient descent does not pin down the thing you want to teach the AI, because it’s a large space and your dataset underspecifies that internal motivation” and who go, “Aha, but you have not considered that by TRAINING the AI we are providing a REASON for the AI to have the internal motivations I want! And have you also considered that gradient descent doesn’t locate a RANDOM element of the space?”
I don’t much expect that the primary proponents of this talk can be rescued, but maybe the people they propagandize can be.
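To make the underdetermination point concrete, here is a minimal toy sketch (purely illustrative, not an actual training run, and every name in it is made up for the example): two candidate functions that agree perfectly on the training set, so no amount of fitting that data can distinguish them, yet they diverge off-distribution.

```python
# Toy illustration (assumed/illustrative): the training data underdetermines
# which function gets learned.
import numpy as np

# "Training set": the behaviour we reward.
x_train = np.array([0.0, 1.0, 2.0])
y_train = x_train  # target behaviour: the identity function

f_intended = lambda x: x                          # the motivation we hoped to instill
f_alternative = lambda x: x + np.sin(np.pi * x)   # agrees on every training point

# Both achieve (numerically) zero training loss, so the data cannot tell them apart.
print(np.mean((f_intended(x_train) - y_train) ** 2))     # 0.0
print(np.mean((f_alternative(x_train) - y_train) ** 2))  # ~0.0

# Off-distribution, they diverge.
x_test = 2.5
print(f_intended(x_test), f_alternative(x_test))  # 2.5 vs 3.5
```

Which of the training-equivalent candidates gradient descent actually lands on is decided by its inductive biases, not by which one we meant.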
It appears that the content of this story under-specifies/mis-specifies your internal motivations when writing it, at least relative to the search space and inductive biases of the learning process that is me.
Ha, I was about to type something very similar. The above comment in particular is also ambiguous: it could alternatively be read as coming from someone who has decided that nothing in the LLM reference class will ever be capable enough to be threatening or “incorrigible” in any meaningful sense, because of underspecified inductive biases that promote ad-hoc shortcut learning, and maybe intrinsic limitations to human text as a medium to learn representations from. Someone might argue, “Aha, but by TRAINING the AI to predict the next token or perform longform tasks we are providing a REASON for the AI to develop coherent, homuncular internal motivations that I FEAR! And have you also considered that gradient descent doesn’t locate a RANDOM element of the space?”
rot13: Gur snvyfnsr jnf gur bayl cneg gung fhecevfrq zr, naq gur qvrtrgvp rkcynangvba sbe ubj guvf frghc pbhyq unir unccrarq ng nyy jnf phgr.
Honestly, I think this genuinely was the most parsimonious explanation Eliezer could think of (I’m unsure whether planned or post hoc); I reacted with “oh, I am now somewhat less confused by this setting.”