But if you had asked us back then whether a superintelligence would automatically be very good at predicting human text outputs, I guarantee we would have said yes. [...] I wish all of these past conversations were archived in a common place, so that I could search them and show you the many pieces of text discussing this critical divide between prediction and preference (as I would now term it), and how I did in fact expect superintelligences to be able to predict things!
“MIRI’s argument for AI risk depended on AIs being bad at natural language” is a weirdly common misunderstanding, given how often we said the opposite going back 15+ years.
The example does build in the assumption “this outcome pump is bad at NLP”, but this isn’t a load-bearing assumption. If the outcome pump were instead a good conversationalist (or hooked up to one), you would still need to get the right content into its goals.
It’s true that Eliezer and I didn’t predict AI would achieve GPT-3 or GPT-4 levels of NLP ability so early (e.g., before it can match humans in general science ability), so this is an update to some of our models of AI.
But the specific update “AI is good at NLP, therefore alignment is easy” requires that there be an old belief like “a big part of why alignment looks hard is that we’re so bad at NLP”.
If that had ever been a belief here, it should be easy to find someone at MIRI, like Eliezer or Nate, saying so at some point in the last 20 years. Absent that, the obvious explanation for why we never just said it is that we didn’t believe it!
Found another example: MIRI’s first technical research agenda, in 2014, went out of its way to clarify that the problem isn’t “AI is bad at NLP”.
Quoting myself in April: