(Self-review.) I started this series to explore my doubts about the “orthodox” case for alignment pessimism. I wrote it as a dialogue and gave my relative non-pessimist character the designated idiot character name to make it clear that I’m just exploring ideas and not staking my reputation on “heresy”. (“Maybe alignment isn’t that hard” doesn’t sound like a smart person’s position—and in fact definitely isn’t a smart person’s for sufficiently ambitious conceptions of what it would mean to “solve the alignment problem.” Simplicia isn’t saying, “Oh, yeah, we’re totally on track to solve philosophy forever in machine-codable form suitable for specifying the values of the superintelligence at the end of time”. As will be explored in part four—forthcoming March 2026—perhaps the disagreement is really about whether some less ambitious alignment target might salvage some cosmic value.)
That said, more than the other entries in this series, this is the one where I’m willing to cop to and put my weight down on Simplicia representing my own views, rather than laundering my doubts as just asking questions.
I understand and agree that there’s a useful analogy between stochastic gradient descent and natural selection, and between future AGI misalignment and humans valuing sex and sweets rather than fitness. To someone who’s never thought about these topics at all, dwelling on the analogy at length is indeed a good use of time. But it’s frustrating how much MIRI’s recent messaging just makes the analogy and then stops there, without considering the huge important disanalogies, like how (as Paul Christiano pointed out in 2022) selective breeding kind-of works and is a better analogical fit to AI (there wasn’t an Evolution Fairy that was trying to make fitness-maximizers; an alien agency trying to selectively breed humans from the EEA would have been able to test hypotheses about how smarter humans would generalize, rather than being taken by surprise by modernity the way an Evolution Fairy would have been), and that deep learning is better thought of as program synthesis rather than evolving a little animal.
Maybe that’s strategically instrumentally rational insofar as MIRI is a propaganda outlet now (in the literal meaning of the word, “public communication aimed at influencing an audience and furthering an agenda”) and doesn’t seem to care that much about being intellectually credible in ways that don’t cash out as policy influence? (It looks like Redwood Research may have picked up the torch.) But it’s disappointing.