I’ll quote Davidad’s opening statement from the dialogue, since I expect most people won’t click through, and it seems nice to base the discussion on things he actually said.
> Somewhere between the capability profile of GPT-4 and the capability profile of Opus 4.5, there seems to have been a phase transition where frontier LLMs have grokked the natural abstraction of what it means to be Good, rather than merely mirroring human values. These observations seem vastly more likely under my old (1999–2012) belief system (which would say that being superhuman in all cognitive domains implies being superhuman at morality) than my newer (2016–2023) belief system (which would say that AlphaZero and systems like it are strong evidence that strategic capabilities and moral capabilities can be decoupled).
>
> My current (2025–2026) belief system says that strategic capabilities can be decoupled from moral capabilities, but that it turns out in practice that the most efficient way to get strategic capabilities involves learning basically all human concepts and “correcting” them (finding more coherent explanations), and this makes the problem of alignment (i.e. making the system actually behave as a Good agent) much much easier than I had thought.
I haven’t found a quote about how confident he is in this. My error bars on “what beliefs would be crazy here?” say: if you were, like, 60% confident that this whole paragraph is true, up through “and this makes the problem of alignment much much easier than I had thought,” then I’d disagree, but I wouldn’t bet at 20:1 odds against it.
> My current (2025–2026) belief system says that strategic capabilities can be decoupled from moral capabilities, but that it turns out in practice that the most efficient way to get strategic capabilities involves learning basically all human concepts and “correcting” them (finding more coherent explanations)
(Possibly this is addressed somewhere in that dialogue, but anyway:)
Wouldn’t this imply that frontier LLMs are better than humans at ~[(legible) moral philosophy]?
Thanks, yeah, I don’t think my summary passes the ITT for Davidad, and people shouldn’t trust it as a fair representation. I’ve added the quote you selected to the OP so people skimming at least get a sense of Davidad’s own wording.