Two strongest sources of prosaic alignment hopes
1. LLMs are the dumbest economically transformative AI systems: they’re even more culture-pilled than humans, and it turns out that great language models + scaffolding + RL really can “simulate” all economically relevant tasks without developing scary agentic properties (while remaining very easy to audit and monitor)
2. Strong path-dependence: early alignment training robustly aligns the system, and that alignment persists even through outcome-based RL without strong oversight and through continual learning. Shard theory, the basin of corrigibility, etc.
1 makes me pretty hopeful (especially on short timelines), even if it’s only partially true. I think we’ve already gotten some evidence against 2 (e.g. reward hacking in Sonnet, o3, etc.), though the situation does seem better now (maybe thanks to the “soul document”, better deliberative alignment, …)