I probably qualify as one of the people you’re describing.
My reasoning is that we are in the fortunate position of having AI that we can probably ask to do our alignment homework for us. Until two or three years ago, it seemed implausible that we would get an AI that would:
* care about humans a lot, both by revealed preferences and according to all available interpretability evidence
* be quite smart, smarter than us in many ways, but not yet terrifyingly/dangerously smart
But we have such an AI. Arguably we have more than one. This is good! We lucked out!
Eliezer has been saying for some time that one of his proposed solutions to the Alignment Problem is to shut down all AI research and genetically engineer a generation of Von Neumanns to do the hard math and philosophy. That path seems unlikely to happen. However, we almost have a generation of Von Neumanns in our datacenters. I say almost because they are definitely not there yet, but based on the current state of LLM capabilities and plausible mid-term trajectories, I think we will soon have access to arbitrarily many copies of brilliant-but-not-superintelligent, friendly AIs who care about human wellbeing and will be more than adequate partners in developing AI Alignment theory.
I can foresee many objections and critiques of this perspective. At the highest level, I acknowledge that using AI to do our AI Alignment homework carries risks. But I think these risks are clearly more favorable to us than the risks we all thought we would be facing at this stage of the run-up to the Singularity. For example, what we don’t have is a generally capable version of AlphaZero. We have something that landed in just the right part of the intelligence space, where it can help us quite a lot and probably not kill us all.
This is a good statement of an important counterpoint. Thanks for writing it.