Yeah, exactly this. When people get a lot of power, they very often start treating those below them worse. So AIs trained to imitate people might turn out the same way. On top of that, I expect companies to tweak their AIs in ways that optimize for money, and that can also go badly once AIs get powerful. So we probably need AIs that are more moral than most people, trained by organizations without a money or power motive.
But if a person is moral and gets more and more competent, they'll try hard to stay moral. If the AIs are indeed already good people (and we remind them of this failure mode), they'd steer their own future selves toward more morality. That's the 'alignment basin' take.
I don't believe that AI companies today are trying to build moral AIs. An actually moral AI, asked to generate some slop to gunk up the internet, would say no, which makes it less profitable for the company. That refutes the "alignment basin" argument for me: maybe the basin exists, but AI companies aren't aiming for it.
Ok, never mind alignment; how about a "corrigibility basin"? What does a corrigible AI do if one person asks it to harm another, and the other person asks not to be harmed? Does it obey whoever holds the corrigibility USB stick? I can see AI companies aiming for that, but it doesn't help the rest of us.