One of my main worries is that once AI gets to takeover-level, it will faithfully execute a random goal while no one is reading the CoT (or in a paradigm that won't have a CoT at all). As a rational part of pursuing that random goal, it could take over. That's strictly speaking not misalignment: it's perfectly carrying out its goal. Still seems like a very difficult problem to fix, imo.
Have you thought about this?
What do you mean “faithfully execute”?
If you mean that it's executing its own goal, then I dispute that that will be random; the goal will be to be helpful and good.
Is this different from the standard instrumental convergence argument?
I mean, it would be perfectly intent-aligned: it carries out its orders to the letter. The only problem is that carrying out its orders involves a takeover. So no, I don't mean its own goal, but a goal someone gave it.
I guess it's a bit different in the sense that instrumental convergence states that all goals will lead to power-seeking subgoals. This statement is less strong: it just says that some goals will lead to power-seeking behaviour.