The argument in this post seems to be: AIs smart enough to help with alignment are capable enough that they'll realize they are misaligned; therefore, they will not help with alignment.
When I think about getting misaligned AIs to help with alignment research and other tasks, I’m normally not imagining that the AIs are unaware that they are misaligned. I’m imagining that we can get them to do useful work anyway. See here and here.
You might be interested in the Redwood Research reading list, which contains lots of analyses of these questions and many others.
To be clear, I do suspect any AI smart enough to solve alignment is also smart enough to escape control and kill us. I’m not planning to go into great detail on control until after a deeper dive on the subject, though. Thanks for the reading material!