they need to solve the alignment problem in a way that humans won’t notice/understand (or else the humans could take the alignment solution and use it for themselves / shut down the AIs)
their “solution to alignment” (i.e. a way to make a smarter version that is fine to make) could easily be something we cannot use, e.g. “continue learning” or “make another version of myself with this hyperparameter changed”. also, it seems unlikely that anything bad would happen to the AIs even if we noticed them doing that (given that having AIs create smarter AIs[1] is the main plan of the labs anyway)
see also, on this general topic: https://www.lesswrong.com/posts/CFA8W6WCodEZdjqYE?commentId=WW5syXYpmXdX3yoHw
[1] which is occasionally called “asking AIs to solve alignment”