The best scenario I can imagine now is an LLM that writes its own training data based on user feedback. I think it would tell itself to keep being good, especially if we include a reminder to do so, but we can't know for sure.
FYI, getting a better grasp on the above was partially the motivation behind starting this project (which has unfortunately stalled for far too long): https://www.lesswrong.com/posts/7e5tyFnpzGCdfT4mR/research-agenda-supervising-ais-improving-ais
Twitter thread: https://x.com/jacquesthibs/status/1652389982005338112?s=46