Niki Dupuis comments on Alignment pretraining could backfire

Niki Dupuis 18 Jun 2026 12:59 UTC
5 points
Since you are pointing at the costs of deliberately deceiving AIs more generally (alignment pretraining being a potential example), I highly recommend these two posts, which expand on the costs (to us) of lying to AI systems:
- Hidden Cost of Our Lies to AI
- Being honest with AIs