Aaryan Chandna

Karma: 30

Is the evidence in “Language Models Learn to Mislead Humans via RLHF” valid?

1 Dec 2025 6:50 UTC
37 points
0 comments
19 min read