Aaryan Chandna

Is the evidence in “Language Models Learn to Mislead Humans via RLHF” valid?

1 Dec 2025 6:50 UTC
19 min read