RSS

ariana_azarbal

Karma: 195

Train­ing a Re­ward Hacker De­spite Perfect Labels

14 Aug 2025 23:57 UTC
127 points
45 comments4 min readLW link

Selec­tive Gen­er­al­iza­tion: Im­prov­ing Ca­pa­bil­ities While Main­tain­ing Alignment

16 Jul 2025 21:25 UTC
65 points
4 comments7 min readLW link