ariana_azarbal

Karma: 653

Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models

Anders Cairns Woodruff, Francis Rhys Ward, Dewi Gould, Rauno Arike, Jason R Brown, Jo Jiao, wlanderson, ariana_azarbal, harrymayne, Patrick Leask, Twm Stone, Josh Hills, Ida Caspary and Shubhorup Biswas

10 Jun 2026 17:58 UTC

250 points

20 comments, 4 min read

Confusion around the term reward hacking

ariana_azarbal

20 Mar 2026 16:13 UTC

60 points

6 comments, 5 min read

Recontextualization Mitigates Specification Gaming Without Modifying the Specification

ariana_azarbal, Victor Gillioz, TurnTrout and cloud

14 Oct 2025 0:53 UTC

144 points

15 comments, 10 min read

Training a Reward Hacker Despite Perfect Labels

ariana_azarbal, Victor Gillioz and TurnTrout

14 Aug 2025 23:57 UTC

141 points

47 comments, 4 min read

Selective Generalization: Improving Capabilities While Maintaining Alignment

ariana_azarbal, Matthew A. Clarke, Jorio Cocola, Cailley Factor and cloud

16 Jul 2025 21:25 UTC

82 points

6 comments, 7 min read