Sebastian Prasanna

Karma: 82

An Empirical Study of Methods for SFTing Opaque Reasoning Models

Sebastian Prasanna, Alek Westover, Vivek Hebbar, Dylan Xu and Julian Stastny

24 Apr 2026 17:26 UTC

15 points

0 comments, 6 min read

How do LLMs generalize when we do training that is intuitively compatible with two off-distribution behaviors?

Dylan Xu, Alek Westover, Vivek Hebbar, Sebastian Prasanna, frisby, Buck and Julian Stastny

20 Apr 2026 16:58 UTC

61 points

4 comments, 19 min read

Five approaches to evaluating training-based control measures

Alek Westover, Sebastian Prasanna, Julian Stastny and Vivek Hebbar

18 Apr 2026 1:07 UTC

19 points

0 comments, 6 min read

Model organisms researchers should check whether high LRs defeat their model organisms

Dylan Xu, Sebastian Prasanna, Alek Westover, Vivek Hebbar and Julian Stastny

10 Apr 2026 0:07 UTC

40 points

0 comments, 5 min read

Basic Legibility Protocols Improve Trusted Monitoring

Sebastian Prasanna and theashwinner

12 Feb 2026 17:50 UTC

8 points

0 comments, 11 min read