Dylan Xu · Karma: 156
How do LLMs generalize when we do training that is intuitively compatible with two off-distribution behaviors?
Dylan Xu, Alek Westover, Vivek Hebbar, Sebastian Prasanna, frisby, Buck and Julian Stastny
20 Apr 2026 16:58 UTC · 61 points · 4 comments · 19 min read · LW link
Model organisms researchers should check whether high LRs defeat their model organisms
Dylan Xu, Sebastian Prasanna, Alek Westover, Vivek Hebbar and Julian Stastny
10 Apr 2026 0:07 UTC · 40 points · 0 comments · 5 min read · LW link
dx26's Shortform
Dylan Xu
16 Feb 2025 21:31 UTC · 2 points · 3 comments · 1 min read · LW link
Measuring Coherence and Goal-Directedness in RL Policies
Dylan Xu
22 Apr 2024 18:26 UTC · 10 points · 0 comments · 7 min read · LW link
Measuring Coherence of Policies in Toy Environments
Dylan Xu and Richard_Ngo
18 Mar 2024 17:59 UTC · 59 points · 9 comments · 14 min read · LW link
Supervised Program for Alignment Research (SPAR) at UC Berkeley: Spring 2023 summary
mic, Dylan Xu, adamk and Carolyn Qian
19 Aug 2023 2:27 UTC · 23 points · 2 comments · 6 min read · LW link