
Dylan Xu

Karma: 156

How do LLMs generalize when we do training that is intuitively compatible with two off-distribution behaviors?

20 Apr 2026 16:58 UTC
61 points
4 comments · 19 min read · LW link

Model organisms researchers should check whether high LRs defeat their model organisms

10 Apr 2026 0:07 UTC
40 points
0 comments · 5 min read · LW link

dx26′s Shortform

Dylan Xu · 16 Feb 2025 21:31 UTC
2 points
3 comments · 1 min read · LW link

Measuring Coherence and Goal-Directedness in RL Policies

Dylan Xu · 22 Apr 2024 18:26 UTC
10 points
0 comments · 7 min read · LW link

Measuring Coherence of Policies in Toy Environments

18 Mar 2024 17:59 UTC
59 points
9 comments · 14 min read · LW link

Supervised Program for Alignment Research (SPAR) at UC Berkeley: Spring 2023 summary

19 Aug 2023 2:27 UTC
23 points
2 comments · 6 min read · LW link