Dylan Xu · Karma: 156
How do LLMs generalize when we do training that is intuitively compatible with two off-distribution behaviors?
Dylan Xu, Alek Westover, Vivek Hebbar, Sebastian Prasanna, frisby, Buck and Julian Stastny
20 Apr 2026 16:58 UTC · 61 points · 4 comments · 19 min read · LW link
Model organisms researchers should check whether high LRs defeat their model organisms
Dylan Xu, Sebastian Prasanna, Alek Westover, Vivek Hebbar and Julian Stastny
10 Apr 2026 0:07 UTC · 40 points · 0 comments · 5 min read · LW link
dx26's Shortform
Dylan Xu
16 Feb 2025 21:31 UTC · 2 points · 3 comments · 1 min read · LW link
Measuring Coherence and Goal-Directedness in RL Policies
Dylan Xu
22 Apr 2024 18:26 UTC · 10 points · 0 comments · 7 min read · LW link
Measuring Coherence of Policies in Toy Environments
Dylan Xu and Richard_Ngo
18 Mar 2024 17:59 UTC · 59 points · 9 comments · 14 min read · LW link
Supervised Program for Alignment Research (SPAR) at UC Berkeley: Spring 2023 summary
mic, Dylan Xu, adamk and Carolyn Qian
19 Aug 2023 2:27 UTC · 23 points · 2 comments · 6 min read · LW link