RobertKirk (Robert Kirk)
Karma: 274
PhD student at UCL DARK working on RL, OOD robustness, and safety. Interested in self-improvement.
What is Interpretability?
RobertKirk, Tomáš Gavenčiak and Ada Böhm · 17 Mar 2020 20:23 UTC · 35 points · 0 comments · 11 min read · LW link

How can Interpretability help Alignment?
RobertKirk, Tomáš Gavenčiak and axioman · 23 May 2020 16:16 UTC · 37 points · 3 comments · 9 min read · LW link

Causal confusion as an argument against the scaling hypothesis
RobertKirk and David Scott Krueger (formerly: capybaralet) · 20 Jun 2022 10:54 UTC · 86 points · 30 comments · 18 min read · LW link

Speculative inferences about path dependence in LLM supervised fine-tuning from results on linear mode connectivity and model souping
RobertKirk · 20 Jul 2023 9:56 UTC · 38 points · 2 comments · 5 min read · LW link