RobertKirk (Robert Kirk)
Karma: 274
PhD student at UCL DARK working on RL, OOD robustness, and safety. Interested in self-improvement.
Posts
Causal confusion as an argument against the scaling hypothesis
RobertKirk and David Scott Krueger (formerly: capybaralet)
20 Jun 2022 10:54 UTC · 86 points · 30 comments · 18 min read · LW link
Speculative inferences about path dependence in LLM supervised fine-tuning from results on linear mode connectivity and model souping
RobertKirk
20 Jul 2023 9:56 UTC · 38 points · 2 comments · 5 min read · LW link
How can Interpretability help Alignment?
RobertKirk, Tomáš Gavenčiak and axioman
23 May 2020 16:16 UTC · 37 points · 3 comments · 9 min read · LW link
What is Interpretability?
RobertKirk, Tomáš Gavenčiak and Ada Böhm
17 Mar 2020 20:23 UTC · 35 points · 0 comments · 11 min read · LW link