RobertKirk (Robert Kirk)
Karma: 274
PhD student at UCL DARK working on RL, OOD robustness, and safety. Interested in self-improvement.
What is Interpretability?
RobertKirk, Tomáš Gavenčiak and Ada Böhm · 17 Mar 2020 20:23 UTC · 35 points · 0 comments · 11 min read · LW link

How can Interpretability help Alignment?
RobertKirk, Tomáš Gavenčiak and axioman · 23 May 2020 16:16 UTC · 37 points · 3 comments · 9 min read · LW link

Causal confusion as an argument against the scaling hypothesis
RobertKirk and David Scott Krueger (formerly: capybaralet) · 20 Jun 2022 10:54 UTC · 86 points · 30 comments · 18 min read · LW link

Speculative inferences about path dependence in LLM supervised fine-tuning from results on linear mode connectivity and model souping
RobertKirk · 20 Jul 2023 9:56 UTC · 38 points · 2 comments · 5 min read · LW link