RobertKirk (Robert Kirk)
Karma: 274
PhD student at UCL DARK working on RL, OOD robustness, and safety. Interested in self-improvement.
Posts
Causal confusion as an argument against the scaling hypothesis
RobertKirk and David Scott Krueger (formerly: capybaralet)
20 Jun 2022 10:54 UTC · 86 points · 30 comments · 18 min read · LW link
Speculative inferences about path dependence in LLM supervised fine-tuning from results on linear mode connectivity and model souping
RobertKirk
20 Jul 2023 9:56 UTC · 38 points · 2 comments · 5 min read · LW link
How can Interpretability help Alignment?
RobertKirk, Tomáš Gavenčiak and axioman
23 May 2020 16:16 UTC · 37 points · 3 comments · 9 min read · LW link
What is Interpretability?
RobertKirk, Tomáš Gavenčiak and Ada Böhm
17 Mar 2020 20:23 UTC · 35 points · 0 comments · 11 min read · LW link