RobertKirk (Robert Kirk)

Karma: 274

PhD student at UCL DARK working on RL, OOD robustness, and safety. Interested in self-improvement.

Causal confusion as an argument against the scaling hypothesis

20 Jun 2022 10:54 UTC
86 points
30 comments · 18 min read · LW link

Speculative inferences about path dependence in LLM supervised fine-tuning from results on linear mode connectivity and model souping

RobertKirk · 20 Jul 2023 9:56 UTC
38 points
2 comments · 5 min read · LW link

How can Interpretability help Alignment?

23 May 2020 16:16 UTC
37 points
3 comments · 9 min read · LW link

What is Interpretability?

17 Mar 2020 20:23 UTC
35 points
0 comments · 11 min read · LW link