PhD student at UCL DARK doing RL, OOD Robustness and safety. Interested in self improvement.
This feels kind of like a semantic disagreement to me. To ground it, it’s probably worth considering whether further research on the CCS-style work I posted would also be useful for self-driving cars (or other applications). I think that would depend on whether the work improves the robustness of the contrastive probing regardless of what is being probed for (which would be generically useful), or whether it improves the probing specifically for truthfulness in systems that have a conception of truthfulness, possibly by improving the constraints or adding additional constraints (less useful for other systems). I think both would be good, but I’m uncertain which would be more useful to pursue if one is motivated by reducing x-risk from misalignment.
I think that “don’t kill humans” can’t chain into itself because there’s not a real reason for its action-bids to systematically lead to future scenarios where it again influences logits and gets further reinforced, whereas “drink juice” does have this property.
I’m trying to understand why the juice shard has this property. Which of these (if any) is the explanation for this:
Bigger juice shards will bid on actions which will lead to juice multiple times over time, as it pushes the agent towards juice from quite far away (both temporally and spatially), and hence will be strongly reinforced when the reward comes, even though it’s only a single reinforcement event (actually getting the juice).
Juice will be acquired more with stronger juice shards, leading to a kind of virtuous cycle, assuming that getting juice is always positive reward (or positive advantage/reinforcement, to avoid zero-point issues)
The first seems at least plausible to also apply to “avoid moldy food”, if it requires multiple steps of planning to avoid moldy food (throwing out moldy food, buying fresh ingredients and then cooking them, etc.)
The second does seem to be more specific to juice than mold, but it seems to me that’s because getting juice is rare, and is something we can get better and better at, whereas avoiding moldy food is something that’s fairly easy to learn, and past that there’s not much reinforcement to happen. If that’s the case, then I kind of see that as being covered by the rare-states explanation in my previous comment, or maybe an extension of that to “rare states and skills in which improvement leads to more reward”.
Having just read tailcalled’s comment, I think that is in some sense another way of phrasing what I was trying to say, where rare (but not too rare) states are likely to mean that policy-caused variance is high on those decisions. Probably policy-caused variance is more fundamental/closer as an explanation to what’s actually happening in the learning process, but maybe states of certain rarity which are high-reward/reinforcement are one possible environmental feature that produces policy-caused variance.
So I struggle to think of an example of a tool that is good for finding insidious misaligning bugs but not others.
One example: A tool that is designed to detect whether a model is being truthful (correctly representing what it knows about the world to the user), perhaps based on specific properties of truthfulness (for example see https://www.alignmentforum.org/posts/L4anhrxjv8j2yRKKp/how-discovering-latent-knowledge-in-language-models-without) doesn’t seem like it would be useful for improving the reliability of self-driving cars, as likely self-driving cars aren’t misaligned in the sense that they could drive perfectly safely but choose not to, but rather are just unable to drive perfectly safely because some of their internal (learned) systems aren’t sufficiently robust.
Not Paul, but some possibilities why ARC’s work wouldn’t be relevant for self-driving cars:
The stuff Paul said about them aiming at understanding quite simple human values (don’t kill us all, maintain our decision-making power) rather than subtle things. It’s likely for self-driving cars we’re more concerned with high reliability and hence would need to be quite specific. E.g., maybe ARC’s approach could discern whether a car understands whether it’s driving on the road or not (seems like a fairly simple concept), but not whether it’s driving in a riskier way than humans in specific scenarios.
One of the problems that I think ARC is worried about is ontology identification, which seems like a meaningfully different problem for sub-human systems (whose ontologies are worse than ours, so in theory could be injected into ours) than for human-level or super-human systems (where that may not hold). Hence focusing on the super-human case would look weird and possibly not helpful for the subhuman case, although it would be great if they could solve all the cases in full generality.
Maybe once it works ARC’s approach could inform empirical work which helps with self-driving cars, but if you were focused on actually doing the thing for cars you’d just aim directly at that, whereas ARC’s approach would be a very roundabout and needlessly complex and theoretical way of solving the problem (this may or may not actually be the case, maybe solving this for self-driving cars is actually fundamentally difficult in the same way as for ASI, but it seems less likely).
I found it useful to compare a shard that learns to pursue juice (positive value) to one that avoids eating mouldy food (prohibition), just so they’re on the same kind of framing/scale.
It feels like a possible difference between prohibitions and positive values is that positive values specify a relatively small portion of the state space that is good/desirable (there are not many states in which you’re drinking juice), and hence possibly only activate less frequently, or only when parts of the state space like that are accessible, whereas prohibitions specify a large part of the state space that is bad (but not so much that the complement is a small portion—there are perhaps many potential states where you eat mouldy food, but the complement of that set is still not a similar size to the set of states of drinking juice). The first feels more suited to forming longer-term plans towards the small part of the state space (cf this definition of optimisation), whereas the second is less so. Then shards that start doing optimisation like this are hence more likely to become agentic/self-reflective/meta-cognitive etc.
In effect, positive values are more likely/able to self-chain because they actually (kind of, implicitly) specify optimisation goals, and hence shards can optimise them, and hence grow and improve that optimisation power, whereas prohibitions specify a much larger desirable state set, and so don’t require or encourage optimisation as much.
As an implication of this, I could imagine that in most real-world settings “don’t kill humans” would act as you describe, but in environments where it’s very easy to accidentally kill humans, such that states where you don’t kill humans are actually very rare, then the “don’t kill humans” shard could chain into itself more, and hence become more sophisticated/agentic/reflective. Does that seem right to you?
Thanks for the answer! I feel uncertain whether that suggestion is an “alignment” paradigm/method though—either these formally specified goals don’t cover most of the things we care about, in which case this doesn’t seem that useful, or they do, in which case I’m pretty uncertain how we can formally specify them—that’s kind of the whole outer alignment problem. Also, there is still (weaker) pressure to produce outputs that look good to humans, if humans are searching over goals to find those that produce good outputs. I agree it’s further away, but that seems like it could also be a bad thing, if it makes it harder to pressure the models to actually do what we want in the first place.
I still don’t think you’ve proposed an alternative to “training a model with human feedback”. “maintaining some qualitative distance between the optimisation target for an AI model and the human “does this look good?” function” sounds nice, but how do we even do that? What else should we optimise the model for, or how should we make it aligned? If you think the solution is to use AI-assisted humans as overseers, then that doesn’t seem to be a real difference with what Buck is saying. So even if he actually had written that he’s not aware of an alternative to “training a model with human/overseer feedback”, I don’t think you’ve refuted that point.
An existing example of something like the difference between amortised and direct optimisation is doing RLHF (w/o KL penalties to make the comparison exact) vs doing rejection sampling (RS) with a trained reward model. RLHF amortises the cost of directly finding good outputs according to the reward model, such that at evaluation the model can produce good outputs with a single generation, whereas RS requires no training on top of the reward model, but uses lots more compute at evaluation by generating and filtering with the RM. (This case doesn’t exactly match the description in the post as we’re using RL in the amortised optimisation rather than SL. This could be adjusted by gathering data with RS, and then doing supervised fine-tuning on that RS data, and seeing how that compares to RS).
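To make the comparison above concrete, here's a minimal sketch of the rejection-sampling side (best-of-n against a reward model). Everything here is a stand-in of my own: "generations" are just vectors, and the quadratic reward model is purely illustrative. RLHF would instead amortise this search by shifting the policy itself during training, so a single generation already scores well.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward_model(output):
    # Stand-in for a trained reward model scoring a generation
    # (here a generation is a vector, scored by closeness to 1).
    return float(-np.sum((output - 1.0) ** 2))

def generate(policy_mean):
    # Stand-in for drawing one sample from the policy (the LM).
    return policy_mean + rng.normal(size=policy_mean.shape)

def rejection_sample(policy_mean, n):
    # Direct optimisation: no training on top of the reward model,
    # but n generations plus filtering at evaluation time (best-of-n).
    candidates = [generate(policy_mean) for _ in range(n)]
    scores = [reward_model(c) for c in candidates]
    best = candidates[int(np.argmax(scores))]
    return best, max(scores)

# Spend 64x the evaluation compute to find a high-reward output;
# RLHF would aim to get comparable reward from a single generation.
best, best_score = rejection_sample(np.zeros(4), n=64)
```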
Given we have these two types of optimisation, I think two key things to consider are how each type of optimisation interacts with Goodhart’s Law, and how they both generalise (kind of analogous to outer/inner alignment, etc.):
The work on overoptimisation scaling laws in this setting shows that, at least on distribution, there does seem to be a meaningful difference to the over-optimisation behaviour between the two types of optimisation—as shown by the different functional forms for RS vs RLHF.
I think the generalisation point is most relevant when we consider that the optimisation process used (either in direct optimisation to find solutions, or in amortised optimisation to produce the dataset to amortise) may not generalise perfectly. In the setting above, this corresponds to the reward model not generalising perfectly. It would be interesting to see a similar investigation as the overoptimisation work but for generalisation properties—how does the generalisation of the RLHF policy relate to the generalisation of the RM, and similarly to the RS policy? Of course, over-optimisation and generalisation probably interact, so it may be difficult to disentangle whether poor performance under distribution shift is due to over-optimisation or misgeneralisation, unless we have a gold RM that also generalises perfectly.
Instead, Aligned AI used its technology to automatically tease out the ambiguities of the original data.
Could you provide any technical details about how this works? Otherwise I don’t know what to take from this post.
Question: How do we train an agent which makes lots of diamonds, without also being able to robustly grade expected-diamond-production for every plan the agent might consider?
I thought you were about to answer this question in the ensuing text, but it didn’t feel to me like you gave an answer. You described the goal (values-child), but not how the mother would produce values-child rather than produce evaluation-child. How do you do this?
You might well expect that features just get ignored below some threshold and monosemantically represented above it, or it could be that you just always get a polysemantic morass in that limit
I guess the recent work on Polysemanticity and Capacity seems to suggest the latter case, especially in sparser settings, given the zone where multiple features are represented polysemantically, although I can’t remember if they investigate power-law feature frequencies or just uniform frequencies
were a little concerned about going down a rabbit hole given some of the discussion around whether the results replicated, which indicated some sensitivity to optimizer and learning rate.
My impression is that that discussion was more about whether the empirical results (i.e. do ResNets have linear mode connectivity?) held up, rather than whether the methodology used and present in the code base could be used to find whether linear mode connectivity is present between two models (up to permutation) for a given dataset. I imagine you could take the code and easily adapt it to check for LMC between two trained models pretty quickly (it’s something I’m considering trying to do as well, hence the code requests).
I think (at least in our case) it might be simpler to get at this question, and I think the first thing I’d do to understand connectivity is ask “how much regularization do I need to move from one basin to the other?” So for instance suppose we regularized the weights to directly push them from one basin towards the other, how much regularization do we need to make the models actually hop?
That would definitely be interesting to see. I guess this is kind of presupposing that the models are in different basins (which I also believe but hasn’t yet been verified). I also think looking at basins and connectivity would be more interesting in the case where there was more noise, either from initialisation, inherently in the data, or by using a much lower batch size so that SGD was noisy. In this case it’s less likely that the same configuration results in the same basin, but if your interventions are robust to these kinds of noise then it’s a good sign.
Good question! We haven’t tried that precise experiment, but have tried something quite similar. Specifically, we’ve got some preliminary results from a prune-and-grow strategy (holding sparsity fixed, pruning smallest-magnitude weights, enabling non-sparse weights) that does much better than a fixed sparsity strategy.
I’m not quite sure how to interpret these results in terms of the lottery ticket hypothesis though. What evidence would you find useful to test it?
That’s cool, looking forward to seeing more detail. I think these results don’t seem that related to the LTH (if I understand your explanation correctly), as LTH involves finding sparse subnetworks in dense ones. Possibly it only actually holds in models with many more parameters; I haven’t seen it investigated in models that aren’t overparametrised in a classical sense.
I think if iterative magnitude pruning (IMP) on these problems produced much sparser subnetworks that also maintained the monosemanticity levels, then that would suggest that sparsity doesn’t penalise monosemanticity (or polysemanticity) in this toy model, and also (much more speculatively) that the sparse well-performing subnetworks that IMP finds in other networks possibly also maintain their levels of poly/mono-semanticity. If we also think these networks are favoured towards poly or mono, then that hints at how the overall learning process is favoured towards poly or mono.
This work looks super interesting, definitely keen to see more!
Will you open-source your code for running the experiments and producing plots? I’d definitely be keen to play around with it. (They already did here: https://github.com/adamjermyn/toy_model_interpretability I just missed it. Thanks! Although it would be useful to have the plotting code as well, if that’s easy to share?)
Note that we primarily study the regime where there are more features than embedding dimensions (i.e. the sparse feature layer is wider than the input) but where features are sufficiently sparse that the number of features present in any given sample is smaller than the embedding dimension. We think this is likely the relevant limit for e.g. language models, where there are a vast array of possible features but few are present in any given sample.
I agree that N (true feature dimension) > d (observed dimension), and that sparsity will be high, but I’m uncertain whether the other part of the regime (that you don’t mention here), that k (model latent dimension) > N, is likely to be true. Do you think that is likely to be the case? As an analogy, I think the intermediate feature dimensions in MLP layers in transformers (analogously k) are much lower dimension than the “true intrinsic dimension of features in natural language” (analogously N), even if it is larger than the input dimension (embedding dimension × num_tokens, analogously d). So I expect k < N, whereas in your regime k > N. Do you think you’d be able to find monosemantic networks for k < N? Did you try out this regime at all? (I don’t think I could find it in the paper.)
In the paper you say that you weakly believe that monosemantic and polysemantic network parametrisations are likely in different loss basins, given they’re implementing very different algorithms. I think (given the size of your networks) it should be easy to test for at least linear mode connectivity with something like git re-basin (https://github.com/samuela/git-re-basin). Have you tried doing that? I think there are also algorithms for finding non-linear (e.g. quadratic) mode connectivity, although I’m less familiar with them. If it is the case that they’re in different basins, I’d be curious to see whether there are just two basins (poly vs mono), or a basin for each level of monosemanticity, or if even within a level of polysemanticity there are multiple basins. If it’s one of the former cases, it’d be interesting to do something like the connectivity-based fine-tuning talked about here (https://openreview.net/forum?id=NZZoABNZECq, in effect optimise for a new parametrisation that is linearly disconnected from the previous one), and see if doing that from a polysemantic initialisation can produce a more monosemantic one, or if it just becomes polysemantic in a different way.
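For concreteness, here's a minimal numpy sketch of the kind of check I mean, on a one-hidden-layer ReLU net: first permute one model's hidden units to match the other's (git re-basin solves this as an optimal assignment problem; the greedy matching here is my simplification), then measure the loss barrier along the linear interpolation between the two parameter settings. A barrier near zero suggests the models are in the same basin up to permutation; all names are mine.

```python
import numpy as np

def forward(params, X):
    W1, W2 = params
    return np.maximum(X @ W1, 0.0) @ W2   # one-hidden-layer ReLU net

def loss(params, X, Y):
    return float(np.mean((forward(params, X) - Y) ** 2))

def align_hidden_units(pa, pb):
    # Permute pb's hidden units to best match pa's incoming weights.
    # git re-basin uses an optimal assignment; greedy matching by
    # weight similarity is a simplification for this sketch.
    W1a, _ = pa
    W1b, W2b = pb
    sims = W1a.T @ W1b                    # unit-by-unit similarity
    perm, used = [], set()
    for i in range(W1a.shape[1]):
        j = next(j for j in np.argsort(-sims[i]) if j not in used)
        used.add(j)
        perm.append(j)
    perm = np.array(perm)
    return (W1b[:, perm], W2b[perm, :])

def loss_barrier(pa, pb, X, Y, n_points=11):
    # Max loss along the straight line between the two parameter
    # settings, minus the endpoint average: ~0 indicates linear
    # mode connectivity (up to the chosen permutation).
    ts = np.linspace(0.0, 1.0, n_points)
    losses = [loss(tuple((1 - t) * a + t * b for a, b in zip(pa, pb)), X, Y)
              for t in ts]
    return max(losses) - 0.5 * (losses[0] + losses[-1])
```

As a sanity check, if model B is literally a hidden-unit permutation of model A, naive interpolation can show a barrier while the aligned barrier is exactly zero.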
You also mentioned your initial attempts at sparsity through a hard-coded initially sparse matrix failed; I’d be very curious to see whether a lottery ticket-style iterative magnitude pruning was able to produce sparse matrices from the high-latent-dimension monosemantic networks that are still monosemantic, or more broadly how the LTH interacts with polysemanticity—are lottery tickets less polysemantic, or more, or do they not really change the monosemanticity?
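For reference, the IMP procedure I have in mind is roughly the following sketch (with rewinding to initialisation, as in the original lottery ticket work). `train_fn` is a placeholder of my own for whatever training loop the toy models use; it should train subject to the mask and return the trained weights.

```python
import numpy as np

def iterative_magnitude_prune(init_weights, train_fn, prune_frac=0.2, rounds=5):
    # Lottery-ticket-style IMP: train, prune the smallest-magnitude
    # surviving weights, rewind the survivors to their initial values,
    # and repeat. `train_fn(weights, mask)` stands in for a training
    # loop that respects the mask and returns trained weights.
    mask = np.ones_like(init_weights)
    for _ in range(rounds):
        trained = train_fn(init_weights * mask, mask)  # rewind + retrain
        alive = np.abs(trained[mask == 1])
        threshold = np.quantile(alive, prune_frac)     # prune bottom frac
        mask = mask * (np.abs(trained) >= threshold)
    return mask
```

The interesting measurement would then be whether the surviving subnetwork's neurons are more or less monosemantic than the dense network's.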
If my understanding of the bias decay method is correct, is a large initial part of training only reducing the bias (through weight decay) until certain neurons start firing? If that’s the case, could you calculate the maximum output in the latent dimension on the dataset at the start of training (say B), and then initialise the bias to be just below -B, so that you skip almost all of the portion of training that’s only moving the bias term. You could do this per-neuron or just by maxing over neurons. Or is this portion of training relatively small compared to the rest of training, and the slower convergence more due to fewer neurons getting gradients even when some of them are outputting higher than the bias?
Thanks for writing the post, and it’s great to see that (at least implicitly) lots of the people doing mechanistic interpretability (MI) are talking to each other somewhat.
Some comments and questions:
I think “science of deep learning” would be a better term than “deep learning theory” for what you’re describing, given that I think all the phenomena you list aren’t yet theoretically grounded or explained in a mathematical way, and are rather robust empirical observations. Deep learning theory could be useful, especially if it had results concerning the internals of the network, but I think that’s a different genre of work to the science of DL work.
In your description of the relevance of the lottery ticket hypothesis (LTH), it feels like a bit of a non-sequitur to immediately discuss removing dangerous circuits at initialisation. I guess you think this is because lottery tickets are in some way about removing circuits at the beginning of training (although currently we only know how to find out which circuits by getting to the end of training)? I think the LTH potentially has broader relevance for MI, i.e.: if lottery tickets do exist and are of equal performance, then it’s possible they’d be easier to interpret (due to increased sparsity); or just understanding what the existence of lottery tickets means for what circuits are more likely to emerge during neural network training.
When you say “Automating Mechanistic Interpretability research”, do you mean automating (1) the task of interpreting a given network (automating MI), or automating (2) the research of building methods/understanding/etc. that enable us to better-interpret neural networks (automating MI Research)? I realise that a lot of current MI research, even if the ultimate goal is (2), is mostly currently doing (1) as a first step.
Most of the text in that section implies automating (1) to me, but “Eventually, we might also want to automate the process of deciding which interventions to perform on the model to improve AI safety” seems to lean more towards automating (2), which comes under the general approach of automating alignment research. Obviously it would be great to be able to do both of them, but automating (1) seems both much more tractable, and also probably necessary to enable scalable interpretability of large models, whereas (2) is potentially less necessary for MI research to be useful for AI safety.
I’ve now had a conversation with Evan where he’s explained his position here and I now agree with it. Specifically, this is in low path-dependency land, and just considering the simplicity bias. In this case it’s likely that the world model will be very simple indeed (e.g. possibly just some model of physics), and all the hard work is done at run time in activations by the optimisation process. In this setting, Evan argued that the only very simple goal we can encode (that we know of) that would produce behaviour that maximises the training objective is any arbitrary simple objective, combined with the reasoning that deception and power are instrumentally useful for that objective. To encode an objective that would reliably produce maximisation of the training objective always (e.g. internal or corrigible alignment), you’d need to fully encode the training objective.
Overall, I think the issue of causal confusion and OOD misgeneralisation is much more about capabilities than about alignment, especially if we are talking about the long-term x-risk from superintelligent AI, rather than short/mid-term AI risk.
The main argument of the post isn’t “ASI/AGI may be causally confused, what are the consequences of that” but rather “Scaling up static pretraining may result in causally confused models, which hence probably wouldn’t be considered ASI/AGI”. I think in practice if we get AGI/ASI, then almost by definition I’d think it’s not causally confused.
OOD misgeneralisation is absolutely inevitable, due to Gödel’s incompleteness of the universe and the fact that all the systems that evolve on Earth generally climb up in complexity
In a theoretical sense this may be true (I’m not really familiar with the argument), but in practice OOD misgeneralisation is probably a spectrum, and models can be more or less causally confused about how the world works. We’re arguing here that static training, even when scaled up, plausibly doesn’t lead to a model that isn’t causally confused about a lot of how the world works.
Did you use the term “objective misgeneralisation” rather than “goal misgeneralisation” on purpose? “Objective” and “goal” are synonyms, but “objective misgeneralisation” is hardly used, “goal misgeneralisation” is the standard term.
No reason, I’ll edit the post to use goal misgeneralisation. Goal misgeneralisation is the standard term but hasn’t been so for very long (see e.g. this tweet: https://twitter.com/DavidSKrueger/status/1540303276800983041).
Maybe I miss something obvious, but this argument looks wrong to me, or it assumes that the learning algorithm is not allowed to discover additional (conceptual, abstract, hidden, implicit) variables in the training data, but this is false for deep neural networks
Given that the model is trained statically, while it could hypothesise about additional variables of the kinds you listed, it can never know which variables or which values for those variables are correct without domain labels or interventional data. Specifically, while “Discovering such hidden confounders doesn’t give interventional capacity” is true, to discover these confounders one would need interventional capacity.
I don’t understand the italicised part of this sentence. Why will P(shorts, ice cream) be a reliable guide to decision-making?
We’re not saying that P(shorts, ice cream) is good for decision-making, but P(shorts, do(ice cream)) is useful insofar as the goal is to make someone wear shorts, and providing ice cream is one of the possible actions (as the causal model will demonstrate that providing ice cream isn’t useful for making someone wear shorts).
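A toy simulation of the shorts/ice cream example makes the gap between the two quantities concrete, with hot weather as the confounder (the probabilities are purely illustrative numbers of my own):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Toy structural causal model: hot weather causes both ice cream
# eating and shorts wearing; ice cream has no causal effect on shorts.
hot = rng.random(n) < 0.5
icecream = np.where(hot, rng.random(n) < 0.8, rng.random(n) < 0.1)
shorts = np.where(hot, rng.random(n) < 0.9, rng.random(n) < 0.05)

# Observational: P(shorts | ice cream) is high, purely via the confounder.
p_obs = shorts[icecream].mean()

# Interventional: do(ice cream) severs the weather -> ice cream edge,
# so P(shorts | do(ice cream)) is just P(shorts).
p_do = shorts.mean()

# p_obs comes out around 0.8 while p_do is around 0.48: an agent
# acting on the observational quantity would wrongly hand out ice
# cream to make people wear shorts; the causal model shows the
# intervention has no effect.
```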
What do these symbols in parens before the claims mean?
They are meant to be referring to the previous parts of the argument, but I’ve just realised that this hasn’t worked as the labels aren’t correct. I’ll fix that.
When you talk about whether we’re in a high or low path-dependence “world”, do you think that there is a (somewhat robust) answer to this question that holds across most ML training processes? I think it’s more likely that some training processes are highly path-dependent and some aren’t. We definitely have evidence that some are path-dependent, e.g. Ethan’s comment and other examples like https://arxiv.org/abs/2002.06305, and almost any RL paper where different random seeds of the training process often result in quite different results. Arguably I don’t think we have conclusive evidence of any particular existing training process being low path-dependence, because the burden of proof is heavy for proving that two models are basically equivalent on basically all inputs (given that they’re very unlikely to literally have identical weights, so the equivalence would have to be at a high level of abstraction).
Reasoning about the path dependence of a training process specifically, rather than whether all of the ML/AGI development world is path dependent, seems more precise, and also allows us to reason about whether we want a high or low path-dependence training process, and considering that as an intervention, rather than a state of the world we can’t change.
When you say “the knowledge of what our goals are should be present in all models”, by “knowledge of what our goals are” do you mean a pointer to our goals (given that there are probably multiple goals which are combined in some way) is in the world model? If so, this seems to contradict what you said earlier:
The deceptive model has to build such a pointer [to the training objective] at runtime, but it doesn’t need to have it hardcoded, whereas the corrigible model needs it to be hardcoded
I guess I don’t understand what it would mean for the deceptive AI to have the knowledge of what our goals are (in the world model), without that meaning it has a hard-coded pointer to what our goals are. I’d imagine that what it means for the world model to capture what our goals are is exactly having such a pointer to them.
(I realise I’ve been failing to do this, but it might make sense to use AI when we mean the outer system and model when we mean the world model. I don’t think this is the core of the disagreement, but it could make the discussion clearer. For example, when you say the knowledge is present in the model, do you mean the world model or the AI more generally? I assumed the former above.)
To try and run my (probably inaccurate) simulation of you: I imagine you don’t think that’s a contradiction above. So you’d think that “knowledge of what our goals are” doesn’t mean a pointer to our goals in all the AI’s world models, but something simpler, that can be used to figure out what our goals are by the deceptive AI (e.g. in its optimisation process), but wouldn’t enable the aligned AI to use as its objective a simpler pointer, and instead would require the aligned AI to hard-code the full pointer to our goals (where the pointer would be pointing into its world model, and probably using this simpler information about our goals in some way). I’m struggling to imagine what that would look like.
Even agreeing that no additional complexity is required to rederive that it should try to be deceptive (assuming it has situational awareness of the training process and long-term goals which aren’t aligned with ours), to be deceptive successfully, it then needs to rederive what our goals are, so that it can pursue them instrumentally. I’m arguing that the ability to do this in the AI would require additional complexity compared to an AI that doesn’t need to rederive the content of this goal (that is, our goal) at every decision.
Alternatively, the aligned model could use the same derivation process to be aligned: the deceptive model has some long-term goal, and in pursuing it rederives the content of the instrumental goal “do ‘what the training process incentivises’”, and the aligned model has the long-term goal “do ‘what the training process incentivises’” (as a pointer/de dicto), and also rederives it with the same level of complexity. I think “do ‘what the training process incentivises’” (as a pointer/de dicto) isn’t a very complex long-term goal, and feels likely to be as complex as the arbitrary crystallised deceptive AI’s internal goal, assuming both models have full situational awareness of the training process and hence such a pointer is possible, which we’re assuming they do.
(ETA/Meta point: I do think deception is a big issue that we definitely need more understanding of, and I definitely put weight on it being a failure of alignment that occurs in practice, but I think I’m less sure it’ll emerge (or less sure that your analysis demonstrates that). I’m trying to understand where we disagree, and whether you’ve considered the doubts I have and you possess good arguments against them or not, rather than convince you that deception isn’t going to happen.)
It seems a lot more computationally difficult to, at every forward pass/decision process, derive/build/construct such a pointer. If the deceptive model is going to be doing this every time, it seems like it would be more efficient to have a dedicated part of the network that calculates it (i.e. have it in the weights)
Separately, for more complex goals this procedure is also going to be more complex, and the network probably needs to be more complex to support constructing it in the activations at every forward pass, compared to the corrigible model that doesn’t need to do such a construction (because it has it hard-coded as you say). I guess I’m arguing that the additional complexity in the deceptive model that allows it to rederive our goals at every forward pass compensates for the additional complexity in the corrigible model that has our goals hard-coded.
whereas the corrigible model needs it to be hardcoded
The corrigible model needs to be able to robustly point to our goals, in a way that doesn’t change. One way of doing this is having the goals hardcoded. Another way might be to instead have a pointer to the output of a procedure that is executed at runtime that always constructs our goals in the activations. If the deceptive model can reliably construct in its activations something that actually points towards our goals, then the corrigible model could also have such a procedure, and make its goal be a pointer to the output of such a procedure. Then the only difference in model complexity is that the deceptive model points to some arbitrary attribute of the world model (or whatever), and the aligned model points to the output of this computation, which both models possess.
I think at a high level I’m trying to say that any way in which the deceptive model can robustly point at our goals such that it can pursue them instrumentally, the aligned model can robustly point at them to pursue them terminally. SGD+DL+whatever may favour one way or another of robustly pointing at such goals (either in the weights, or through a procedure that robustly outputs them in the activations), but both deceptive and aligned models could make use of that.
I think the hyperlink for “conv nets without residual streams” is wrong? It’s https://www.westernunion.com/web/global-service/track-transfer for me