jorio

Karma: 114

MATS 8.0 Scholar

Concept Poisoning: Probing LLMs without probes

Jan Betley, jorio, dylan_f and Owain_Evans

5 Aug 2025 17:00 UTC

60 points

5 comments · 13 min read

Selective Generalization: Improving Capabilities While Maintaining Alignment

ariana_azarbal, Matthew A. Clarke, jorio, Cailley Factor and cloud

16 Jul 2025 21:25 UTC

67 points

4 comments · 7 min read