Amirali Abdullah

Karma: 33

Steering Language Models in Multiple Directions Simultaneously

2 May 2025 15:27 UTC
18 points
0 comments, 7 min read

Backdoors have universal representations across large language models

6 Dec 2024 22:56 UTC
16 points
0 comments, 16 min read

Early Experiments in Reward Model Interpretation Using Sparse Autoencoders

3 Oct 2023 7:45 UTC
18 points
0 comments, 5 min read