Amirali Abdullah
Karma: 33
Steering Language Models in Multiple Directions Simultaneously
lukemarks, Narmeen and Amirali Abdullah
2 May 2025 15:27 UTC
18 points, 0 comments, 7 min read
Backdoors have universal representations across large language models
Amirali Abdullah, Narmeen, Dhruv Nathawani and nirmalendu prakash
6 Dec 2024 22:56 UTC
16 points, 0 comments, 16 min read
Early Experiments in Reward Model Interpretation Using Sparse Autoencoders
lukemarks, Amirali Abdullah, Rauno Arike, Fazl and nothoughtsheadempty
3 Oct 2023 7:45 UTC
18 points, 0 comments, 5 min read