
bilalchughtai

Karma: 919

My website is here.

Detecting Strategic Deception Using Linear Probes

6 Feb 2025 15:46 UTC
102 points
9 comments · 2 min read · LW link
(arxiv.org)

Paper: Open Problems in Mechanistic Interpretability

29 Jan 2025 10:25 UTC
68 points
0 comments · 1 min read · LW link
(arxiv.org)

Activation space interpretability may be doomed

8 Jan 2025 12:49 UTC
147 points
32 comments · 8 min read · LW link

Reasons for and against working on technical AI safety at a frontier AI lab

bilalchughtai · 5 Jan 2025 14:49 UTC
97 points
12 comments · 12 min read · LW link

Book Summary: Zero to One

bilalchughtai · 29 Dec 2024 16:13 UTC
27 points
2 comments · 8 min read · LW link