bilalchughtai

Karma: 1,067

My website is here.

An opinionated guide to building a good to-do system

bilalchughtai
5 Aug 2025 23:00 UTC
23 points
7 comments
8 min read
LW link
(bilalchughtai.co.uk)

Detecting Strategic Deception Using Linear Probes

6 Feb 2025 15:46 UTC
104 points
9 comments
2 min read
LW link
(arxiv.org)

Paper: Open Problems in Mechanistic Interpretability

29 Jan 2025 10:25 UTC
69 points
0 comments
1 min read
LW link
(arxiv.org)

Activation space interpretability may be doomed

8 Jan 2025 12:49 UTC
149 points
35 comments
8 min read
LW link

Reasons for and against working on technical AI safety at a frontier AI lab

bilalchughtai
5 Jan 2025 14:49 UTC
100 points
12 comments
12 min read
LW link

Book Summary: Zero to One

bilalchughtai
29 Dec 2024 16:13 UTC
27 points
2 comments
8 min read
LW link

Remap your caps lock key

bilalchughtai
15 Dec 2024 14:03 UTC
81 points
21 comments
1 min read
LW link

You should consider applying to PhDs (soon!)

bilalchughtai
29 Nov 2024 20:33 UTC
114 points
19 comments
6 min read
LW link

bilalchughtai’s Shortform

bilalchughtai
29 Jul 2024 18:57 UTC
5 points
16 comments
LW link

Understanding Positional Features in Layer 0 SAEs

29 Jul 2024 9:36 UTC
43 points
0 comments
5 min read
LW link

Unlearning via RMU is mostly shallow

23 Jul 2024 16:07 UTC
55 points
4 comments
6 min read
LW link

Transformer Circuit Faithfulness Metrics Are Not Robust

12 Jul 2024 3:47 UTC
104 points
5 comments
7 min read
LW link
(arxiv.org)

Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs

8 Jul 2024 22:24 UTC
109 points
37 comments
5 min read
LW link