bilalchughtai

Karma: 1,272

My website is here.

Training on Documents About Monitoring Leads To CoT Obfuscation

18 Mar 2026 20:37 UTC
40 points
0 comments, 16 min read

[Paper] Difficulties with Evaluating a Deception Detector for AIs

3 Dec 2025 20:07 UTC
30 points
2 comments, 6 min read
(arxiv.org)

How Can Interpretability Researchers Help AGI Go Well?

1 Dec 2025 13:05 UTC
66 points
1 comment, 14 min read

A Pragmatic Vision for Interpretability

1 Dec 2025 13:05 UTC
131 points
39 comments, 27 min read

An opinionated guide to building a good to-do system

bilalchughtai, 5 Aug 2025 23:00 UTC
24 points
7 comments, 8 min read
(bilalchughtai.co.uk)

Detecting Strategic Deception Using Linear Probes

6 Feb 2025 15:46 UTC
104 points
9 comments, 2 min read
(arxiv.org)