Josh Engels

Karma: 468

Prompting Models to Obfuscate Their CoT

8 Dec 2025 21:00 UTC
15 points
4 comments · 7 min read · LW link

How Can Interpretability Researchers Help AGI Go Well?

1 Dec 2025 13:05 UTC
62 points
1 comment · 14 min read · LW link

A Pragmatic Vision for Interpretability

1 Dec 2025 13:05 UTC
133 points
35 comments · 27 min read · LW link

Current LLMs seem to rarely detect CoT tampering

19 Nov 2025 15:27 UTC
52 points
0 comments · 20 min read · LW link

Negative Results on Group SAEs

Josh Engels · 6 May 2025 21:49 UTC
71 points
3 comments · 8 min read · LW link

Interim Research Report: Mechanisms of Awareness

2 May 2025 20:29 UTC
43 points
6 comments · 8 min read · LW link

Scaling Laws for Scalable Oversight

30 Apr 2025 12:13 UTC
37 points
1 comment · 9 min read · LW link

Josh Engels’s Shortform

Josh Engels · 30 Apr 2025 10:58 UTC
4 points
4 comments · 1 min read · LW link

Takeaways From Our Recent Work on SAE Probing

3 Mar 2025 19:50 UTC
30 points
4 comments · 5 min read · LW link

SAE Probing: What is it good for?

1 Nov 2024 19:23 UTC
34 points
0 comments · 11 min read · LW link