Daniel Tan

Karma: 1,435

Researching AI safety. Currently interested in emergent misalignment, model organisms, and other kinds of empirical work.

https://dtch1997.github.io/

Inoculation prompting: Instructing models to misbehave at train-time can improve run-time behavior

8 Oct 2025 22:02 UTC
96 points
14 comments, 2 min read

Could we have predicted emergent misalignment a priori using unsupervised behaviour elicitation?

Daniel Tan, 22 Aug 2025 13:42 UTC
6 points
0 comments, 1 min read

Open Challenges in Representation Engineering

3 Apr 2025 19:21 UTC
14 points
0 comments, 5 min read

Show, not tell: GPT-4o is more opinionated in images than in text

2 Apr 2025 8:51 UTC
112 points
41 comments, 3 min read