Daniel Tan

Karma: 2,074

https://dtch1997.github.io/

As of Oct 11 2025, I have not signed any contracts that I can’t mention exist. I’ll try to update this statement at least once a year, so long as it’s true. I added this statement thanks to the one in the gears to ascension’s bio.

Shaping the exploration of the motivation-space matters for AI safety

Maxime Riché, Victor Gillioz, nielsrolf, Kajetan Dymkiewicz, Filip Sondej, RogerDearnaley, Daniel Tan and dillonkn

6 Mar 2026 14:43 UTC

78 points

15 comments, 10 min read

Concrete research ideas on AI personas

nielsrolf, Maxime Riché and Daniel Tan

3 Feb 2026 21:50 UTC

68 points

10 comments, 6 min read

A Case for Model Persona Research

nielsrolf, Maxime Riché and Daniel Tan

15 Dec 2025 13:35 UTC

119 points

11 comments, 4 min read

Understanding and Controlling LLM Generalization

Daniel Tan

14 Nov 2025 16:58 UTC

43 points

3 comments, 1 min read

Inoculation prompting: Instructing models to misbehave at train-time can improve run-time behavior

Sam Marks, Nevan Wichers, Daniel Tan, Aram Ebtekar, Jozdien, David Africa, Alex Mallen and Fabien Roger

8 Oct 2025 22:02 UTC

176 points

37 comments, 2 min read

Open Challenges in Representation Engineering

Jan Wehner and Daniel Tan

3 Apr 2025 19:21 UTC

14 points

0 comments, 5 min read

Show, not tell: GPT-4o is more opinionated in images than in text

Daniel Tan and eggsyntax

2 Apr 2025 8:51 UTC

116 points

42 comments, 3 min read

Open problems in emergent misalignment

Jan Betley and Daniel Tan

1 Mar 2025 9:47 UTC

86 points

18 comments, 7 min read

A Collection of Empirical Frames about Language Models

Daniel Tan

2 Jan 2025 2:49 UTC

27 points

0 comments, 3 min read

Why I’m Moving from Mechanistic to Prosaic Interpretability

Daniel Tan

30 Dec 2024 6:35 UTC

119 points

34 comments, 5 min read

A Sober Look at Steering Vectors for LLMs

Joschka Braun, Dmitrii Krasheninnikov, Usman Anwar, RobertKirk, Daniel Tan and David Scott Krueger

23 Nov 2024 17:30 UTC

42 points

0 comments, 5 min read

Evolutionary prompt optimization for SAE feature visualization

neverix, Daniel Tan, Dmitrii Kharlapenko, Neel Nanda and Arthur Conmy

14 Nov 2024 13:06 UTC

28 points

0 comments, 9 min read

An Interpretability Illusion from Population Statistics in Causal Analysis

Daniel Tan

29 Jul 2024 14:50 UTC

9 points

3 comments, 1 min read

Daniel Tan’s Shortform

Daniel Tan

17 Jul 2024 6:38 UTC

2 points

315 comments, 1 min read

Mech Interp Lacks Good Paradigms

Daniel Tan

16 Jul 2024 15:47 UTC

40 points

0 comments, 14 min read

Activation Pattern SVD: A proposal for SAE Interpretability

Daniel Tan

28 Jun 2024 22:12 UTC

15 points

2 comments, 2 min read