leogao

Karma: 13,154

My hobby: running deranged surveys

leogao

27 Mar 2026 0:41 UTC

313 points

65 comments

9 min read

LW link

I’m starting a substack

leogao

18 Mar 2026 5:56 UTC

36 points

0 comments

1 min read

LW link

(nablatheta.substack.com)

An Ambitious Vision for Interpretability

leogao

5 Dec 2025 22:57 UTC

175 points

8 comments

4 min read

LW link

My takes on SB-1047

leogao

9 Sep 2024 18:38 UTC

152 points

9 comments

4 min read

LW link

Scaling and evaluating sparse autoencoders

leogao

6 Jun 2024 22:50 UTC

112 points

6 comments

1 min read

LW link

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

leogao

16 Dec 2023 5:39 UTC

56 points

5 comments

1 min read

LW link

Shapley Value Attribution in Chain of Thought

leogao

14 Apr 2023 5:56 UTC

106 points

7 comments

4 min read

LW link

[ASoT] Some thoughts on human abstractions

leogao

16 Mar 2023 5:42 UTC

43 points

4 comments

5 min read

LW link

Clarifying wireheading terminology

leogao

24 Nov 2022 4:53 UTC

68 points

7 comments

1 min read

LW link

Scaling Laws for Reward Model Overoptimization

leogao, John Schulman and Jacob_Hilton

20 Oct 2022 0:20 UTC

103 points

13 comments

1 min read

LW link

(arxiv.org)

[Question] How many GPUs does NVIDIA make?

leogao

8 Oct 2022 17:54 UTC

27 points

2 comments

1 min read

LW link

Towards deconfusing wireheading and reward maximization

leogao

21 Sep 2022 0:36 UTC

81 points

7 comments

4 min read

LW link

Humans Reflecting on HRH

leogao

29 Jul 2022 21:56 UTC

27 points

4 comments

2 min read

LW link

leogao’s Shortform

leogao

24 May 2022 20:08 UTC

7 points

1,605 comments

1 min read

LW link

[ASoT] Consequentialist models as a superset of mesaoptimizers

leogao

23 Apr 2022 17:57 UTC

38 points

2 comments

4 min read

LW link

[ASoT] Some thoughts about imperfect world modeling

leogao

7 Apr 2022 15:42 UTC

7 points

0 comments

4 min read

LW link

[ASoT] Some thoughts about LM monologue limitations and ELK

leogao

30 Mar 2022 14:26 UTC

10 points

0 comments

2 min read

LW link

[ASoT] Some thoughts about deceptive mesaoptimization

leogao

28 Mar 2022 21:14 UTC

24 points

5 comments

7 min read

LW link

[ASoT] Searching for consequentialist structure

leogao

27 Mar 2022 19:09 UTC

26 points

2 comments

4 min read

LW link

[ASoT] Some ways ELK could still be solvable in practice

leogao

27 Mar 2022 1:15 UTC

26 points

1 comment

2 min read

LW link