leogao

Karma: 2,877

Scaling Laws for Reward Model Overoptimization

leogao, John Schulman and Jacob_Hilton

20 Oct 2022 0:20 UTC

102 points

13 comments, 1 min read

(arxiv.org)

Shapley Value Attribution in Chain of Thought

leogao, 14 Apr 2023 5:56 UTC

101 points

5 comments, 4 min read

Thoughts on the Alignment Implications of Scaling Language Models

leogao, 2 Jun 2021 21:32 UTC

82 points

11 comments, 17 min read

Towards deconfusing wireheading and reward maximization

leogao, 21 Sep 2022 0:36 UTC

81 points

7 comments, 4 min read

Behavior Cloning is Miscalibrated

leogao, 5 Dec 2021 1:36 UTC

77 points

3 comments, 3 min read

Clarifying wireheading terminology

leogao, 24 Nov 2022 4:53 UTC

65 points

6 comments, 1 min read

Gradient descent is not just more efficient genetic algorithms

leogao, 8 Sep 2021 16:23 UTC

55 points

14 comments, 1 min read

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

leogao, 16 Dec 2023 5:39 UTC

53 points

5 comments, 1 min read

In Defence of Optimizing Routine Tasks

leogao, 9 Nov 2021 5:09 UTC

47 points

6 comments, 3 min read, 1 review

[ASoT] Some thoughts on human abstractions

leogao, 16 Mar 2023 5:42 UTC

42 points

4 comments, 5 min read

Quadratic Voting and Collusion

leogao, 17 Nov 2021 0:19 UTC

41 points

24 comments, 2 min read

Towards Deconfusing Gradient Hacking

leogao, 24 Oct 2021 0:43 UTC

39 points

3 comments, 12 min read

[ASoT] Consequentialist models as a superset of mesaoptimizers

leogao, 23 Apr 2022 17:57 UTC

37 points

2 comments, 4 min read

[ASoT] Observations about ELK

leogao, 26 Mar 2022 0:42 UTC

31 points

0 comments, 3 min read

EleutherAI’s GPT-NeoX-20B release

leogao, 10 Feb 2022 6:56 UTC

30 points

3 comments, 1 min read

(eaidata.bmk.sh)

NFTs, Coin Collecting, and Expensive Paintings

leogao, 24 Jan 2022 1:01 UTC

29 points

35 comments, 5 min read

Obstacles to gradient hacking

leogao, 5 Sep 2021 22:42 UTC

28 points

11 comments, 4 min read

[Question] How many GPUs does NVIDIA make?

leogao, 8 Oct 2022 17:54 UTC

27 points

2 comments, 1 min read

[ASoT] Searching for consequentialist structure

leogao, 27 Mar 2022 19:09 UTC

26 points

2 comments, 4 min read

Humans Reflecting on HRH

leogao, 29 Jul 2022 21:56 UTC

26 points

4 comments, 2 min read