leogao

Karma: 2,877

Scaling Laws for Reward Model Overoptimization

20 Oct 2022 0:20 UTC
102 points
13 comments · 1 min read
(arxiv.org)

Shapley Value Attribution in Chain of Thought

leogao · 14 Apr 2023 5:56 UTC
101 points
5 comments · 4 min read

Thoughts on the Alignment Implications of Scaling Language Models

leogao · 2 Jun 2021 21:32 UTC
82 points
11 comments · 17 min read

Towards deconfusing wireheading and reward maximization

leogao · 21 Sep 2022 0:36 UTC
81 points
7 comments · 4 min read

Behavior Cloning is Miscalibrated

leogao · 5 Dec 2021 1:36 UTC
77 points
3 comments · 3 min read

Clarifying wireheading terminology

leogao · 24 Nov 2022 4:53 UTC
65 points
6 comments · 1 min read

Gradient descent is not just more efficient genetic algorithms

leogao · 8 Sep 2021 16:23 UTC
55 points
14 comments · 1 min read

Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

leogao · 16 Dec 2023 5:39 UTC
53 points
5 comments · 1 min read

In Defence of Optimizing Routine Tasks

leogao · 9 Nov 2021 5:09 UTC
47 points
6 comments · 3 min read · 1 review

[ASoT] Some thoughts on human abstractions

leogao · 16 Mar 2023 5:42 UTC
42 points
4 comments · 5 min read

Quadratic Voting and Collusion

leogao · 17 Nov 2021 0:19 UTC
41 points
24 comments · 2 min read

Towards Deconfusing Gradient Hacking

leogao · 24 Oct 2021 0:43 UTC
39 points
3 comments · 12 min read

[ASoT] Consequentialist models as a superset of mesaoptimizers

leogao · 23 Apr 2022 17:57 UTC
37 points
2 comments · 4 min read

[ASoT] Observations about ELK

leogao · 26 Mar 2022 0:42 UTC
31 points
0 comments · 3 min read

EleutherAI’s GPT-NeoX-20B release

leogao · 10 Feb 2022 6:56 UTC
30 points
3 comments · 1 min read
(eaidata.bmk.sh)

NFTs, Coin Collecting, and Expensive Paintings

leogao · 24 Jan 2022 1:01 UTC
29 points
35 comments · 5 min read

Obstacles to gradient hacking

leogao · 5 Sep 2021 22:42 UTC
28 points
11 comments · 4 min read

[Question] How many GPUs does NVIDIA make?

leogao · 8 Oct 2022 17:54 UTC
27 points
2 comments · 1 min read

[ASoT] Searching for consequentialist structure

leogao · 27 Mar 2022 19:09 UTC
26 points
2 comments · 4 min read

Humans Reflecting on HRH

leogao · 29 Jul 2022 21:56 UTC
26 points
4 comments · 2 min read