leogao
Karma: 2,877
Scaling Laws for Reward Model Overoptimization
leogao, John Schulman and Jacob_Hilton | 20 Oct 2022 0:20 UTC | 102 points | 13 comments | 1 min read | link post (arxiv.org)
Shapley Value Attribution in Chain of Thought
leogao | 14 Apr 2023 5:56 UTC | 101 points | 5 comments | 4 min read
Thoughts on the Alignment Implications of Scaling Language Models
leogao | 2 Jun 2021 21:32 UTC | 82 points | 11 comments | 17 min read
Towards deconfusing wireheading and reward maximization
leogao | 21 Sep 2022 0:36 UTC | 81 points | 7 comments | 4 min read
Behavior Cloning is Miscalibrated
leogao | 5 Dec 2021 1:36 UTC | 77 points | 3 comments | 3 min read
Clarifying wireheading terminology
leogao | 24 Nov 2022 4:53 UTC | 65 points | 6 comments | 1 min read
Gradient descent is not just more efficient genetic algorithms
leogao | 8 Sep 2021 16:23 UTC | 55 points | 14 comments | 1 min read
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
leogao | 16 Dec 2023 5:39 UTC | 53 points | 5 comments | 1 min read
In Defence of Optimizing Routine Tasks
leogao | 9 Nov 2021 5:09 UTC | 47 points | 6 comments | 3 min read | 1 review
[ASoT] Some thoughts on human abstractions
leogao | 16 Mar 2023 5:42 UTC | 42 points | 4 comments | 5 min read
Quadratic Voting and Collusion
leogao | 17 Nov 2021 0:19 UTC | 41 points | 24 comments | 2 min read
Towards Deconfusing Gradient Hacking
leogao | 24 Oct 2021 0:43 UTC | 39 points | 3 comments | 12 min read
[ASoT] Consequentialist models as a superset of mesaoptimizers
leogao | 23 Apr 2022 17:57 UTC | 37 points | 2 comments | 4 min read
[ASoT] Observations about ELK
leogao | 26 Mar 2022 0:42 UTC | 31 points | 0 comments | 3 min read
EleutherAI’s GPT-NeoX-20B release
leogao | 10 Feb 2022 6:56 UTC | 30 points | 3 comments | 1 min read | link post (eaidata.bmk.sh)
NFTs, Coin Collecting, and Expensive Paintings
leogao | 24 Jan 2022 1:01 UTC | 29 points | 35 comments | 5 min read
Obstacles to gradient hacking
leogao | 5 Sep 2021 22:42 UTC | 28 points | 11 comments | 4 min read
[Question] How many GPUs does NVIDIA make?
leogao | 8 Oct 2022 17:54 UTC | 27 points | 2 comments | 1 min read
[ASoT] Searching for consequentialist structure
leogao | 27 Mar 2022 19:09 UTC | 26 points | 2 comments | 4 min read
Humans Reflecting on HRH
leogao | 29 Jul 2022 21:56 UTC | 26 points | 4 comments | 2 min read