leogao · Karma: 7,909
My takes on SB-1047
leogao · 9 Sep 2024 18:38 UTC · 151 points · 8 comments · 4 min read
Scaling and evaluating sparse autoencoders
leogao · 6 Jun 2024 22:50 UTC · 112 points · 6 comments · 1 min read
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
leogao · 16 Dec 2023 5:39 UTC · 55 points · 5 comments · 1 min read
Shapley Value Attribution in Chain of Thought
leogao · 14 Apr 2023 5:56 UTC · 106 points · 7 comments · 4 min read
[ASoT] Some thoughts on human abstractions
leogao · 16 Mar 2023 5:42 UTC · 42 points · 4 comments · 5 min read
Clarifying wireheading terminology
leogao · 24 Nov 2022 4:53 UTC · 67 points · 6 comments · 1 min read
Scaling Laws for Reward Model Overoptimization (arxiv.org)
leogao, John Schulman and Jacob_Hilton · 20 Oct 2022 0:20 UTC · 103 points · 13 comments · 1 min read
[Question] How many GPUs does NVIDIA make?
leogao · 8 Oct 2022 17:54 UTC · 27 points · 2 comments · 1 min read
Towards deconfusing wireheading and reward maximization
leogao · 21 Sep 2022 0:36 UTC · 81 points · 7 comments · 4 min read
Humans Reflecting on HRH
leogao · 29 Jul 2022 21:56 UTC · 27 points · 4 comments · 2 min read
leogao’s Shortform
leogao · 24 May 2022 20:08 UTC · 7 points · 578 comments · 1 min read
[ASoT] Consequentialist models as a superset of mesaoptimizers
leogao · 23 Apr 2022 17:57 UTC · 38 points · 2 comments · 4 min read
[ASoT] Some thoughts about imperfect world modeling
leogao · 7 Apr 2022 15:42 UTC · 7 points · 0 comments · 4 min read
[ASoT] Some thoughts about LM monologue limitations and ELK
leogao · 30 Mar 2022 14:26 UTC · 10 points · 0 comments · 2 min read
[ASoT] Some thoughts about deceptive mesaoptimization
leogao · 28 Mar 2022 21:14 UTC · 24 points · 5 comments · 7 min read
[ASoT] Searching for consequentialist structure
leogao · 27 Mar 2022 19:09 UTC · 26 points · 2 comments · 4 min read
[ASoT] Some ways ELK could still be solvable in practice
leogao · 27 Mar 2022 1:15 UTC · 26 points · 1 comment · 2 min read
[ASoT] Observations about ELK
leogao · 26 Mar 2022 0:42 UTC · 34 points · 0 comments · 3 min read
What do paradigm shifts look like?
leogao · 16 Mar 2022 19:17 UTC · 18 points · 2 comments · 1 min read
EleutherAI’s GPT-NeoX-20B release (eaidata.bmk.sh)
leogao · 10 Feb 2022 6:56 UTC · 30 points · 3 comments · 1 min read