Stuart_Armstrong (Stuart Armstrong)
Karma: 17,171
Large language models can provide “normative assumptions” for learning human preferences
Stuart_Armstrong · 2 Jan 2023 19:39 UTC · 29 points · 12 comments · 3 min read
Concept extrapolation for hypothesis generation
Stuart_Armstrong, patrickleask and rgorman · 12 Dec 2022 22:09 UTC · 20 points · 2 comments · 3 min read
Using GPT-Eliezer against ChatGPT Jailbreaking
Stuart_Armstrong and rgorman · 6 Dec 2022 19:54 UTC · 167 points · 77 comments · 9 min read
Benchmark for successful concept extrapolation/avoiding goal misgeneralization
Stuart_Armstrong · 4 Jul 2022 20:48 UTC · 80 points · 12 comments · 4 min read
Value extrapolation vs Wireheading
Stuart_Armstrong · 17 Jun 2022 15:02 UTC · 16 points · 1 comment · 1 min read
Georgism, in theory
Stuart_Armstrong · 15 Jun 2022 15:20 UTC · 38 points · 21 comments · 4 min read
How to get into AI safety research
Stuart_Armstrong · 18 May 2022 18:05 UTC · 44 points · 7 comments · 1 min read
GPT-3 and concept extrapolation
Stuart_Armstrong · 20 Apr 2022 10:39 UTC · 19 points · 28 comments · 1 min read
Concept extrapolation: key posts
Stuart_Armstrong · 19 Apr 2022 10:01 UTC · 12 points · 2 comments · 1 min read
AIs should learn human preferences, not biases
Stuart_Armstrong · 8 Apr 2022 13:45 UTC · 10 points · 1 comment · 1 min read
Different perspectives on concept extrapolation
Stuart_Armstrong · 8 Apr 2022 10:42 UTC · 43 points · 7 comments · 5 min read
Value extrapolation, concept extrapolation, model splintering
Stuart_Armstrong · 8 Mar 2022 22:50 UTC · 14 points · 1 comment · 2 min read
[Link] Aligned AI AMA
Stuart_Armstrong · 1 Mar 2022 12:01 UTC · 18 points · 0 comments · 1 min read
More GPT-3 and symbol grounding
Stuart_Armstrong · 23 Feb 2022 18:30 UTC · 21 points · 7 comments · 3 min read
Why I’m co-founding Aligned AI
Stuart_Armstrong · 17 Feb 2022 19:55 UTC · 93 points · 54 comments · 3 min read
Different way classifiers can be diverse
Stuart_Armstrong · 17 Jan 2022 16:30 UTC · 10 points · 5 comments · 2 min read
Value extrapolation partially resolves symbol grounding
Stuart_Armstrong · 12 Jan 2022 16:30 UTC · 24 points · 10 comments · 1 min read
How an alien theory of mind might be unlearnable
Stuart_Armstrong · 3 Jan 2022 11:16 UTC · 26 points · 35 comments · 5 min read
Finding the multiple ground truths of CoinRun and image classification
Stuart_Armstrong · 8 Dec 2021 18:13 UTC · 15 points · 3 comments · 2 min read
Declustering, reclustering, and filling in thingspace
Stuart_Armstrong · 6 Dec 2021 20:53 UTC · 16 points · 6 comments · 3 min read