The longest training run
Jsevillamol, Tamay and Owen Dudney · 17 Aug 2022 17:18 UTC · 41 points · 6 comments · 9 min read · LW link (epochai.org)
Interpretability Tools Are an Attack Channel
Thane Ruthenis · 17 Aug 2022 18:47 UTC · 24 points · 7 comments · 1 min read · LW link
The Core of the Alignment Problem is...
Thomas Larsen, Jeremy Gillen and AtlasOfCharts · 17 Aug 2022 20:07 UTC · 25 points · 3 comments · 9 min read · LW link
My thoughts on direct work (and joining LessWrong)
RobertM · 16 Aug 2022 18:53 UTC · 51 points · 4 comments · 6 min read · LW link
Do meta-memes and meta-antimemes exist? e.g. ‘The map is not the territory’ is also a map
M. Y. Zuo · 7 Aug 2022 1:17 UTC · 4 points · 23 comments · 1 min read · LW link
Human Mimicry Mainly Works When We’re Already Close
johnswentworth · 17 Aug 2022 18:41 UTC · 41 points · 5 comments · 5 min read · LW link
Mesa-optimization for goals defined only within a training environment is dangerous
Rubi · 17 Aug 2022 3:56 UTC · 6 points · 2 comments · 4 min read · LW link
A Mechanistic Interpretability Analysis of Grokking
Neel Nanda and Tom Lieberum · 15 Aug 2022 2:41 UTC · 239 points · 15 comments · 41 min read · LW link (colab.research.google.com)
Against population ethics
jasoncrawford · 16 Aug 2022 5:19 UTC · 29 points · 30 comments · 3 min read · LW link
Half-baked AI Safety ideas thread
Aryeh Englander · 23 Jun 2022 16:11 UTC · 58 points · 59 comments · 1 min read · LW link
Matt Yglesias on AI Policy
Grant Demaree · 17 Aug 2022 23:57 UTC · 19 points · 0 comments · 1 min read · LW link (www.slowboring.com)
Why are politicians polarized?
ErnestScribbler · 21 Jul 2022 8:17 UTC · 13 points · 24 comments · 7 min read · LW link
Conditioning, Prompts, and Fine-Tuning
Adam Jermyn · 17 Aug 2022 20:52 UTC · 22 points · 1 comment · 4 min read · LW link
The Parable of the Boy Who Cried 5% Chance of Wolf
KatWoods · 15 Aug 2022 14:33 UTC · 125 points · 19 comments · 2 min read · LW link
Understanding differences between humans and intelligence-in-general to build safe AGI
Florian_Dietz · 16 Aug 2022 8:27 UTC · 7 points · 5 comments · 1 min read · LW link
Insufficient awareness of how everything sucks
Flaglandbase · 17 Aug 2022 8:01 UTC · −5 points · 3 comments · 1 min read · LW link
On the falsifiability of hypercomputation, part 2: finite input streams
jessicata · 17 Feb 2020 3:51 UTC · 25 points · 6 comments · 4 min read · LW link (unstableontology.com)
Concrete Advice for Forming Inside Views on AI Safety
Neel Nanda · 17 Aug 2022 22:02 UTC · 12 points · 0 comments · 10 min read · LW link
Reward is not the optimization target
TurnTrout · 25 Jul 2022 0:03 UTC · 169 points · 79 comments · 12 min read · LW link
Progress links and tweets, 2022-08-17
jasoncrawford · 17 Aug 2022 21:27 UTC · 11 points · 0 comments · 2 min read · LW link (rootsofprogress.org)