OpenAI: GPT-based LLMs show ability to discriminate between its own wrong answers, but inability to explain how/why it makes that discrimination, even as model scales
Aditya Jain · 13 Jun 2022 23:33 UTC · 14 points · 5 comments · 1 min read · (openai.com)
[Question] Who said something like “The fact that putting 2 apples next to 2 other apples leads to there being 4 apples there has nothing to do with the fact that 2 + 2 = 4”?
hunterglenn · 13 Jun 2022 22:23 UTC · 1 point · 2 comments · 1 min read
Continuity Assumptions
Jan_Kulveit · 13 Jun 2022 21:31 UTC · 35 points · 13 comments · 4 min read
Crypto-fed Computation
aaguirre · 13 Jun 2022 21:20 UTC · 23 points · 7 comments · 7 min read
A Modest Pivotal Act
anonymousaisafety · 13 Jun 2022 19:24 UTC · −16 points · 1 comment · 5 min read
Contra EY: Can AGI destroy us without trial & error?
Nikita Sokolsky · 13 Jun 2022 18:26 UTC · 136 points · 72 comments · 15 min read
What are some smaller-but-concrete challenges related to AI safety that are impacting people today?
nonzerosum · 13 Jun 2022 17:36 UTC · 4 points · 3 comments · 1 min read
[Link] New SEP article on Bayesian Epistemology
Aryeh Englander · 13 Jun 2022 15:03 UTC · 6 points · 0 comments · 1 min read
Training Trace Priors
Adam Jermyn · 13 Jun 2022 14:22 UTC · 12 points · 17 comments · 4 min read
[Question] Can you MRI a deep learning model?
Yair Halberstadt · 13 Jun 2022 13:43 UTC · 3 points · 3 comments · 1 min read
On A List of Lethalities
Zvi · 13 Jun 2022 12:30 UTC · 161 points · 49 comments · 54 min read · 1 review · (thezvi.wordpress.com)
D&D.Sci June 2022 Evaluation and Ruleset
abstractapplic · 13 Jun 2022 10:31 UTC · 30 points · 10 comments · 4 min read
[Question] What’s the “This AI is of moral concern.” fire alarm?
Quintin Pope · 13 Jun 2022 8:05 UTC · 37 points · 56 comments · 2 min read
The beautiful magical enchanted golden Dall-e Mini is underrated
p.b. · 13 Jun 2022 7:58 UTC · 14 points · 0 comments · 1 min read
Why so little AI risk on rationalist-adjacent blogs?
Grant Demaree · 13 Jun 2022 6:31 UTC · 46 points · 23 comments · 8 min read
Code Quality and Rule Consequentialism
Adam Zerner · 13 Jun 2022 3:12 UTC · 17 points · 13 comments · 6 min read
Grokking “Semi-informative priors over AI timelines”
anson.ho · 12 Jun 2022 22:17 UTC · 15 points · 7 comments · 14 min read
[Question] How much does cybersecurity reduce AI risk?
Darmani · 12 Jun 2022 22:13 UTC · 34 points · 23 comments · 1 min read
[Question] How are compute assets distributed in the world?
Chris van Merwijk · 12 Jun 2022 22:13 UTC · 30 points · 7 comments · 1 min read
Intuitive Explanation of AIXI
Thomas Larsen · 12 Jun 2022 21:41 UTC · 21 points · 0 comments · 5 min read
Why all the fuss about recursive self-improvement?
So8res · 12 Jun 2022 20:53 UTC · 158 points · 62 comments · 7 min read · 1 review
Why the Kaldor-Hicks criterion can be non-transitive
Rupert · 12 Jun 2022 17:26 UTC · 4 points · 10 comments · 2 min read
[Question] How do you post links here?
skybrian · 12 Jun 2022 16:23 UTC · 1 point · 1 comment · 1 min read
Pure Altruism
UtilityMonster · 12 Jun 2022 15:53 UTC · 2 points · 4 comments · 4 min read
[Question] Filter out tags from the front page?
jaspax · 12 Jun 2022 10:59 UTC · 9 points · 2 comments · 1 min read
How To: A Workshop (or anything)
[DEACTIVATED] Duncan Sabien · 12 Jun 2022 8:00 UTC · 53 points · 13 comments · 37 min read · 1 review
A claim that Google’s LaMDA is sentient
Ben Livengood · 12 Jun 2022 4:18 UTC · 31 points · 133 comments · 1 min read
[Question] How much stupider than humans can AI be and still kill us all through sheer numbers and resource access?
shminux · 12 Jun 2022 1:01 UTC · 11 points · 11 comments · 1 min read
ELK Proposal—Make the Reporter care about the Predictor’s beliefs
Adam Jermyn and Nicholas Schiefer · 11 Jun 2022 22:53 UTC · 8 points · 0 comments · 6 min read
[Question] Why has no person / group ever taken over the world?
Aryeh Englander · 11 Jun 2022 20:51 UTC · 25 points · 19 comments · 1 min read
[Question] Are there English-speaking meetups in Frankfurt/Munich/Zurich?
Grant Demaree · 11 Jun 2022 20:02 UTC · 6 points · 2 comments · 1 min read
Beauty and the Beast
Tomás B. · 11 Jun 2022 18:59 UTC · 32 points · 8 comments · 6 min read
Poorly-Aimed Death Rays
Thane Ruthenis · 11 Jun 2022 18:29 UTC · 48 points · 5 comments · 4 min read
AGI Safety Communications Initiative
ines · 11 Jun 2022 17:34 UTC · 7 points · 0 comments · 1 min read
A gaming group for rationality-aware people
dhatas · 11 Jun 2022 16:04 UTC · 7 points · 0 comments · 1 min read
[Question] Why don’t you introduce really impressive people you personally know to AI alignment (more often)?
Verden · 11 Jun 2022 15:59 UTC · 33 points · 14 comments · 1 min read
Godzilla Strategies
johnswentworth · 11 Jun 2022 15:44 UTC · 146 points · 71 comments · 3 min read
Steganography and the CycleGAN—alignment failure case study
Jan Czechowski · 11 Jun 2022 9:41 UTC · 33 points · 0 comments · 4 min read
The Mountain Troll
lsusr · 11 Jun 2022 9:14 UTC · 95 points · 25 comments · 2 min read
Show LW: YodaTimer.com
Adam Zerner · 11 Jun 2022 8:52 UTC · 27 points · 4 comments · 1 min read
How fast can we perform a forward pass?
jsteinhardt · 10 Jun 2022 23:30 UTC · 53 points · 9 comments · 15 min read · (bounded-regret.ghost.io)
Summary of “AGI Ruin: A List of Lethalities”
Stephen McAleese · 10 Jun 2022 22:35 UTC · 44 points · 2 comments · 8 min read
How dangerous is human-level AI?
Alex_Altair · 10 Jun 2022 17:38 UTC · 21 points · 4 comments · 8 min read
Another plausible scenario of AI risk: AI builds military infrastructure while collaborating with humans, defects later.
avturchin · 10 Jun 2022 17:24 UTC · 10 points · 2 comments · 1 min read
Leaving Google, Joining the Nucleic Acid Observatory
jefftk · 10 Jun 2022 17:00 UTC · 114 points · 4 comments · 3 min read · (www.jefftk.com)
On The Spectrum, On The Guest List: (v) The Fleur Room
party girl · 10 Jun 2022 14:50 UTC · 8 points · 1 comment · 14 min read · (onthespectrumontheguestlist.substack.com)
Progress Report 6: get the tool working
Nathan Helm-Burger · 10 Jun 2022 11:18 UTC · 4 points · 0 comments · 2 min read
[Question] Is AI Alignment Impossible?
Heighn · 10 Jun 2022 10:08 UTC · 3 points · 3 comments · 1 min read
I No Longer Believe Intelligence to be “Magical”
DragonGod · 10 Jun 2022 8:58 UTC · 27 points · 34 comments · 6 min read
[linkpost] The final AI benchmark: BIG-bench
RomanS · 10 Jun 2022 8:53 UTC · 25 points · 21 comments · 1 min read