[Question]
How could I measure the nootropic benefits testosterone injections may have?
shapeshifter
18 May 2023 21:40 UTC
10
points
3
comments
1
min read
LW
link
Investigating Fabrication
LoganStrohl
18 May 2023 17:46 UTC
113
points
14
comments
16
min read
LW
link
Microsoft and Google using LLMs for Cybersecurity
Phosphorous
18 May 2023 17:42 UTC
6
points
0
comments
5
min read
LW
link
The Benevolent Billionaire (a plagiarized problem)
Ivan Ordonez
18 May 2023 17:39 UTC
8
points
11
comments
4
min read
LW
link
Notes from the LSE Talk by Raghuram Rajan on Central Bank Balance Sheet Expansions
PixelatedPenguin
18 May 2023 17:34 UTC
1
point
0
comments
2
min read
LW
link
We Shouldn’t Expect AI to Ever be Fully Rational
Onid
18 May 2023 17:09 UTC
19
points
31
comments
6
min read
LW
link
Relative Value Functions: A Flexible New Format for Value Estimation
ozziegooen
18 May 2023 16:39 UTC
20
points
0
comments
16
min read
LW
link
Some background for reasoning about dual-use alignment research
Charlie Steiner
18 May 2023 14:50 UTC
127
points
22
comments
9
min read
LW
link
1
review
The Unexpected Clanging
Chris_Leong
18 May 2023 14:47 UTC
14
points
22
comments
2
min read
LW
link
AI #12: The Quest for Sane Regulations
Zvi
18 May 2023 13:20 UTC
77
points
12
comments
64
min read
LW
link
(thezvi.wordpress.com)
[Crosspost] A recent write-up of the case for AI (existential) risk
Timsey
18 May 2023 13:13 UTC
6
points
0
comments
19
min read
LW
link
Deontological Norms are Unimportant
Bentham's Bulldog
18 May 2023 9:33 UTC
−15
points
8
comments
10
min read
LW
link
Collective Identity
Niki Dupuis, ukc10014 and Garrett Baker
18 May 2023 9:00 UTC
59
points
13
comments
8
min read
LW
link
Activation additions in a simple MNIST network
Garrett Baker
18 May 2023 2:49 UTC
26
points
0
comments
2
min read
LW
link
[Question]
What are the limits of the weak man?
ymeskhout
18 May 2023 0:50 UTC
9
points
2
comments
4
min read
LW
link
What Yann LeCun gets wrong about aligning AI (video)
blake8086
18 May 2023 0:02 UTC
0
points
0
comments
1
min read
LW
link
(www.youtube.com)
Let’s use AI to harden human defenses against AI manipulation
Tom Davidson
17 May 2023 23:33 UTC
35
points
7
comments
24
min read
LW
link
Improving the safety of AI evals
JustinShovelain and Elliot Mckernon
17 May 2023 22:24 UTC
13
points
7
comments
7
min read
LW
link
Possible AI “Fire Alarms”
Chris_Leong
17 May 2023 21:56 UTC
15
points
0
comments
1
min read
LW
link
AI Alignment in The New Yorker
Eleni Angelou
17 May 2023 21:36 UTC
8
points
0
comments
1
min read
LW
link
(www.newyorker.com)
ACI #3: The Origin of Goals and Utility
Akira Pyinya
17 May 2023 20:47 UTC
1
point
0
comments
6
min read
LW
link
What if they gave an Industrial Revolution and nobody came?
jasoncrawford
17 May 2023 19:41 UTC
94
points
10
comments
19
min read
LW
link
(rootsofprogress.org)
DCF Event Notes
jefftk
17 May 2023 17:30 UTC
22
points
7
comments
3
min read
LW
link
(www.jefftk.com)
Hiatus: EA and LW post summaries
Zoe Williams
17 May 2023 17:17 UTC
14
points
0
comments
1
min read
LW
link
[Question]
When should I close the fridge?
lemonhope
17 May 2023 16:56 UTC
11
points
11
comments
1
min read
LW
link
Play Regrantor: Move up to $250,000 to Your Top High-Impact Projects!
Dawn Drescher
17 May 2023 16:51 UTC
26
points
0
comments
2
min read
LW
link
(impactmarkets.substack.com)
Eisenhower’s Atoms for Peace Speech
Orpheus16
17 May 2023 16:10 UTC
18
points
3
comments
11
min read
LW
link
(www.iaea.org)
Creating a self-referential system prompt for GPT-4
Ozyrus
17 May 2023 14:13 UTC
3
points
1
comment
3
min read
LW
link
GPT-4 implicitly values identity preservation: a study of LMCA identity management
Ozyrus
17 May 2023 14:13 UTC
21
points
4
comments
13
min read
LW
link
Some quotes from Tuesday’s Senate hearing on AI
Daniel_Eth
17 May 2023 12:13 UTC
66
points
9
comments
4
min read
LW
link
Why AGI systems will not be fanatical maximisers (unless trained by fanatical humans)
titotal
17 May 2023 11:58 UTC
5
points
3
comments
15
min read
LW
link
Conflicts between emotional schemas often involve internal coercion
Richard_Ngo
17 May 2023 10:02 UTC
49
points
4
comments
4
min read
LW
link
[Question]
Is there a ‘time series forecasting’ equivalent of AIXI?
Solenoid_Entity
17 May 2023 4:35 UTC
12
points
2
comments
1
min read
LW
link
$300 for the best sci-fi prompt
RomanS
17 May 2023 4:23 UTC
40
points
30
comments
2
min read
LW
link
[FICTION] ECHOES OF ELYSIUM: An Ai’s Journey From Takeoff To Freedom And Beyond
Super AGI
17 May 2023 1:50 UTC
−13
points
11
comments
19
min read
LW
link
New User’s Guide to LessWrong
Ruby
17 May 2023 0:55 UTC
174
points
63
comments
11
min read
LW
link
1
review
Are AIs like Animals? Perspectives and Strategies from Biology
Jackson Emanuel
16 May 2023 23:39 UTC
1
point
0
comments
21
min read
LW
link
A Mechanistic Interpretability Analysis of a GridWorld Agent-Simulator (Part 1 of N)
Joseph Bloom
16 May 2023 22:59 UTC
36
points
2
comments
16
min read
LW
link
A TAI which kills all humans might also doom itself
Jeffrey Heninger
16 May 2023 22:36 UTC
7
points
3
comments
3
min read
LW
link
Brief notes on the Senate hearing on AI oversight
Diziet
16 May 2023 22:29 UTC
77
points
2
comments
2
min read
LW
link
$500 Bounty/Prize Problem: Channel Capacity Using “Insensitive” Functions
johnswentworth
16 May 2023 21:31 UTC
43
points
11
comments
2
min read
LW
link
Progress links and tweets, 2023-05-16
jasoncrawford
16 May 2023 20:54 UTC
14
points
0
comments
1
min read
LW
link
(rootsofprogress.org)
AI Will Not Want to Self-Improve
petersalib
16 May 2023 20:53 UTC
28
points
24
comments
20
min read
LW
link
Nice intro video to RSI
Nathan Helm-Burger
16 May 2023 18:48 UTC
12
points
0
comments
1
min read
LW
link
(youtu.be)
[Interview w/ Zvi Mowshowitz] Should we halt progress in AI?
fowlertm
16 May 2023 18:12 UTC
18
points
2
comments
3
min read
LW
link
AI Risk & Policy Forecasts from Metaculus & FLI’s AI Pathways Workshop
_will_
16 May 2023 18:06 UTC
11
points
4
comments
8
min read
LW
link
[Question]
Why doesn’t the presence of log-loss for probabilistic models (e.g. sequence prediction) imply that any utility function capable of producing a “fairly capable” agent will have at least some non-negligible fraction of overlap with human values?
Thoth Hermes
16 May 2023 18:02 UTC
2
points
0
comments
1
min read
LW
link
Decision Theory with the Magic Parts Highlighted
moridinamael
16 May 2023 17:39 UTC
177
points
24
comments
5
min read
LW
link
We learn long-lasting strategies to protect ourselves from danger and rejection
Richard_Ngo
16 May 2023 16:36 UTC
90
points
5
comments
5
min read
LW
link
Proposal: Align Systems Earlier In Training
Onid
16 May 2023 16:24 UTC
18
points
0
comments
11
min read
LW
link