LessWrong Archive
Overcoming the MWC · Mark Freed · Jul 25, 2023, 5:31 PM · 3 points · 0 comments · 3 min read
Russian parliamentarian: let’s ban personal computers and the Internet · RomanS · Jul 25, 2023, 5:30 PM · 11 points · 6 comments · 2 min read
AISN #16: White House Secures Voluntary Commitments from Leading AI Labs and Lessons from Oppenheimer · Corin Katzke and Dan H · Jul 25, 2023, 4:58 PM · 6 points · 0 comments · 6 min read · (newsletter.safe.ai)
“The Universe of Minds”—call for reviewers (Seeds of Science) · rogersbacon · Jul 25, 2023, 4:53 PM · 7 points · 0 comments · 1 min read
Thoughts on Loss Landscapes and why Deep Learning works · beren · Jul 25, 2023, 4:41 PM · 53 points · 4 comments · 18 min read
Should you work at a leading AI lab? (including in non-safety roles) · Benjamin Hilton · Jul 25, 2023, 4:29 PM · 7 points · 0 comments · 12 min read
Whisper’s Word-Level Timestamps are Out · Varshul Gupta · Jul 25, 2023, 2:32 PM · −18 points · 2 comments · 2 min read · (dubverseblack.substack.com)
AIS 101: Task decomposition for scalable oversight · Charbel-Raphaël · Jul 25, 2023, 1:34 PM · 35 points · 0 comments · 19 min read · (docs.google.com)
Anthropic Observations · Zvi · Jul 25, 2023, 12:50 PM · 104 points · 1 comment · 10 min read · (thezvi.wordpress.com)
Autonomous Alignment Oversight Framework (AAOF) · Justausername · Jul 25, 2023, 10:25 AM · −9 points · 0 comments · 4 min read
How LLMs are and are not myopic · janus · Jul 25, 2023, 2:19 AM · 135 points · 16 comments · 8 min read
Secure Hand Holding · jefftk · Jul 25, 2023, 1:40 AM · 28 points · 43 comments · 1 min read · (www.jefftk.com)
Open problems in activation engineering · TurnTrout, woog, lisathiergart, Monte M and Ulisse Mini · Jul 24, 2023, 7:46 PM · 51 points · 2 comments · 1 min read · (coda.io)
Subdivisions for Useful Distillations? · Sharat Jacob Jacob · Jul 24, 2023, 6:55 PM · 9 points · 2 comments · 2 min read
Optimizing For Approval And Disapproval · Thoth Hermes · Jul 24, 2023, 6:46 PM · −1 points · 0 comments · 12 min read · (thothhermes.substack.com)
An Opinionated Guide to Computability and Complexity (Post #0) · Noosphere89 · Jul 24, 2023, 5:53 PM · 10 points · 10 comments · 3 min read
Slowing down AI progress is an underexplored alignment strategy · Norman Borlaug · Jul 24, 2023, 4:56 PM · 42 points · 27 comments · 5 min read
Anticipation in LLMs · derek shiller · Jul 24, 2023, 3:53 PM · 6 points · 0 comments · 13 min read
The cone of freedom (or, freedom might only be instrumentally valuable) · dkl9 · Jul 24, 2023, 3:38 PM · −10 points · 6 comments · 2 min read · (dkl9.net)
A reformulation of Finite Factored Sets · Matthias G. Mayer · Jul 24, 2023, 1:02 PM · 76 points · 1 comment · 8 min read
Brain Efficiency Cannell Prize Contest Award Ceremony · Alexander Gietelink Oldenziel · Jul 24, 2023, 11:30 AM · 149 points · 12 comments · 7 min read
[Crosspost] An AI Pause Is Humanity’s Best Bet For Preventing Extinction (TIME) · otto.barten · Jul 24, 2023, 10:07 AM · 12 points · 0 comments · 7 min read · (time.com)
Cryonics and Regret · MvB · Jul 24, 2023, 9:16 AM · 192 points · 35 comments · 2 min read · 1 review
Rationality !== Winning · Raemon · Jul 24, 2023, 2:53 AM · 170 points · 51 comments · 4 min read
[Question] Which rationality posts are begging for further practical development? · LoganStrohl · Jul 23, 2023, 10:22 PM · 60 points · 17 comments · 1 min read
Please speak unpredictably · dkl9 · Jul 23, 2023, 10:09 PM · 21 points · 16 comments · 1 min read · (dkl9.net)
QAPR 5: grokking is maybe not *that* big a deal? · Quintin Pope · Jul 23, 2023, 8:14 PM · 114 points · 15 comments · 9 min read
My favorite AI governance research this year so far · Zach Stein-Perlman · Jul 23, 2023, 4:30 PM · 26 points · 1 comment · 7 min read · (blog.aiimpacts.org)
“Justice, Cherryl.” · Zack_M_Davis · Jul 23, 2023, 4:16 PM · 91 points · 21 comments · 9 min read · 1 review
Supplementary Alignment Insights Through a Highly Controlled Shutdown Incentive · Justausername · Jul 23, 2023, 4:08 PM · 4 points · 1 comment · 3 min read
Autogynephilia discourse is so absurdly bad on all sides · tailcalled · Jul 23, 2023, 1:12 PM · 44 points · 24 comments · 2 min read
Examples of Prompts that Make GPT-4 Output Falsehoods · scasper and Luke Bailey · Jul 22, 2023, 8:21 PM · 21 points · 5 comments · 6 min read
Think like a consultant not a salesperson · Adam Zerner · Jul 22, 2023, 7:31 PM · 16 points · 5 comments · 2 min read
Optimization, loss set at variance in RL · Clairstan · Jul 22, 2023, 6:25 PM · 1 point · 1 comment · 3 min read
Compute Thresholds: proposed rules to mitigate risk of a “lab leak” accident during AI training runs · davidad · Jul 22, 2023, 6:09 PM · 80 points · 2 comments · 2 min read
Apollo Neuro Follow Up · Elizabeth · Jul 22, 2023, 5:20 PM · 28 points · 0 comments · 1 min read · (acesounderglass.com)
Expert trap – Ways out (Part 3 of 3) · Paweł Sysiak · Jul 22, 2023, 1:06 PM · 4 points · 0 comments · 9 min read
GPTs’ ability to keep a secret is weirdly prompt-dependent · Mateusz Bagiński, Filip Sondej and Marcel Windys · Jul 22, 2023, 12:21 PM · 31 points · 0 comments · 9 min read
Replacing the Big Air Purifier · jefftk · Jul 22, 2023, 12:10 PM · 10 points · 0 comments · 1 min read · (www.jefftk.com)
[Question] I’m consistently overwhelmed by basic obligations. Are there any paradigm shifts or other rationality-based tips that would be helpful? · Benjamin Hendricks · Jul 21, 2023, 9:10 PM · 71 points · 42 comments · 2 min read
Fundamentally Fuzzy Concepts Can’t Have Crisp Definitions: Cooperation and Alignment vs Math and Physics · VojtaKovarik · Jul 21, 2023, 9:03 PM · 12 points · 18 comments · 3 min read
Cooking Air Quality · jefftk · Jul 21, 2023, 7:30 PM · 16 points · 1 comment · 2 min read · (www.jefftk.com)
Reward Hacking from a Causal Perspective · tom4everitt, Francis Rhys Ward, sbenthall, James Fox, mattmacdermott and RyanCarey · Jul 21, 2023, 6:27 PM · 29 points · 6 comments · 7 min read
News: Biden-Harris Administration Secures Voluntary Commitments from Leading Artificial Intelligence Companies to Manage the Risks Posed by AI · Jonathan Claybrough · Jul 21, 2023, 6:00 PM · 65 points · 10 comments · 2 min read · (www.whitehouse.gov)
The UAP Disclosure Act of 2023 and its implications · andeslodes · Jul 21, 2023, 5:21 PM · 36 points · 47 comments · 20 min read · (www.congress.gov)
To use computers well, learn their rules · dkl9 · Jul 21, 2023, 5:00 PM · 4 points · 6 comments · 4 min read · (dkl9.net)
BCIs and the ecosystem of modular minds · beren · Jul 21, 2023, 3:58 PM · 88 points · 14 comments · 11 min read
Priorities for the UK Foundation Models Taskforce · Andrea_Miotti · Jul 21, 2023, 3:23 PM · 105 points · 4 comments · 5 min read · (www.conjecture.dev)
Training Process Transparency through Gradient Interpretability: Early experiments on toy language models · robertzk and evhub · Jul 21, 2023, 2:52 PM · 56 points · 1 comment · 1 min read
[Question] Can AI Alignment please create a Reddit-like platform that would make it much easier for alignment researchers to find and help each other? · Georgeo57 · Jul 21, 2023, 2:03 PM · −5 points · 2 comments · 1 min read