Stop-gradients lead to fixed point predictions · Johannes Treutlein, Caspar Oesterheld, Rubi J. Hudson and Emery Cooper · Jan 28, 2023, 10:47 PM · 37 points · 2 comments · 24 min read · LW link
Eli Dourado AMA on the Progress Forum · jasoncrawford · Jan 28, 2023, 10:18 PM · 19 points · 0 comments · 1 min read · LW link (rootsofprogress.org)
LW Filter Tags (Rationality/World Modeling now promoted in Latest Posts) · Ruby and RobertM · Jan 28, 2023, 10:14 PM · 60 points · 4 comments · 3 min read · LW link
No Fire in the Equations · Carlos Ramirez · Jan 28, 2023, 9:16 PM · −16 points · 4 comments · 3 min read · LW link
Optimality is the tiger, and annoying the user is its teeth · Christopher King · Jan 28, 2023, 8:20 PM · 25 points · 6 comments · 2 min read · LW link
On not getting contaminated by the wrong obesity ideas · Natália · Jan 28, 2023, 8:18 PM · 306 points · 69 comments · 30 min read · LW link
Advice I found helpful in 2022 · Orpheus16 · Jan 28, 2023, 7:48 PM · 36 points · 5 comments · 2 min read · LW link
The Knockdown Argument Paradox · Bryan Frances · Jan 28, 2023, 7:23 PM · −12 points · 6 comments · 8 min read · LW link
Less Wrong/ACX Budapest Feb 4th Meetup · Richard Horvath and Timothy Underwood · Jan 28, 2023, 2:49 PM · 2 points · 0 comments · 1 min read · LW link
Reflections on Deception & Generality in Scalable Oversight (Another OpenAI Alignment Review) · Shoshannah Tekofsky · Jan 28, 2023, 5:26 AM · 53 points · 7 comments · 7 min read · LW link
A Simple Alignment Typology · Shoshannah Tekofsky · Jan 28, 2023, 5:26 AM · 34 points · 2 comments · 2 min read · LW link
Spooky action at a distance in the loss landscape · Jesse Hoogland and Filip Sondej · Jan 28, 2023, 12:22 AM · 61 points · 4 comments · 7 min read · LW link (www.jessehoogland.com)
WaPo: “Big Tech was moving cautiously on AI. Then came ChatGPT.” · Julian Bradshaw · Jan 27, 2023, 10:54 PM · 26 points · 5 comments · 1 min read · LW link (www.washingtonpost.com)
Literature review of TAI timelines · Jsevillamol, keith_wynroe and David Atkinson · Jan 27, 2023, 8:07 PM · 35 points · 7 comments · 2 min read · LW link (epochai.org)
Scaling Laws Literature Review · Pablo Villalobos · Jan 27, 2023, 7:57 PM · 36 points · 1 comment · 4 min read · LW link (epochai.org)
The role of Bayesian ML in AI safety—an overview · Marius Hobbhahn · Jan 27, 2023, 7:40 PM · 31 points · 6 comments · 10 min read · LW link
Assigning Praise and Blame: Decoupling Epistemology and Decision Theory · adamShimi and Gabriel Alfour · Jan 27, 2023, 6:16 PM · 59 points · 5 comments · 3 min read · LW link
[Question] How could humans dominate over a super intelligent AI? · Marco Discendenti · Jan 27, 2023, 6:15 PM · −5 points · 8 comments · 1 min read · LW link
ChatGPT understands language · philosophybear · Jan 27, 2023, 7:14 AM · 27 points · 4 comments · 6 min read · LW link (philosophybear.substack.com)
Jar of Chocolate · jefftk · Jan 27, 2023, 3:40 AM · 10 points · 0 comments · 1 min read · LW link (www.jefftk.com)
Basics of Rationalist Discourse · Duncan Sabien (Inactive) · Jan 27, 2023, 2:40 AM · 284 points · 193 comments · 36 min read · LW link · 4 reviews
The recent banality of rationality (and effective altruism) · CraigMichael · Jan 27, 2023, 1:19 AM · −6 points · 7 comments · 11 min read · LW link
11 heuristics for choosing (alignment) research projects · Orpheus16 and danesherbs · Jan 27, 2023, 12:36 AM · 50 points · 5 comments · 1 min read · LW link
A different observation of Vavilov Day · Elizabeth · Jan 26, 2023, 9:50 PM · 30 points · 1 comment · 1 min read · LW link (acesounderglass.com)
All AGI Safety questions welcome (especially basic ones) [~monthly thread] · mwatkins and Robert Miles · Jan 26, 2023, 9:01 PM · 39 points · 81 comments · 2 min read · LW link
Just another thought experiment · Bohdan Kudlai · Jan 26, 2023, 7:29 PM · −11 points · 0 comments · 1 min read · LW link
Exquisite Oracle: A Dadaist-Inspired Literary Game for Many Friends (or 1 AI) · Yitz · Jan 26, 2023, 6:26 PM · 6 points · 1 comment · 1 min read · LW link
AI Risk Management Framework | NIST · DragonGod · Jan 26, 2023, 3:27 PM · 36 points · 4 comments · 2 min read · LW link (www.nist.gov)
“How to Escape from the Simulation”—Seeds of Science call for reviewers · rogersbacon · Jan 26, 2023, 3:11 PM · 12 points · 0 comments · 1 min read · LW link
Loom: Why and How to use it · brook · Jan 26, 2023, 2:34 PM · 2 points · 5 comments · LW link
Covid 1/26/23: Case Count Crash · Zvi · Jan 26, 2023, 12:50 PM · 32 points · 5 comments · 9 min read · LW link (thezvi.wordpress.com)
[Question] How are you currently modeling COVID contagiousness? · CounterBlunder · Jan 26, 2023, 4:46 AM · 2 points · 2 comments · 1 min read · LW link
[Question] What’s the simplest concrete unsolved problem in AI alignment? · agg · Jan 26, 2023, 4:15 AM · 28 points · 4 comments · 1 min read · LW link
2022 Less Wrong Census/Survey: Request for Comments · Screwtape · Jan 25, 2023, 8:57 PM · 5 points · 29 comments · 1 min read · LW link
Next steps after AGISF at UMich · JakubK · Jan 25, 2023, 8:57 PM · 10 points · 0 comments · 5 min read · LW link (docs.google.com)
AGI will have learnt utility functions · beren · Jan 25, 2023, 7:42 PM · 38 points · 4 comments · 13 min read · LW link
[RFC] Possible ways to expand on “Discovering Latent Knowledge in Language Models Without Supervision” · gekaklam, Walter Laurito, Kaarel and Kay Kozaronek · Jan 25, 2023, 7:03 PM · 48 points · 6 comments · 12 min read · LW link
Spreading messages to help with the most important century · HoldenKarnofsky · Jan 25, 2023, 6:20 PM · 75 points · 4 comments · 18 min read · LW link (www.cold-takes.com)
My Model Of EA Burnout · LoganStrohl · Jan 25, 2023, 5:52 PM · 259 points · 50 comments · 5 min read · LW link · 1 review
Thoughts on the impact of RLHF research · paulfchristiano · Jan 25, 2023, 5:23 PM · 253 points · 102 comments · 9 min read · LW link
[Question] Could AI be used to engineer a sociopolitical situation where humans can solve the problems surrounding AGI? · hollowing · Jan 25, 2023, 5:17 PM · 1 point · 6 comments · 1 min read · LW link
Progress links and tweets, 2023-01-25 · jasoncrawford · Jan 25, 2023, 4:12 PM UTC · 8 points · 0 comments · 1 min read · LW link (rootsofprogress.org)
Visualisation of Probability Mass · brook · Jan 25, 2023, 3:09 PM UTC · 7 points · 0 comments · LW link
When Did EA Start? · jefftk · Jan 25, 2023, 2:30 PM UTC · 37 points · 2 comments · 2 min read · LW link (www.jefftk.com)
Some Thoughts on AI Art · abramdemski · Jan 25, 2023, 2:18 PM UTC · 74 points · 20 comments · 7 min read · LW link
Quick thoughts on “scalable oversight” / “super-human feedback” research · David Scott Krueger (formerly: capybaralet) · Jan 25, 2023, 12:55 PM UTC · 27 points · 9 comments · 2 min read · LW link
Sapir-Whorf for Rationalists · Duncan Sabien (Inactive) · Jan 25, 2023, 7:58 AM UTC · 155 points · 49 comments · 19 min read · LW link
ChatGPT vs the 2-4-6 Task · cwillu · Jan 25, 2023, 6:59 AM UTC · 20 points · 4 comments · 3 min read · LW link
Pessimistic Shard Theory · Garrett Baker · Jan 25, 2023, 12:59 AM UTC · 72 points · 13 comments · 3 min read · LW link
Thatcher’s Axiom · Edward P. Könings · Jan 24, 2023, 10:35 PM UTC · 10 points · 22 comments · 4 min read · LW link