Reliable Sources: The Story of David Gerard
TracingWoodgrains
Jul 10, 2024, 7:50 PM
390
points
54
comments
43
min read
LW
link
Managing Emotional Potential Energy
adamShimi
Jul 10, 2024, 6:20 PM
24
points
4
comments
4
min read
LW
link
(epistemologicalfascinations.substack.com)
[EAForum xpost] A breakdown of OpenAI’s revenue
dschwarz
and
Lawrence Phillips
Jul 10, 2024, 6:09 PM
57
points
5
comments
1
min read
LW
link
(forum.effectivealtruism.org)
Solving Pascal’s Wager using dynamic programming
Paul Wilczewski
Jul 10, 2024, 6:09 PM
1
point
0
comments
5
min read
LW
link
Fluent, Cruxy Predictions
Raemon
Jul 10, 2024, 6:00 PM
86
points
14
comments
14
min read
LW
link
Antitrust as Controlled Creative Destruction
Martin Sustrik
Jul 10, 2024, 4:40 PM
14
points
2
comments
2
min read
LW
link
(250bpm.substack.com)
New page: Integrity
Zach Stein-Perlman
Jul 10, 2024, 3:00 PM
91
points
3
comments
1
min read
LW
link
AirBnB Baking
jefftk
Jul 10, 2024, 12:50 PM
7
points
1
comment
1
min read
LW
link
(www.jefftk.com)
DIY RLHF: A simple implementation for hands on experience
Mike Vaiana
and
AE Studio
Jul 10, 2024, 12:07 PM
28
points
0
comments
6
min read
LW
link
Usefulness grounds truth
invertedpassion
Jul 10, 2024, 7:58 AM
0
points
0
comments
4
min read
LW
link
On passing Complete and Honest Ideological Turing Tests (CHITTs)
Aryeh Englander
Jul 10, 2024, 4:01 AM
11
points
2
comments
1
min read
LW
link
[Question]
Pondering how good or bad things will be in the AGI future
Sherrinford
Jul 9, 2024, 10:46 PM
11
points
9
comments
2
min read
LW
link
Causal Graphs of GPT-2-Small’s Residual Stream
David Udell
Jul 9, 2024, 10:06 PM
53
points
7
comments
7
min read
LW
link
[Question]
If AI starts to end the world, is suicide a good idea?
IlluminateReality
Jul 9, 2024, 9:53 PM
0
points
8
comments
1
min read
LW
link
Rationalist Purity Test
Gunnar_Zarncke
Jul 9, 2024, 8:30 PM
−9
points
5
comments
1
min read
LW
link
(ratpuritytest.com)
That which can be destroyed by the truth, should be assumed to should be destroyed by it
Thac0
Jul 9, 2024, 7:39 PM
6
points
0
comments
3
min read
LW
link
AISN #38: Supreme Court Decision Could Limit Federal Ability to Regulate AI Plus, “Circuit Breakers” for AI systems, and updates on China’s AI industry
Corin Katzke
,
Alexa Pan
,
Julius
and
Dan H
Jul 9, 2024, 7:28 PM
5
points
0
comments
5
min read
LW
link
(newsletter.safe.ai)
Summer Tour Stops
jefftk
Jul 9, 2024, 7:10 PM
10
points
0
comments
3
min read
LW
link
(www.jefftk.com)
Fix simple mistakes in ARC-AGI, etc.
Oleg Trott
Jul 9, 2024, 5:46 PM
9
points
9
comments
1
min read
LW
link
Paper Summary: The Effects of Communicating Uncertainty on Public Trust in Facts and Numbers
Jeffrey Heninger
Jul 9, 2024, 4:50 PM
42
points
2
comments
2
min read
LW
link
(blog.aiimpacts.org)
UC Berkeley course on LLMs and ML Safety
Dan H
Jul 9, 2024, 3:40 PM
36
points
1
comment
1
min read
LW
link
(rdi.berkeley.edu)
What and Why: Developmental Interpretability of Reinforcement Learning
Garrett Baker
Jul 9, 2024, 2:09 PM
68
points
4
comments
6
min read
LW
link
Medical Roundup #3
Zvi
Jul 9, 2024, 1:10 PM
39
points
4
comments
19
min read
LW
link
(thezvi.wordpress.com)
Consent across power differentials
Ramana Kumar
Jul 9, 2024, 11:42 AM
50
points
12
comments
3
min read
LW
link
[Question]
How bad would AI progress need to be for us to think general technological progress is also bad?
Jim Buhler
Jul 9, 2024, 10:43 AM
9
points
5
comments
1
min read
LW
link
How LLMs Learn: What We Know, What We Don’t (Yet) Know, and What Comes Next
Jonasb
Jul 9, 2024, 9:58 AM
2
points
0
comments
16
min read
LW
link
(www.denominations.io)
WTF is with the Infancy Gospel of Thomas?!? A deep dive into satire, philosophy, and more
kromem
Jul 9, 2024, 9:29 AM
18
points
2
comments
11
min read
LW
link
Book Review: Safe Enough? A History of Nuclear Power and Accident Risk
ErickBall
Jul 9, 2024, 1:12 AM
10
points
0
comments
28
min read
LW
link
Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs
L Rudolf L
,
bilalchughtai
,
Jan Betley
,
kaivu
,
Jérémy Scheurer
,
Mikita Balesni
,
AlexMeinke
,
Owain_Evans
and
Marius Hobbhahn
Jul 8, 2024, 10:24 PM
109
points
37
comments
5
min read
LW
link
Robin Hanson & Liron Shapira Debate AI X-Risk
Liron
Jul 8, 2024, 9:45 PM
34
points
4
comments
1
min read
LW
link
(www.youtube.com)
“The Singularity Is Nearer” by Ray Kurzweil—Review
Lavender
Jul 8, 2024, 9:32 PM
22
points
0
comments
4
min read
LW
link
Sample Prevalence vs Global Prevalence
jefftk
Jul 8, 2024, 9:00 PM
11
points
0
comments
2
min read
LW
link
(www.jefftk.com)
Advice to junior AI governance researchers
Orpheus16
Jul 8, 2024, 7:19 PM
66
points
1
comment
5
min read
LW
link
Pantheon Interface
NicholasKees
and
Sofia Vanhanen
Jul 8, 2024, 7:03 PM
127
points
22
comments
6
min read
LW
link
Launching the AI Forecasting Benchmark Series Q3 | $30k in Prizes
ChristianWilliams
Jul 8, 2024, 5:20 PM
5
points
0
comments
LW
link
(www.metaculus.com)
The Golden Mean of Scientific Virtues
adamShimi
Jul 8, 2024, 5:16 PM
12
points
4
comments
8
min read
LW
link
(epistemologicalfascinations.substack.com)
Massapequa (Long Island), New York, USA – ACX Meetup
Gabriel Weil
Jul 8, 2024, 5:01 PM
2
points
0
comments
1
min read
LW
link
Dialogue introduction to Singular Learning Theory
Olli Järviniemi
Jul 8, 2024, 4:58 PM
101
points
15
comments
8
min read
LW
link
Announcing The Techno-Humanist Manifesto: A new philosophy of progress for the 21st century
jasoncrawford
Jul 8, 2024, 4:33 PM
18
points
4
comments
5
min read
LW
link
(blog.rootsofprogress.org)
Response to Dileep George: AGI safety warrants planning ahead
Steven Byrnes
Jul 8, 2024, 3:27 PM
27
points
7
comments
27
min read
LW
link
Why not parliamentarianism? [book by Tiago Ribeiro dos Santos]
Arturo Macias
Jul 8, 2024, 2:57 PM
2
points
1
comment
4
min read
LW
link
Games of My Childhood: The Troops
Kaj_Sotala
Jul 8, 2024, 11:20 AM
18
points
0
comments
5
min read
LW
link
(kajsotala.fi)
Towards shutdownable agents via stochastic choice
EJT
,
alexr
,
christosi
and
LAThomson
Jul 8, 2024, 10:14 AM
59
points
11
comments
23
min read
LW
link
(arxiv.org)
On scalable oversight with weak LLMs judging strong LLMs
zac_kenton
,
Noah Siegel
,
janos
,
Jonah Brown-Cohen
,
Samuel Albanie
,
David Lindner
and
Rohin Shah
Jul 8, 2024, 8:59 AM
49
points
18
comments
7
min read
LW
link
(arxiv.org)
Poker is a bad game for teaching epistemics. Figgie is a better one.
rossry
Jul 8, 2024, 6:05 AM
105
points
47
comments
11
min read
LW
link
(blog.rossry.net)
Controlled Creative Destruction
Martin Sustrik
Jul 8, 2024, 4:36 AM
11
points
0
comments
2
min read
LW
link
On saying “Thank you” instead of “I’m Sorry”
Michael Cohn
Jul 8, 2024, 3:13 AM
136
points
16
comments
3
min read
LW
link
How can I get over my fear of becoming an emulated consciousness?
James Dowdell
7 Jul 2024 22:02 UTC
6
points
8
comments
5
min read
LW
link
An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2
Neel Nanda
7 Jul 2024 17:39 UTC
136
points
16
comments
25
min read
LW
link
Joint mandatory donation as a way to increase the number of donations
Crazy philosopher
7 Jul 2024 10:56 UTC
3
points
3
comments
2
min read
LW
link