Monosemanticity & Quantization
Rahul Chand
Oct 22, 2024, 10:57 PM
1
point
0
comments
9
min read
LW
link
[Question]
What is the alpha in one bit of evidence?
J Bostock
Oct 22, 2024, 9:57 PM
20
points
13
comments
1
min read
LW
link
Catastrophic sabotage as a major threat model for human-level AI systems
evhub
Oct 22, 2024, 8:57 PM
92
points
13
comments
15
min read
LW
link
Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)
Elizabeth
Oct 22, 2024, 6:20 PM
76
points
82
comments
1
min read
LW
link
(acesounderglass.com)
Decision-Making Under Uncertainty: Lessons From AI
Jonasb
Oct 22, 2024, 5:54 PM
−1
points
0
comments
5
min read
LW
link
(www.denominations.io)
Testing Genetic Engineering Detection with Spike-Ins
jefftk
Oct 22, 2024, 5:20 PM
9
points
0
comments
LW
link
(naobservatory.org)
Predictions as Public Works Project — What Metaculus Is Building Next
ChristianWilliams
Oct 22, 2024, 4:35 PM
5
points
0
comments
LW
link
(www.metaculus.com)
Gorges of gender on a terrain of traits
dkl9
Oct 22, 2024, 4:18 PM
−7
points
1
comment
3
min read
LW
link
(dkl9.net)
A Defense of Peer Review
Niko_McCarty
and
delton137
Oct 22, 2024, 4:16 PM
23
points
1
comment
22
min read
LW
link
(www.asimov.press)
BIG-Bench Canary Contamination in GPT-4
Jozdien
Oct 22, 2024, 3:40 PM
125
points
14
comments
4
min read
LW
link
[Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF
Leon Lang
Oct 22, 2024, 1:57 PM
51
points
2
comments
18
min read
LW
link
(arxiv.org)
[Intuitive self-models] 6. Awakening / Enlightenment / PNSE
Steven Byrnes
Oct 22, 2024, 1:23 PM
64
points
8
comments
21
min read
LW
link
Resolving von Neumann-Morgenstern Inconsistent Preferences
niplav
Oct 22, 2024, 11:45 AM
38
points
5
comments
58
min read
LW
link
Lenses of Control
WillPetillo
Oct 22, 2024, 7:51 AM
14
points
0
comments
9
min read
LW
link
A Brief Explanation of AI Control
Aaron_Scher
Oct 22, 2024, 7:00 AM
8
points
1
comment
6
min read
LW
link
Longevity, AI, and Cognitive Research Hackathon @ MIT
ekkolápto
Oct 22, 2024, 6:19 AM
1
point
0
comments
1
min read
LW
link
Conversational Signposts—How to stop having boring social interactions
Declan Molony
Oct 22, 2024, 5:37 AM
11
points
6
comments
2
min read
LW
link
I got dysentery so you don’t have to
eukaryote
Oct 22, 2024, 4:55 AM
321
points
6
comments
17
min read
LW
link
(eukaryotewritesblog.com)
Transformers Explained (Again)
RohanS
Oct 22, 2024, 4:06 AM
4
points
0
comments
18
min read
LW
link
Sleeping on Stage
jefftk
Oct 22, 2024, 12:50 AM
26
points
3
comments
1
min read
LW
link
(www.jefftk.com)
The Mask Comes Off: At What Price?
Zvi
Oct 21, 2024, 11:50 PM
72
points
16
comments
8
min read
LW
link
(thezvi.wordpress.com)
Distinguishing ways AI can be “concentrated”
Matthew Barnett
Oct 21, 2024, 10:21 PM
28
points
2
comments
LW
link
Jailbreaking ChatGPT and Claude using Web API Context Injection
Jaehyuk Lim
Oct 21, 2024, 9:34 PM
4
points
0
comments
3
min read
LW
link
How to Teach Your Brain to Hate Procrastination
10xyz
Oct 21, 2024, 8:12 PM
3
points
0
comments
2
min read
LW
link
Pausing for what?
MountainPath
Oct 21, 2024, 8:12 PM
0
points
1
comment
1
min read
LW
link
What is autonomy? Why boundaries are necessary.
Chipmonk
Oct 21, 2024, 5:56 PM
8
points
1
comment
1
min read
LW
link
(chrislakin.blog)
Could randomly choosing people to serve as representatives lead to better government?
John Huang
Oct 21, 2024, 5:10 PM
75
points
13
comments
10
min read
LW
link
There aren’t enough smart people in biology doing something boring
Abhishaike Mahajan
Oct 21, 2024, 3:52 PM
27
points
13
comments
10
min read
LW
link
Automation collapse
Geoffrey Irving
,
Tomek Korbak
and
Benjamin Hilton
Oct 21, 2024, 2:50 PM
72
points
9
comments
7
min read
LW
link
What AI companies should do: Some rough ideas
Zach Stein-Perlman
Oct 21, 2024, 2:00 PM
33
points
10
comments
5
min read
LW
link
[Question]
What should OpenAI do that it hasn’t already done, to stop their vacancies from being advertised on the 80k Job Board?
WitheringWeights
Oct 21, 2024, 1:57 PM
22
points
0
comments
1
min read
LW
link
A Rocket–Interpretability Analogy
plex
Oct 21, 2024, 1:55 PM
155
points
31
comments
1
min read
LW
link
Tokyo AI Safety 2025: Call For Papers
Blaine
Oct 21, 2024, 8:43 AM
24
points
0
comments
3
min read
LW
link
(www.tais2025.cc)
OpenAI defected, but we can take honest actions
Remmelt
Oct 21, 2024, 8:41 AM
17
points
16
comments
LW
link
Slightly More Than You Wanted To Know: Pregnancy Length Effects
JustisMills
Oct 21, 2024, 1:26 AM
63
points
4
comments
5
min read
LW
link
(justismills.substack.com)
Information vs Assurance
johnswentworth
Oct 20, 2024, 11:16 PM
187
points
17
comments
2
min read
LW
link
Liquid vs Illiquid Careers
vaishnav92
Oct 20, 2024, 11:03 PM
35
points
7
comments
7
min read
LW
link
(vaishnavsunil.substack.com)
AI Can be “Gradient Aware” Without Doing Gradient hacking.
Sodium
Oct 20, 2024, 9:02 PM
20
points
0
comments
2
min read
LW
link
A brief theory of why we think things are good or bad
David Johnston
Oct 20, 2024, 8:31 PM
7
points
10
comments
LW
link
Thinking in 2D
sarahconstantin
Oct 20, 2024, 7:30 PM
27
points
0
comments
8
min read
LW
link
(sarahconstantin.substack.com)
Podcast discussing Hanson’s Cultural Drift Argument
vaishnav92
and
regan.arntz.gray
Oct 20, 2024, 5:58 PM
3
points
0
comments
1
min read
LW
link
(moralmayhem.substack.com)
Advice on Communicating Concisely
EvolutionByDesign
Oct 20, 2024, 4:45 PM
3
points
9
comments
1
min read
LW
link
Ambiguities or the issues we face with AI in medicine
Thehumanproject.ai
Oct 20, 2024, 4:45 PM
2
points
0
comments
5
min read
LW
link
The Personal Implications of AGI Realism
xizneb
20 Oct 2024 16:43 UTC
7
points
8
comments
5
min read
LW
link
Safety tax functions
owencb
20 Oct 2024 14:08 UTC
31
points
0
comments
6
min read
LW
link
(strangecities.substack.com)
Exploring the Platonic Representation Hypothesis Beyond In-Distribution Data
rokosbasilisk
20 Oct 2024 8:40 UTC
12
points
2
comments
1
min read
LW
link
Electoral Systems
RedFishBlueFish
20 Oct 2024 3:25 UTC
1
point
0
comments
14
min read
LW
link
Overcoming Bias Anthology
Arjun Panickssery
20 Oct 2024 2:01 UTC
169
points
14
comments
2
min read
LW
link
(overcoming-bias-anthology.com)
D/acc AI Security Salon
Allison Duettmann
19 Oct 2024 22:17 UTC
19
points
0
comments
1
min read
LW
link
Who Should Have Been Killed, and Contains Neato? Who Else Could It Be, but that Villain Magneto!
Ace Delgado
19 Oct 2024 20:39 UTC
−16
points
0
comments
1
min read
LW
link