Ablations for “Frontier Models are Capable of In-context Scheming”
AlexMeinke, Bronson Schoen, Marius Hobbhahn, Mikita Balesni, Jérémy Scheurer and rusheb
Dec 17, 2024, 11:58 PM
115
points
1
comment
2
min read
LW
link
Careless thinking: A theory of bad thinking
Nathan Young
Dec 17, 2024, 6:23 PM
49
points
17
comments
9
min read
LW
link
(nathanpmyoung.substack.com)
The Second Gemini
Zvi
Dec 17, 2024, 3:50 PM
23
points
0
comments
11
min read
LW
link
(thezvi.wordpress.com)
AIS Hungary is hiring a part-time Technical Lead! (Deadline: Dec 31st)
gergogaspar
Dec 17, 2024, 2:12 PM
1
point
0
comments
2
min read
LW
link
Everything you care about is in the map
Tahp
Dec 17, 2024, 2:05 PM
17
points
27
comments
3
min read
LW
link
Reality is Fractal-Shaped
silentbob
Dec 17, 2024, 1:52 PM
18
points
1
comment
8
min read
LW
link
Trying to translate when people talk past each other
Kaj_Sotala
Dec 17, 2024, 9:40 AM
41
points
12
comments
6
min read
LW
link
(kajsotala.fi)
What is “wireheading”?
Vishakha and Algon
Dec 17, 2024, 7:49 AM
10
points
0
comments
1
min read
LW
link
(aisafety.info)
1 What If We Rebuild Motivation with the Fermi ESTIMATion?
P. João
Dec 17, 2024, 7:46 AM
6
points
0
comments
3
min read
LW
link
Where do you put your ideas?
CstineSublime
Dec 17, 2024, 7:26 AM
9
points
20
comments
1
min read
LW
link
Elevating Air Purifiers
jefftk
Dec 17, 2024, 1:40 AM
25
points
0
comments
1
min read
LW
link
(www.jefftk.com)
A dataset of questions on decision-theoretic reasoning in Newcomb-like problems
Caspar Oesterheld, Ethan Perez and Chi Nguyen
Dec 16, 2024, 10:42 PM
49
points
1
comment
2
min read
LW
link
(arxiv.org)
A practical guide to tiling the universe with hedonium
Vittu Perkele
Dec 16, 2024, 9:25 PM
−8
points
1
comment
1
min read
LW
link
(perkeleperusing.substack.com)
AI Safety Seed Funding Network—Join as a Donor or Investor
Alexandra Bos
Dec 16, 2024, 7:30 PM
30
points
0
comments
LW
link
Is this a better way to do matchmaking?
Chipmonk
Dec 16, 2024, 7:06 PM
9
points
4
comments
1
min read
LW
link
I read every major AI lab’s safety plan so you don’t have to
sarahhw
Dec 16, 2024, 6:51 PM
20
points
0
comments
12
min read
LW
link
(longerramblings.substack.com)
Grokking revisited: reverse engineering grokking modulo addition in LSTM
Nikita Khomich and Danik
Dec 16, 2024, 6:48 PM
4
points
0
comments
6
min read
LW
link
Progress links and short notes, 2024-12-16
jasoncrawford
Dec 16, 2024, 5:24 PM
7
points
0
comments
2
min read
LW
link
(newsletter.rootsofprogress.org)
Effective Altruism FAQ
omnizoid
Dec 16, 2024, 4:27 PM
0
points
7
comments
12
min read
LW
link
Variably compressibly studies are fun
dkl9
Dec 16, 2024, 4:00 PM
0
points
0
comments
2
min read
LW
link
(dkl9.net)
AIs Will Increasingly Attempt Shenanigans
Zvi
Dec 16, 2024, 3:20 PM
114
points
2
comments
26
min read
LW
link
(thezvi.wordpress.com)
Testing which LLM architectures can do hidden serial reasoning
Filip Sondej
Dec 16, 2024, 1:48 PM
81
points
9
comments
4
min read
LW
link
NeuroAI for AI safety: A Differential Path
nz and Patrick Mineault
Dec 16, 2024, 1:17 PM
22
points
0
comments
7
min read
LW
link
(arxiv.org)
Circling as practice for “just be yourself”
Kaj_Sotala
Dec 16, 2024, 7:40 AM
86
points
5
comments
4
min read
LW
link
(kajsotala.fi)
Reanalyzing the 2023 Expert Survey on Progress in AI
AI Impacts
Dec 16, 2024, 6:10 AM
8
points
0
comments
1
min read
LW
link
(blog.aiimpacts.org)
Ideas for benchmarking LLM creativity
gwern
Dec 16, 2024, 5:18 AM
60
points
11
comments
1
min read
LW
link
(gwern.net)
Comparing the AirFanta 3Pro to the Coway AP-1512
jefftk
Dec 16, 2024, 1:40 AM
13
points
0
comments
1
min read
LW
link
(www.jefftk.com)
[Question]
are IQ tests a good measure of intelligence?
KvmanThinking
Dec 15, 2024, 11:06 PM
0
points
5
comments
1
min read
LW
link
Madison Secular Solstice
svfritz
Dec 15, 2024, 9:52 PM
1
point
0
comments
1
min read
LW
link
[Question]
Is AI alignment a purely functional property?
Roko
Dec 15, 2024, 9:42 PM
13
points
8
comments
1
min read
LW
link
[Question]
How counterfactual are logical counterfactuals?
Donald Hobson
Dec 15, 2024, 9:16 PM
11
points
10
comments
1
min read
LW
link
Debunking the myth of safe AI
henophilia
Dec 15, 2024, 5:44 PM
−11
points
8
comments
1
min read
LW
link
(henophilia.substack.com)
Introducing Avatarism: A Rational Framework for Building actual Heaven
ratiba ro
Dec 15, 2024, 5:17 PM
2
points
2
comments
2
min read
LW
link
A Public Choice Take on Effective Altruism
vaishnav92
Dec 15, 2024, 4:58 PM
9
points
4
comments
3
min read
LW
link
(www.optimaloutliers.com)
World Models I’m Currently Building
temporary
Dec 15, 2024, 4:29 PM
5
points
1
comment
1
min read
LW
link
(samuelshadrach.com)
Dress Up For Secular Solstice
Gordon H.S.
Dec 15, 2024, 4:28 PM
33
points
13
comments
7
min read
LW
link
Remap your caps lock key
bilalchughtai
Dec 15, 2024, 2:03 PM
80
points
18
comments
1
min read
LW
link
Effective Evil’s AI Misalignment Plan
lsusr
Dec 15, 2024, 7:39 AM
83
points
9
comments
3
min read
LW
link
Write Good Enough Code, Quickly
Oliver Daniels
Dec 15, 2024, 4:45 AM
19
points
10
comments
8
min read
LW
link
How to Edit an Essay into a Solstice Speech?
Czynski
Dec 15, 2024, 4:30 AM
5
points
1
comment
1
min read
LW
link
(thepdv.wordpress.com)
How Your Physiology Affects the Mind’s Projection Fallacy
YanLyutnev
Dec 14, 2024, 9:10 PM
−1
points
0
comments
6
min read
LW
link
Introducing the Evidence Color Wheel
Larry Lee
Dec 14, 2024, 4:08 PM
6
points
0
comments
3
min read
LW
link
An Illustrated Summary of “Robust Agents Learn Causal World Model”
Dalcy
Dec 14, 2024, 3:02 PM
67
points
2
comments
10
min read
LW
link
Best-of-N Jailbreaking
John Hughes, saraprice, Aengus Lynch, Rylan Schaeffer, Fazl, Henry Sleight, Ethan Perez and mrinank_sharma
Dec 14, 2024, 4:58 AM
78
points
5
comments
2
min read
LW
link
(arxiv.org)
D&D.Sci Dungeonbuilding: the Dungeon Tournament
aphyer
Dec 14, 2024, 4:30 AM
49
points
16
comments
3
min read
LW
link
Creating Interpretable Latent Spaces with Gradient Routing
Jacob G-W
Dec 14, 2024, 4:00 AM
26
points
6
comments
2
min read
LW
link
(jacobgw.com)
Probability of death by suicide by a 26 year old
John Wiseman
Dec 14, 2024, 3:33 AM
−25
points
4
comments
1
min read
LW
link
Matryoshka Sparse Autoencoders
Noa Nabeshima
Dec 14, 2024, 2:52 AM
98
points
15
comments
11
min read
LW
link
[Question]
What is MIRI currently doing?
Roko
14 Dec 2024 2:39 UTC
32
points
14
comments
1
min read
LW
link
The o1 System Card Is Not About o1
Zvi
13 Dec 2024 20:30 UTC
116
points
5
comments
16
min read
LW
link
(thezvi.wordpress.com)