Simplifying Corrigibility – Subagent Corrigibility Is Not Anti-Natural
Rubi J. Hudson
Jul 16, 2024, 10:44 PM
44
points
27
comments
5
min read
LW
link
Multiplex Gene Editing: Where Are We Now?
sarahconstantin
Jul 16, 2024, 8:50 PM
73
points
6
comments
7
min read
LW
link
(sarahconstantin.substack.com)
Recursion in AI is scary. But let’s talk solutions.
Oleg Trott
Jul 16, 2024, 8:34 PM
3
points
10
comments
2
min read
LW
link
How to wash your hands precisely and thoroughly
dkl9
Jul 16, 2024, 6:29 PM
12
points
0
comments
1
min read
LW
link
(dkl9.net)
Francois Chollet inadvertently limits his claim on ARC-AGI
Noosphere89
Jul 16, 2024, 5:32 PM
12
points
3
comments
1
min read
LW
link
(x.com)
Fully booked—LessWrong Community weekend
jt
Jul 16, 2024, 5:15 PM
20
points
2
comments
1
min read
LW
link
Boundless Emotion
GG10
Jul 16, 2024, 4:36 PM
3
points
0
comments
3
min read
LW
link
Mech Interp Lacks Good Paradigms
Daniel Tan
Jul 16, 2024, 3:47 PM
40
points
0
comments
14
min read
LW
link
DM Parenting
Shoshannah Tekofsky
Jul 16, 2024, 8:50 AM
50
points
4
comments
5
min read
LW
link
(kidquest.substack.com)
Apply now: Get “unstuck” with the New IFS Self-Care Fellowship Program
Inga G.
Jul 16, 2024, 8:18 AM
10
points
3
comments
LW
link
Why the Best Writers Endure Isolation
Declan Molony
Jul 16, 2024, 5:58 AM
49
points
6
comments
2
min read
LW
link
[Research log] The board of Alphabet would stop DeepMind to save the world
Lucie Philippon
Jul 16, 2024, 4:59 AM
6
points
0
comments
4
min read
LW
link
Towards more cooperative AI safety strategies
Richard_Ngo
Jul 16, 2024, 4:36 AM
215
points
133
comments
4
min read
LW
link
Why People in Poverty Make Bad Decisions
James Stephen Brown
Jul 15, 2024, 11:40 PM
5
points
8
comments
1
min read
LW
link
(nonzerosum.games)
Paper Summary: Princes and Merchants: European City Growth Before the Industrial Revolution
Jeffrey Heninger
Jul 15, 2024, 9:30 PM
29
points
1
comment
4
min read
LW
link
(blog.aiimpacts.org)
MIRI’s July 2024 newsletter
Harlan
Jul 15, 2024, 9:28 PM
25
points
2
comments
1
min read
LW
link
(intelligence.org)
How (and why) to get tested for CMV
Metacelsus
Jul 15, 2024, 8:06 PM
18
points
0
comments
1
min read
LW
link
(denovo.substack.com)
A Better Hyperstition (for AI readers)
Yeshua God
Jul 15, 2024, 7:35 PM
−20
points
0
comments
119
min read
LW
link
I found >800 orthogonal “write code” steering vectors
Jacob G-W
and
TurnTrout
Jul 15, 2024, 7:06 PM
103
points
19
comments
7
min read
LW
link
(jacobgw.com)
The AI alignment problem in socio-technical systems from a computational perspective: A Top-Down-Top view and outlook
zhaoweizhang
Jul 15, 2024, 6:56 PM
3
points
0
comments
9
min read
LW
link
Musings of a Layman: Technology, AI, and the Human Condition
Crimson Liquidity
Jul 15, 2024, 6:40 PM
−2
points
0
comments
8
min read
LW
link
[Question]
Seeking feedback on a critique of the paperclip maximizer thought experiment
bio neural
Jul 15, 2024, 6:39 PM
3
points
9
comments
1
min read
LW
link
EAGxBerkeley 2024
Lauriander
Jul 15, 2024, 6:38 PM
3
points
0
comments
1
min read
LW
link
Against Aschenbrenner: How ‘Situational Awareness’ constructs a narrative that undermines safety and threatens humanity
GideonF
Jul 15, 2024, 6:37 PM
99
points
17
comments
21
min read
LW
link
(forum.effectivealtruism.org)
On predictability, chaos and AIs that don’t game our goals
Alejandro Tlaie
Jul 15, 2024, 5:16 PM
4
points
8
comments
6
min read
LW
link
Deceptive agents can collude to hide dangerous features in SAEs
Simon Lermen
and
Mateusz Dziemian
Jul 15, 2024, 5:07 PM
33
points
2
comments
7
min read
LW
link
Hiding in plain sight: the questions we don’t ask
DDthinker
Jul 15, 2024, 5:00 PM
−1
points
1
comment
26
min read
LW
link
Dialogue on What It Means For Something to Have A Function/Purpose
johnswentworth
,
Ramana Kumar
and
Steve Petersen
Jul 15, 2024, 4:28 PM
39
points
5
comments
16
min read
LW
link
Comparing Quantized Performance in Llama Models
NickyP
Jul 15, 2024, 4:01 PM
33
points
2
comments
8
min read
LW
link
[Aspiration-based designs] A. Damages from misaligned optimization – two more models
Jobst Heitzig
and
Simon Dima
Jul 15, 2024, 2:08 PM
6
points
0
comments
9
min read
LW
link
Stacked Laptop Monitor Update
jefftk
Jul 15, 2024, 9:40 AM
14
points
3
comments
1
min read
LW
link
(www.jefftk.com)
Misnaming and Other Issues with OpenAI’s “Human Level” Superintelligence Hierarchy
Davidmanheim
Jul 15, 2024, 5:50 AM
49
points
2
comments
3
min read
LW
link
Series on Artificial Wisdom
Jordan Arel
Jul 15, 2024, 1:11 AM
2
points
0
comments
3
min read
LW
link
Designing Artificial Wisdom: Decision Forecasting AI & Futarchy
Jordan Arel
Jul 15, 2024, 12:46 AM
1
point
1
comment
6
min read
LW
link
Risk Overview of AI in Bio Research
J Bostock
Jul 15, 2024, 12:04 AM
5
points
0
comments
5
min read
LW
link
(open.substack.com)
Donating to help Democrats win in the 2024 elections: research, decision support, and recommendations
Michael Cohn
Jul 14, 2024, 10:57 PM
−1
points
1
comment
6
min read
LW
link
Four ways I’ve made bad decisions
Sodium
Jul 14, 2024, 10:18 PM
18
points
1
comment
3
min read
LW
link
patent process problems
bhauth
Jul 14, 2024, 9:12 PM
33
points
13
comments
5
min read
LW
link
(www.bhauth.com)
Breaking Circuit Breakers
mikes
and
tbenthompson
Jul 14, 2024, 6:57 PM
53
points
13
comments
1
min read
LW
link
(confirmlabs.org)
Clopen sandwiches
dkl9
Jul 14, 2024, 1:07 PM
4
points
0
comments
1
min read
LW
link
(dkl9.net)
Child Handrail Returns
jefftk
Jul 14, 2024, 12:40 PM
12
points
0
comments
1
min read
LW
link
(www.jefftk.com)
A (paraconsistent) logic to deal with inconsistent preferences
B Jacobs
Jul 14, 2024, 11:17 AM
6
points
2
comments
4
min read
LW
link
(bobjacobs.substack.com)
Robert Caro And Mechanistic Models In Biography
adamShimi
14 Jul 2024 10:56 UTC
24
points
5
comments
7
min read
LW
link
(epistemologicalfascinations.substack.com)
An Introduction to Representation Engineering—an activation-based paradigm for controlling LLMs
Jan Wehner
14 Jul 2024 10:37 UTC
37
points
6
comments
17
min read
LW
link
LLMs as a Planning Overhang
Larks
14 Jul 2024 2:54 UTC
38
points
8
comments
2
min read
LW
link
Brief notes on the Wikipedia game
Olli Järviniemi
14 Jul 2024 2:28 UTC
68
points
9
comments
4
min read
LW
link
Spark in the Dark Guest Spots
jefftk
14 Jul 2024 1:40 UTC
6
points
0
comments
1
min read
LW
link
(www.jefftk.com)
Ice: The Penultimate Frontier
Roko
13 Jul 2024 23:44 UTC
63
points
56
comments
1
min read
LW
link
(transhumanaxiology.substack.com)
Trust as a bottleneck to growing teams quickly
benkuhn
13 Jul 2024 18:00 UTC
44
points
3
comments
5
min read
LW
link
(www.benkuhn.net)
Stitching SAEs of different sizes
Bart Bussmann
,
Patrick Leask
,
Joseph Bloom
,
Curt Tigges
and
Neel Nanda
13 Jul 2024 17:19 UTC
39
points
12
comments
12
min read
LW
link