MIT FutureTech are hiring for a Technical Associate role
peterslattery
Sep 9, 2024, 8:16 PM
3
points
0
comments
3
min read
LW
link
AI forecasting bots incoming
Dan H
and
Mantas Mazeika
Sep 9, 2024, 7:14 PM
29
points
44
comments
4
min read
LW
link
(www.safe.ai)
My takes on SB-1047
leogao
Sep 9, 2024, 6:38 PM
151
points
8
comments
4
min read
LW
link
[Question]
Building an Inexpensive, Aesthetic, Private Forum
Aaron Graifman
Sep 9, 2024, 5:10 PM
13
points
15
comments
1
min read
LW
link
[Linkpost] Interpretable Analysis of Features Found in Open-source Sparse Autoencoder (partial replication)
Fernando Avalos
Sep 9, 2024, 3:33 AM
6
points
1
comment
1
min read
LW
link
(forum.effectivealtruism.org)
[Question]
Has Anyone Here Consciously Changed Their Passions?
Spade
Sep 9, 2024, 1:36 AM
11
points
12
comments
1
min read
LW
link
Pollsters Should Publish Question Translations
jefftk
Sep 8, 2024, 10:10 PM
60
points
3
comments
2
min read
LW
link
(www.jefftk.com)
On Fables and Nuanced Charts
Niko_McCarty
Sep 8, 2024, 5:09 PM
35
points
2
comments
8
min read
LW
link
(www.asimov.press)
Contra Yudkowsky on 2-4-6 Game Difficulty Explanations
Josh Hickman
Sep 8, 2024, 4:13 PM
6
points
1
comment
2
min read
LW
link
(xn--2r8hmb.ws)
Attachment THEORY AND THE EFFECTS OF SECURE ATTACHMENT ON CHILD DEVELOPMENT
Mihriban Temel
Sep 8, 2024, 4:09 PM
−8
points
0
comments
9
min read
LW
link
Fictional parasites very different from our own
Abhishaike Mahajan
Sep 8, 2024, 2:59 PM
25
points
0
comments
4
min read
LW
link
(www.owlposting.com)
My Number 1 Epistemology Book Recommendation: Inventing Temperature
adamShimi
Sep 8, 2024, 2:30 PM
121
points
18
comments
3
min read
LW
link
(epistemologicalfascinations.substack.com)
[Question]
I want a good multi-LLM API-powered chatbot
rotatingpaguro
Sep 8, 2024, 9:40 AM
10
points
5
comments
1
min read
LW
link
That Alien Message—The Animation
Writer
Sep 7, 2024, 2:53 PM
144
points
10
comments
8
min read
LW
link
(youtu.be)
Jonothan Gorard:The territory is isomorphic to an equivalence class of its maps
Daniel C
Sep 7, 2024, 10:04 AM
19
points
18
comments
2
min read
LW
link
(x.com)
Pay Risk Evaluators in Cash, Not Equity
Adam Scholl
Sep 7, 2024, 2:37 AM
214
points
19
comments
1
min read
LW
link
Excerpts from “A Reader’s Manifesto”
Arjun Panickssery
Sep 6, 2024, 10:37 PM
72
points
1
comment
13
min read
LW
link
(arjunpanickssery.substack.com)
Fun With CellxGene
sarahconstantin
Sep 6, 2024, 10:00 PM
30
points
2
comments
7
min read
LW
link
(sarahconstantin.substack.com)
[Question]
Is this voting system strategy proof?
Donald Hobson
Sep 6, 2024, 8:44 PM
17
points
9
comments
1
min read
LW
link
Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream
Diego Caples
and
rrenaud
Sep 6, 2024, 5:55 PM
70
points
7
comments
4
min read
LW
link
Backdoors as an analogy for deceptive alignment
Jacob_Hilton
and
Mark Xu
Sep 6, 2024, 3:30 PM
104
points
2
comments
8
min read
LW
link
(www.alignment.org)
A Cable Holder for 2 Cent
Johannes C. Mayer
Sep 6, 2024, 11:01 AM
1
point
1
comment
1
min read
LW
link
Perhaps Try a Little Therapy, As a Treat?
segfault
Sep 6, 2024, 8:51 AM
−187
points
61
comments
16
min read
LW
link
Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Daniel Lee
and
StefanHex
Sep 6, 2024, 2:28 AM
28
points
0
comments
12
min read
LW
link
Distinguish worst-case analysis from instrumental training-gaming
Olli Järviniemi
and
Buck
Sep 5, 2024, 7:13 PM
38
points
0
comments
5
min read
LW
link
AI x Human Flourishing: Introducing the Cosmos Institute
Brendan McCord
Sep 5, 2024, 6:23 PM
14
points
5
comments
6
min read
LW
link
(cosmosinstitute.substack.com)
What is SB 1047 *for*?
Raemon
Sep 5, 2024, 5:39 PM
61
points
8
comments
3
min read
LW
link
instruction tuning and autoregressive distribution shift
nostalgebraist
Sep 5, 2024, 4:53 PM
40
points
5
comments
5
min read
LW
link
Conflating value alignment and intent alignment is causing confusion
Seth Herd
Sep 5, 2024, 4:39 PM
49
points
18
comments
5
min read
LW
link
A bet for Samo Burja
Nathan Helm-Burger
Sep 5, 2024, 4:01 PM
14
points
2
comments
2
min read
LW
link
Universal basic income isn’t always AGI-proof
Kevin Kohler
Sep 5, 2024, 3:39 PM
5
points
3
comments
7
min read
LW
link
(machinocene.substack.com)
Why Reflective Stability is Important
Johannes C. Mayer
Sep 5, 2024, 3:28 PM
19
points
2
comments
1
min read
LW
link
Why Swiss watches and Taylor Swift are AGI-proof
Kevin Kohler
Sep 5, 2024, 1:23 PM
18
points
11
comments
6
min read
LW
link
(machinocene.substack.com)
Is Redistributive Taxation Justifiable? Part 1: Do the Rich Deserve their Wealth?
Alexander de Vries
Sep 5, 2024, 10:23 AM
7
points
20
comments
10
min read
LW
link
(2ndhandecon.substack.com)
What program structures enable efficient induction?
Daniel C
Sep 5, 2024, 10:12 AM
23
points
5
comments
3
min read
LW
link
How to Fake Decryption
ohmurphy
Sep 5, 2024, 9:18 AM
12
points
0
comments
4
min read
LW
link
(ohmurphy.substack.com)
We Should Try to Directly Measure the Value of Scientific Papers
ohmurphy
Sep 5, 2024, 9:08 AM
1
point
0
comments
5
min read
LW
link
(ohmurphy.substack.com)
on Science Beakers and DDT
bhauth
Sep 5, 2024, 3:21 AM
23
points
13
comments
9
min read
LW
link
(bhauth.com)
Massive Activations and why <bos> is important in Tokenized SAE Unigrams
Louka Ewington-Pitsos
Sep 5, 2024, 2:19 AM
1
point
0
comments
3
min read
LW
link
The Forging of the Great Minds: An Unfinished Tale
Aryeh Englander
Sep 5, 2024, 12:58 AM
−3
points
0
comments
5
min read
LW
link
The Chatbot of Babble
Aryeh Englander
Sep 5, 2024, 12:56 AM
−3
points
0
comments
7
min read
LW
link
[Question]
Is it Legal to Maintain Turing Tests using Data Poisoning, and would it work?
Double
5 Sep 2024 0:35 UTC
8
points
9
comments
1
min read
LW
link
Executable philosophy as a failed totalizing meta-worldview
jessicata
4 Sep 2024 22:50 UTC
93
points
40
comments
4
min read
LW
link
(unstableontology.com)
Against Explosive Growth
c.trout
4 Sep 2024 21:45 UTC
14
points
1
comment
5
min read
LW
link
The Fragility of Life Hypothesis and the Evolution of Cooperation
KristianRonn
4 Sep 2024 21:04 UTC
50
points
6
comments
11
min read
LW
link
Emotion-Informed Valuation Mechanism for Improved AI Alignment in Large Language Models
Javier Marin Valenzuela
4 Sep 2024 17:00 UTC
2
points
4
comments
6
min read
LW
link
What happens if you present 500 people with an argument that AI is risky?
KatjaGrace
and
Nathan Young
4 Sep 2024 16:40 UTC
110
points
8
comments
3
min read
LW
link
(blog.aiimpacts.org)
Automating LLM Auditing with Developmental Interpretability
htlou
and
evhub
4 Sep 2024 15:50 UTC
19
points
0
comments
3
min read
LW
link
Michael Dickens’ Caffeine Tolerance Research
niplav
4 Sep 2024 15:41 UTC
46
points
5
comments
2
min read
LW
link
(mdickens.me)
[Question]
Are UV-C Air purifiers so useful?
SebastianG
4 Sep 2024 14:16 UTC
9
points
0
comments
1
min read
LW
link