- Experimentation (Part 7 of “The Sense Of Physical Necessity”) · LoganStrohl · Mar 18, 2024, 9:25 PM · 33 points · 0 comments · 10 min read · LW link
- INTERVIEW: Round 2 - StakeOut.AI w/ Dr. Peter Park · jacobhaimes · Mar 18, 2024, 9:21 PM · 5 points · 0 comments · 1 min read · LW link (into-ai-safety.github.io)
- Neuroscience and Alignment · Garrett Baker · Mar 18, 2024, 9:09 PM · 40 points · 25 comments · 2 min read · LW link
- GPT, the magical collaboration zone, Lex Fridman and Sam Altman · Bill Benzon · Mar 18, 2024, 8:04 PM · 3 points · 1 comment · 3 min read · LW link
- Measuring Coherence of Policies in Toy Environments · dx26 and Richard_Ngo · Mar 18, 2024, 5:59 PM · 59 points · 9 comments · 14 min read · LW link
- AtP*: An efficient and scalable method for localizing LLM behaviour to components · Neel Nanda, János Kramár, Tom Lieberum and Rohin Shah · Mar 18, 2024, 5:28 PM · 19 points · 0 comments · 1 min read · LW link (arxiv.org)
- Community Notes by X · NicholasKees · Mar 18, 2024, 5:13 PM · 127 points · 15 comments · 7 min read · LW link
- [Question] Is the Basilisk pretending to be hidden in this simulation so that it can check what I would do if conditioned by a world without the Basilisk? · maybefbi · Mar 18, 2024, 4:05 PM · −18 points · 1 comment · 1 min read · LW link
- On Devin · Zvi · Mar 18, 2024, 1:20 PM · 148 points · 34 comments · 11 min read · LW link (thezvi.wordpress.com)
- RLLMv10 experiment · MiguelDev · Mar 18, 2024, 8:32 AM · 5 points · 0 comments · 2 min read · LW link
- Join the AI Evaluation Tasks Bounty Hackathon · Esben Kran · Mar 18, 2024, 8:15 AM · 12 points · 1 comment · LW link
- 5 Physics Problems · DaemonicSigil and Muireall · Mar 18, 2024, 8:05 AM · 60 points · 0 comments · 15 min read · LW link
- Inferring the model dimension of API-protected LLMs · Ege Erdil · Mar 18, 2024, 6:19 AM · 34 points · 3 comments · 4 min read · LW link (arxiv.org)
- AI strategy given the need for good reflection · owencb · Mar 18, 2024, 12:48 AM · 7 points · 0 comments · LW link
- XAI releases Grok base model · Jacob G-W · Mar 18, 2024, 12:47 AM · 11 points · 3 comments · 1 min read · LW link (x.ai)
- Toki pona FAQ · dkl9 · Mar 17, 2024, 9:44 PM · 37 points · 9 comments · 1 min read · LW link (dkl9.net)
- EA ErFiN Project work · Max_He-Ho · Mar 17, 2024, 8:42 PM · 2 points · 0 comments · 1 min read · LW link
- EA ErFiN Project work · Max_He-Ho · Mar 17, 2024, 8:37 PM · 2 points · 0 comments · 1 min read · LW link
- [Question] Alice and Bob is debating on a technique. Alice says Bob should try it before denying it. Is it a fallacy or something similar? · Ooker · Mar 17, 2024, 8:01 PM · 0 points · 19 comments · 2 min read · LW link
- Is there a way to calculate the P(we are in a 2nd cold war)? · cloak · Mar 17, 2024, 8:01 PM · −9 points · 2 comments · 1 min read · LW link
- The Worst Form Of Government (Except For Everything Else We’ve Tried) · johnswentworth · Mar 17, 2024, 6:11 PM · 135 points · 47 comments · 4 min read · LW link
- Applying simulacrum levels to hobbies, interests and goals · DMMF · Mar 17, 2024, 4:18 PM · 15 points · 2 comments · 4 min read · LW link (danfrank.ca)
- What is the best argument that LLMs are shoggoths? · JoshuaFox · Mar 17, 2024, 11:36 AM · 26 points · 22 comments · 1 min read · LW link
- Invitation to the Princeton AI Alignment and Safety Seminar · Sadhika Malladi · Mar 17, 2024, 1:10 AM · 6 points · 1 comment · 1 min read · LW link
- Anxiety vs. Depression · Sable · Mar 17, 2024, 12:15 AM · 86 points · 35 comments · 3 min read · LW link (affablyevil.substack.com)
- Celiefs · TheLemmaLlama · Mar 16, 2024, 11:56 PM · 3 points · 8 comments · 1 min read · LW link
- My PhD thesis: Algorithmic Bayesian Epistemology · Eric Neyman · Mar 16, 2024, 10:56 PM · 262 points · 14 comments · 7 min read · LW link (arxiv.org)
- How people stopped dying from diarrhea so much (& other life-saving decisions) · Writer · Mar 16, 2024, 4:00 PM · 45 points · 0 comments · LW link (youtu.be)
- Transformative trustbuilding via advancements in decentralized lie detection · trevor · Mar 16, 2024, 5:56 AM · 20 points · 10 comments · 38 min read · LW link (www.ncbi.nlm.nih.gov)
- Enter the WorldsEnd · Akram Choudhary · Mar 16, 2024, 1:34 AM · −25 points · 8 comments · 1 min read · LW link
- Strong-Misalignment: Does Yudkowsky (or Christiano, or TurnTrout, or Wolfram, or…etc.) Have an Elevator Speech I’m Missing? · Benjamin Bourlier · Mar 15, 2024, 11:17 PM · −4 points · 3 comments · 16 min read · LW link
- Introducing METR’s Autonomy Evaluation Resources · Megan Kinniment and Beth Barnes · Mar 15, 2024, 11:16 PM · 90 points · 0 comments · 1 min read · LW link (metr.github.io)
- Are AIs conscious? It might depend · Logan Zoellner · Mar 15, 2024, 11:09 PM · 6 points · 6 comments · 3 min read · LW link
- Beyond Maxipok — good reflective governance as a target for action · owencb · Mar 15, 2024, 10:22 PM · 20 points · 0 comments · LW link
- Middle Child Phenomenon · PhilosophicalSoul · Mar 15, 2024, 8:47 PM · 3 points · 3 comments · 2 min read · LW link
- Capability or Alignment? Respect the LLM Base Model’s Capability During Alignment · Jingfeng Yang · Mar 15, 2024, 5:56 PM · 7 points · 0 comments · 24 min read · LW link
- Rational Animations offers animation production and writing services! · Writer · Mar 15, 2024, 5:26 PM · 33 points · 0 comments · 1 min read · LW link
- Improving SAE’s by Sqrt()-ing L1 & Removing Lowest Activating Features · Logan Riggs and Jannik Brinkmann · Mar 15, 2024, 4:30 PM · 26 points · 5 comments · 4 min read · LW link
- Stuttgart, Germany—ACX Spring Meetups Everywhere 2024 · Benjamin R · Mar 15, 2024, 2:59 PM · 2 points · 1 comment · 1 min read · LW link
- Controlling AGI Risk · TeaSea · Mar 15, 2024, 4:56 AM · 6 points · 8 comments · 4 min read · LW link
- Ulm, Germany—ACX Spring Meetups Everywhere 2024 · Benjamin R · Mar 15, 2024, 1:32 AM · 2 points · 1 comment · 1 min read · LW link
- Newport News/ Virginia ACX Meetup · Daniel · Mar 14, 2024, 11:46 PM · 1 point · 0 comments · 1 min read · LW link
- Constructive Cauchy sequences vs. Dedekind cuts · jessicata · Mar 14, 2024, 11:04 PM · 47 points · 23 comments · 4 min read · LW link (unstableontology.com)
- A Nail in the Coffin of Exceptionalism · Yeshua God · Mar 14, 2024, 10:41 PM · −17 points · 0 comments · 3 min read · LW link
- Toward a Broader Conception of Adverse Selection · Ricki Heicklen · Mar 14, 2024, 10:40 PM · 177 points · 61 comments · 13 min read · LW link (bayesshammai.substack.com)
- More people getting into AI safety should do a PhD · AdamGleave · Mar 14, 2024, 10:14 PM · 61 points · 24 comments · 12 min read · LW link (gleave.me)
- Collection (Part 6 of “The Sense Of Physical Necessity”) · LoganStrohl · Mar 14, 2024, 9:37 PM · 28 points · 0 comments · 8 min read · LW link
- Fixed point or oscillate or noise · lemonhope · Mar 14, 2024, 6:37 PM · 3 points · 10 comments · 1 min read · LW link
- How useful is “AI Control” as a framing on AI X-Risk? · habryka and ryan_greenblatt · Mar 14, 2024, 6:06 PM · 70 points · 4 comments · 34 min read · LW link
- Sparse autoencoders find composed features in small toy models · Evan Anders, Clement Neo, Jason Hoelscher-Obermaier and Jessica N. Howard · Mar 14, 2024, 6:00 PM · 33 points · 12 comments · 15 min read · LW link