The rise of AI in cybercrime
BobyResearcher | Jul 30, 2023, 8:19 PM | −15 points | 1 comment | 2 min read | (riseofAIincybercryme)

SSA vs. SIA: how future population may provide evidence for or against the foundations of political liberalism
j | Jul 30, 2023, 8:18 PM | −6 points | 10 comments | 55 min read

Rationalization Maximizes Expected Value
Kevin Dorst | Jul 30, 2023, 8:11 PM | 19 points | 10 comments | 7 min read | (kevindorst.substack.com)

Apollo Neuro Results
Elizabeth | Jul 30, 2023, 6:40 PM | 85 points | 17 comments | 3 min read | (acesounderglass.com)

Hilbert’s Triumph, Church and Turing’s failure, and what it means (Post #2)
Noosphere89 | Jul 30, 2023, 2:33 PM | −5 points | 16 comments | 15 min read

[Question] Specific Arguments against open source LLMs?
Iknownothing | Jul 30, 2023, 2:27 PM | 4 points | 2 comments | 1 min read

Socialism in large organizations
Adam Zerner | Jul 30, 2023, 7:25 AM | 7 points | 16 comments | 2 min read

How to make real-money prediction markets on arbitrary topics (Outdated)
yutaka | Jul 30, 2023, 2:11 AM | 57 points | 13 comments | 3 min read

[Question] Does decidability of a theory imply completeness of the theory?
Noosphere89 | Jul 29, 2023, 11:53 PM | 6 points | 12 comments | 1 min read

[Question] If I showed the EQ-SQ theory’s findings to be due to measurement bias, would anyone change their minds about it?
tailcalled | Jul 29, 2023, 7:38 PM | 23 points | 13 comments | 1 min read

Self-driving car bets
paulfchristiano | Jul 29, 2023, 6:10 PM | 236 points | 44 comments | 5 min read | (sideways-view.com)

The Parable of the Dagger—The Animation
Writer | Jul 29, 2023, 2:03 PM | 20 points | 6 comments | 1 min read | (youtu.be)

Are Guitars Obsolete?
jefftk | Jul 29, 2023, 1:20 PM | 11 points | 8 comments | 2 min read | (www.jefftk.com)

NAMSI: A promising approach to alignment
Georgeo57 | Jul 29, 2023, 7:03 AM | −6 points | 6 comments | 1 min read

Understanding and Aligning a Human-like Inductive Bias with Cognitive Science: a Review of Related Literature
Claire Short | Jul 29, 2023, 6:10 AM | 27 points | 0 comments | 12 min read

Why You Should Never Update Your Beliefs
Arjun Panickssery | Jul 29, 2023, 12:27 AM | 76 points | 18 comments | 4 min read | 1 review | (arjunpanickssery.substack.com)

Thoughts about the Mechanistic Interpretability Challenge #2 (EIS VII #2)
RGRGRG | Jul 28, 2023, 8:44 PM | 24 points | 5 comments | 20 min read

Because of LayerNorm, Directions in GPT-2 MLP Layers are Monosemantic
ojorgensen | Jul 28, 2023, 7:43 PM | 13 points | 3 comments | 13 min read

When can we trust model evaluations?
evhub | Jul 28, 2023, 7:42 PM | 166 points | 10 comments | 10 min read | 1 review

Yes, It’s Subjective, But Why All The Crabs?
johnswentworth | Jul 28, 2023, 7:35 PM | 250 points | 15 comments | 6 min read

Semaglutide and Muscle
5hout | Jul 28, 2023, 6:36 PM | 15 points | 14 comments | 5 min read

Double Crux in a Box
Screwtape | Jul 28, 2023, 5:55 PM | 8 points | 3 comments | 1 min read

Gradient descent might see the direction of the optimum from far away
Mikhail Samin | Jul 28, 2023, 4:19 PM | 70 points | 13 comments | 4 min read

Progress links digest, 2023-07-28: The decadent opulence of modern capitalism
jasoncrawford | Jul 28, 2023, 2:36 PM | 16 points | 3 comments | 3 min read | (rootsofprogress.org)

AI Awareness through Interaction with Blatantly Alien Models
VojtaKovarik | Jul 28, 2023, 8:41 AM | 7 points | 5 comments | 3 min read

You don’t get to have cool flaws
Neil | Jul 28, 2023, 5:37 AM | 78 points | 25 comments | 2 min read | 3 reviews

Reducing sycophancy and improving honesty via activation steering
Nina Panickssery | Jul 28, 2023, 2:46 AM | 122 points | 18 comments | 9 min read | 1 review

Mech Interp Puzzle 2: Word2Vec Style Embeddings
Neel Nanda | Jul 28, 2023, 12:50 AM | 41 points | 4 comments | 2 min read

ETFE windows
bhauth | Jul 28, 2023, 12:46 AM | 31 points | 4 comments | 2 min read | (www.bhauth.com)

A Short Memo on AI Interpretability Rainbows
scasper | Jul 27, 2023, 11:05 PM | 18 points | 0 comments | 2 min read

Pulling the Rope Sideways: Empirical Test Results
Daniel Kokotajlo | Jul 27, 2023, 10:18 PM | 61 points | 18 comments | 1 min read

A $10k retroactive grant for VaccinateCA
Austin Chen | Jul 27, 2023, 6:14 PM | 82 points | 0 comments | (manifund.org)

Preference Aggregation as Bayesian Inference
beren | Jul 27, 2023, 5:59 PM | 14 points | 1 comment | 1 min read

AI #22: Into the Weeds
Zvi | Jul 27, 2023, 5:40 PM | 49 points | 8 comments | 84 min read | (thezvi.wordpress.com)

SSA rejects anthropic shadow, too
jessicata | Jul 27, 2023, 5:25 PM | 74 points | 38 comments | 11 min read | (unstableontology.com)

[Question] What are examples of someone doing a lot of work to find the best of something?
chanamessinger | Jul 27, 2023, 3:58 PM | 29 points | 16 comments | 1 min read

AI-Plans.com 10-day Critique-a-Thon
Iknownothing | Jul 27, 2023, 11:44 AM | 8 points | 2 comments | 2 min read | (manifund.org)

Privacy in a Digital World
Faustify | Jul 27, 2023, 10:46 AM | 2 points | 0 comments | 5 min read

Cultivating a state of mind where new ideas are born
Henrik Karlsson | Jul 27, 2023, 9:16 AM | 244 points | 21 comments | 14 min read | 2 reviews | (www.henrikkarlsson.xyz)

Partial Transcript of Recent Senate Hearing Discussing AI X-Risk
Daniel_Eth | Jul 27, 2023, 9:16 AM | 55 points | 0 comments | (medium.com)

AXRP Episode 24 - Superalignment with Jan Leike
DanielFilan | Jul 27, 2023, 4:00 AM | 55 points | 3 comments | 69 min read

[Question] Have you ever considered taking the ‘Turing Test’ yourself?
Super AGI | Jul 27, 2023, 3:48 AM | 2 points | 6 comments | 1 min read

AXRP Episode 23 - Mechanistic Anomaly Detection with Mark Xu
DanielFilan | 27 Jul 2023 1:50 UTC | 22 points | 0 comments | 72 min read

GPT-4 can catch subtle cross-language translation mistakes
Michael Tontchev | 27 Jul 2023 1:39 UTC | 7 points | 1 comment | 1 min read

Social Balance through Embracing Social Credit
dhruvv | 26 Jul 2023 20:07 UTC | −39 points | 9 comments | 3 min read

Why no Roman Industrial Revolution?
jasoncrawford | 26 Jul 2023 19:34 UTC | 62 points | 30 comments | 3 min read | (rootsofprogress.org)

Why you can’t treat decidability and complexity as a constant (Post #1)
Noosphere89 | 26 Jul 2023 17:54 UTC | 6 points | 13 comments | 5 min read

A response to the Richards et al.’s “The Illusion of AI’s Existential Risk”
Harrison Fell | 26 Jul 2023 17:34 UTC | 1 point | 0 comments | 10 min read

Meta-level adversarial evaluation of oversight techniques might allow robust measurement of their adequacy
Buck and ryan_greenblatt | 26 Jul 2023 17:02 UTC | 99 points | 19 comments | 1 min read | 1 review

Neuronpedia
Johnny Lin | 26 Jul 2023 16:29 UTC | 135 points | 51 comments | 2 min read | (neuronpedia.org)