A Short Memo on AI Interpretability Rainbows
scasper
Jul 27, 2023, 11:05 PM
18
points
0
comments
2
min read
LW
link
Pulling the Rope Sideways: Empirical Test Results
Daniel Kokotajlo
Jul 27, 2023, 10:18 PM
61
points
18
comments
1
min read
LW
link
A $10k retroactive grant for VaccinateCA
Austin Chen
Jul 27, 2023, 6:14 PM
82
points
0
comments
LW
link
(manifund.org)
Preference Aggregation as Bayesian Inference
beren
Jul 27, 2023, 5:59 PM
14
points
1
comment
1
min read
LW
link
AI #22: Into the Weeds
Zvi
Jul 27, 2023, 5:40 PM
49
points
8
comments
84
min read
LW
link
(thezvi.wordpress.com)
SSA rejects anthropic shadow, too
jessicata
Jul 27, 2023, 5:25 PM
74
points
38
comments
11
min read
LW
link
(unstableontology.com)
[Question]
What are examples of someone doing a lot of work to find the best of something?
chanamessinger
Jul 27, 2023, 3:58 PM
29
points
16
comments
1
min read
LW
link
AI-Plans.com 10-day Critique-a-Thon
Iknownothing
Jul 27, 2023, 11:44 AM
8
points
2
comments
2
min read
LW
link
(manifund.org)
Privacy in a Digital World
Faustify
Jul 27, 2023, 10:46 AM
2
points
0
comments
5
min read
LW
link
Cultivating a state of mind where new ideas are born
Henrik Karlsson
Jul 27, 2023, 9:16 AM
244
points
21
comments
14
min read
LW
link
2
reviews
(www.henrikkarlsson.xyz)
Partial Transcript of Recent Senate Hearing Discussing AI X-Risk
Daniel_Eth
Jul 27, 2023, 9:16 AM
55
points
0
comments
LW
link
(medium.com)
AXRP Episode 24 - Superalignment with Jan Leike
DanielFilan
Jul 27, 2023, 4:00 AM
55
points
3
comments
69
min read
LW
link
[Question]
Have you ever considered taking the ‘Turing Test’ yourself?
Super AGI
Jul 27, 2023, 3:48 AM
2
points
6
comments
1
min read
LW
link
AXRP Episode 23 - Mechanistic Anomaly Detection with Mark Xu
DanielFilan
Jul 27, 2023, 1:50 AM
22
points
0
comments
72
min read
LW
link
GPT-4 can catch subtle cross-language translation mistakes
Michael Tontchev
Jul 27, 2023, 1:39 AM
7
points
1
comment
1
min read
LW
link
Social Balance through Embracing Social Credit
dhruvv
Jul 26, 2023, 8:07 PM
−39
points
9
comments
3
min read
LW
link
Why no Roman Industrial Revolution?
jasoncrawford
Jul 26, 2023, 7:34 PM
62
points
30
comments
3
min read
LW
link
(rootsofprogress.org)
Why you can’t treat decidability and complexity as a constant (Post #1)
Noosphere89
Jul 26, 2023, 5:54 PM
6
points
13
comments
5
min read
LW
link
A response to the Richards et al.’s “The Illusion of AI’s Existential Risk”
Harrison Fell
Jul 26, 2023, 5:34 PM
1
point
0
comments
10
min read
LW
link
Meta-level adversarial evaluation of oversight techniques might allow robust measurement of their adequacy
Buck
and
ryan_greenblatt
Jul 26, 2023, 5:02 PM
100
points
19
comments
1
min read
LW
link
1
review
Neuronpedia
Johnny Lin
Jul 26, 2023, 4:29 PM
135
points
51
comments
2
min read
LW
link
(neuronpedia.org)
Frontier Model Forum
Zach Stein-Perlman
Jul 26, 2023, 2:30 PM
27
points
0
comments
4
min read
LW
link
(blog.google)
Podcasts: Future of Life Institute, Breakthrough Science Summit panel
jasoncrawford
Jul 26, 2023, 2:28 PM
8
points
0
comments
1
min read
LW
link
(rootsofprogress.org)
Llama We Doing This Again?
Zvi
Jul 26, 2023, 1:00 PM
48
points
3
comments
16
min read
LW
link
(thezvi.wordpress.com)
Frontier Model Security
Vaniver
Jul 26, 2023, 4:48 AM
32
points
1
comment
3
min read
LW
link
(www.anthropic.com)
The First Room-Temperature Ambient-Pressure Superconductor
Annapurna
Jul 26, 2023, 2:27 AM
35
points
28
comments
1
min read
LW
link
(arxiv.org)
Underwater Torture Chambers: The Horror Of Fish Farming
Bentham's Bulldog
Jul 26, 2023, 12:27 AM
83
points
50
comments
10
min read
LW
link
1
review
Contra Alexander on the Bitter Lesson and IQ
Andrew Keenan Richardson
Jul 26, 2023, 12:07 AM
9
points
1
comment
4
min read
LW
link
(mechanisticmind.com)
Overcoming the MWC
Mark Freed
Jul 25, 2023, 5:31 PM
3
points
0
comments
3
min read
LW
link
Russian parliamentarian: let’s ban personal computers and the Internet
RomanS
Jul 25, 2023, 5:30 PM
11
points
6
comments
2
min read
LW
link
AISN #16: White House Secures Voluntary Commitments from Leading AI Labs and Lessons from Oppenheimer
Corin Katzke
and
Dan H
Jul 25, 2023, 4:58 PM
6
points
0
comments
6
min read
LW
link
(newsletter.safe.ai)
“The Universe of Minds”—call for reviewers (Seeds of Science)
rogersbacon
Jul 25, 2023, 4:53 PM
7
points
0
comments
1
min read
LW
link
Thoughts on Loss Landscapes and why Deep Learning works
beren
Jul 25, 2023, 4:41 PM
53
points
4
comments
18
min read
LW
link
Should you work at a leading AI lab? (including in non-safety roles)
Benjamin Hilton
Jul 25, 2023, 4:29 PM
7
points
0
comments
12
min read
LW
link
Whisper’s Word-Level Timestamps are Out
Varshul Gupta
Jul 25, 2023, 2:32 PM
−18
points
2
comments
2
min read
LW
link
(dubverseblack.substack.com)
AIS 101: Task decomposition for scalable oversight
Charbel-Raphaël
Jul 25, 2023, 1:34 PM
35
points
0
comments
19
min read
LW
link
(docs.google.com)
Anthropic Observations
Zvi
Jul 25, 2023, 12:50 PM
104
points
1
comment
10
min read
LW
link
(thezvi.wordpress.com)
Autonomous Alignment Oversight Framework (AAOF)
Justausername
Jul 25, 2023, 10:25 AM
−9
points
0
comments
4
min read
LW
link
How LLMs are and are not myopic
janus
Jul 25, 2023, 2:19 AM
135
points
16
comments
8
min read
LW
link
Secure Hand Holding
jefftk
Jul 25, 2023, 1:40 AM
28
points
43
comments
1
min read
LW
link
(www.jefftk.com)
Open problems in activation engineering
TurnTrout
,
woog
,
lisathiergart
,
Monte M
and
Ulisse Mini
Jul 24, 2023, 7:46 PM
51
points
2
comments
1
min read
LW
link
(coda.io)
Subdivisions for Useful Distillations?
Sharat Jacob Jacob
Jul 24, 2023, 6:55 PM
9
points
2
comments
2
min read
LW
link
Optimizing For Approval And Disapproval
Thoth Hermes
Jul 24, 2023, 6:46 PM
−1
points
0
comments
12
min read
LW
link
(thothhermes.substack.com)
An Opinionated Guide to Computability and Complexity (Post #0)
Noosphere89
Jul 24, 2023, 5:53 PM
10
points
10
comments
3
min read
LW
link
Slowing down AI progress is an underexplored alignment strategy
Norman Borlaug
Jul 24, 2023, 4:56 PM
42
points
27
comments
5
min read
LW
link
Anticipation in LLMs
derek shiller
Jul 24, 2023, 3:53 PM
6
points
0
comments
13
min read
LW
link
The cone of freedom (or, freedom might only be instrumentally valuable)
dkl9
Jul 24, 2023, 3:38 PM
−10
points
6
comments
2
min read
LW
link
(dkl9.net)
A reformulation of Finite Factored Sets
Matthias G. Mayer
Jul 24, 2023, 1:02 PM
76
points
1
comment
8
min read
LW
link
Brain Efficiency Cannell Prize Contest Award Ceremony
Alexander Gietelink Oldenziel
24 Jul 2023 11:30 UTC
149
points
12
comments
7
min read
LW
link
[Crosspost] An AI Pause Is Humanity’s Best Bet For Preventing Extinction (TIME)
otto.barten
24 Jul 2023 10:07 UTC
12
points
0
comments
7
min read
LW
link
(time.com)