The likely first longevity drug is based on sketchy science. This is bad for science and bad for longevity.
BobBurgers
Dec 12, 2023, 2:42 AM
161
points
34
comments
5
min read
LW
link
AI doom from an LLM-plateau-ist perspective
Steven Byrnes
Apr 27, 2023, 1:58 PM
161
points
24
comments
6
min read
LW
link
Meta Questions about Metaphilosophy
Wei Dai
Sep 1, 2023, 1:17 AM
161
points
80
comments
3
min read
LW
link
Change my mind: Veganism entails trade-offs, and health is one of the axes
Elizabeth
Jun 1, 2023, 5:10 PM
160
points
85
comments
19
min read
LW
link
2
reviews
(acesounderglass.com)
Jailbreaking GPT-4’s code interpreter
Nikola Jurkovic
Jul 13, 2023, 6:43 PM
160
points
22
comments
7
min read
LW
link
Agentized LLMs will change the alignment landscape
Seth Herd
Apr 9, 2023, 2:29 AM
160
points
102
comments
3
min read
LW
link
1
review
“Diamondoid bacteria” nanobots: deadly threat or dead-end? A nanotech investigation
titotal
Sep 29, 2023, 2:01 PM
160
points
79
comments
LW
link
(titotal.substack.com)
Sparse Autoencoders Find Highly Interpretable Directions in Language Models
Logan Riggs
,
Hoagy
,
Aidan Ewart
and
Robert_AIZI
Sep 21, 2023, 3:30 PM
159
points
8
comments
5
min read
LW
link
Vote on Interesting Disagreements
Ben Pace
Nov 7, 2023, 9:35 PM
159
points
131
comments
1
min read
LW
link
Most People Don’t Realize We Have No Idea How Our AIs Work
Thane Ruthenis
Dec 21, 2023, 8:02 PM
159
points
42
comments
1
min read
LW
link
Succession
Richard_Ngo
Dec 20, 2023, 7:25 PM
159
points
48
comments
11
min read
LW
link
(www.narrativeark.xyz)
POC || GTFO culture as partial antidote to alignment wordcelism
lc
Mar 15, 2023, 10:21 AM
158
points
15
comments
7
min read
LW
link
2
reviews
Big Mac Subsidy?
jefftk
Feb 23, 2023, 4:00 AM
158
points
25
comments
2
min read
LW
link
(www.jefftk.com)
What would a compute monitoring plan look like? [Linkpost]
Orpheus16
Mar 26, 2023, 7:33 PM
158
points
10
comments
4
min read
LW
link
(arxiv.org)
Inside the mind of a superhuman Go model: How does Leela Zero read ladders?
Haoxing Du
Mar 1, 2023, 1:47 AM
157
points
8
comments
30
min read
LW
link
My thoughts on the social response to AI risk
Matthew Barnett
Nov 1, 2023, 9:17 PM
157
points
37
comments
10
min read
LW
link
Password-locked models: a stress case for capabilities evaluation
Fabien Roger
Aug 3, 2023, 2:53 PM
156
points
14
comments
6
min read
LW
link
grey goo is unlikely
bhauth
Apr 17, 2023, 1:59 AM
156
points
123
comments
9
min read
LW
link
2
reviews
(bhauth.com)
Sapir-Whorf for Rationalists
Duncan Sabien (Inactive)
Jan 25, 2023, 7:58 AM
155
points
49
comments
19
min read
LW
link
Conjecture internal survey: AGI timelines and probability of human extinction from advanced AI
Maris Sala
May 22, 2023, 2:31 PM
155
points
5
comments
3
min read
LW
link
(www.conjecture.dev)
Announcing Dialogues
Ben Pace
Oct 7, 2023, 2:57 AM
155
points
59
comments
4
min read
LW
link
AI: Practical Advice for the Worried
Zvi
Mar 1, 2023, 12:30 PM
155
points
49
comments
16
min read
LW
link
2
reviews
(thezvi.wordpress.com)
The self-unalignment problem
Jan_Kulveit
and
rosehadshar
Apr 14, 2023, 12:10 PM
155
points
24
comments
10
min read
LW
link
Request: stop advancing AI capabilities
So8res
May 26, 2023, 5:42 PM
154
points
24
comments
1
min read
LW
link
A freshman year during the AI midgame: my approach to the next year
Buck
Apr 14, 2023, 12:38 AM
154
points
15
comments
LW
link
1
review
Will no one rid me of this turbulent pest?
Metacelsus
Oct 14, 2023, 3:27 PM
154
points
23
comments
10
min read
LW
link
(denovo.substack.com)
ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks
Beth Barnes
Aug 1, 2023, 6:30 PM
153
points
12
comments
5
min read
LW
link
(evals.alignment.org)
Assume Bad Faith
Zack_M_Davis
Aug 25, 2023, 5:36 PM
153
points
63
comments
7
min read
LW
link
3
reviews
The Plan - 2023 Version
johnswentworth
Dec 29, 2023, 11:34 PM
152
points
40
comments
31
min read
LW
link
1
review
Shutting down AI is not enough. We need to destroy all technology.
Matthew Barnett
Apr 1, 2023, 9:03 PM
152
points
36
comments
1
min read
LW
link
LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B
Simon Lermen
and
Jeffrey Ladish
Oct 12, 2023, 7:58 PM
151
points
29
comments
14
min read
LW
link
GPT-4
nz
Mar 14, 2023, 5:02 PM
151
points
150
comments
1
min read
LW
link
(openai.com)
AI x-risk, approximately ordered by embarrassment
Alex Lawsen
Apr 12, 2023, 11:01 PM
151
points
7
comments
19
min read
LW
link
Why Not Just Outsource Alignment Research To An AI?
johnswentworth
Mar 9, 2023, 9:49 PM
151
points
50
comments
9
min read
LW
link
1
review
Advice for newly busy people
Severin T. Seehrich
May 11, 2023, 4:46 PM
150
points
3
comments
5
min read
LW
link
OpenAI Launches Superalignment Taskforce
Zvi
Jul 11, 2023, 1:00 PM
150
points
40
comments
49
min read
LW
link
(thezvi.wordpress.com)
Why I’m not into the Free Energy Principle
Steven Byrnes
Mar 2, 2023, 7:27 PM
150
points
50
comments
9
min read
LW
link
1
review
There are no coherence theorems
Dan H
and
EJT
Feb 20, 2023, 9:25 PM
149
points
130
comments
19
min read
LW
link
1
review
Moral Reality Check (a short story)
jessicata
Nov 26, 2023, 5:03 AM
149
points
45
comments
21
min read
LW
link
1
review
(unstableontology.com)
The U.S. is becoming less stable
lc
Aug 18, 2023, 9:13 PM
149
points
68
comments
2
min read
LW
link
Dan Luu on “You can only communicate one top priority”
Raemon
Mar 18, 2023, 6:55 PM
149
points
18
comments
3
min read
LW
link
(twitter.com)
Brain Efficiency Cannell Prize Contest Award Ceremony
Alexander Gietelink Oldenziel
Jul 24, 2023, 11:30 AM
149
points
12
comments
7
min read
LW
link
Comments on OpenAI’s “Planning for AGI and beyond”
So8res
Mar 3, 2023, 11:01 PM
148
points
2
comments
14
min read
LW
link
At 87, Pearl is still able to change his mind
rotatingpaguro
18 Oct 2023 4:46 UTC
148
points
15
comments
5
min read
LW
link
Could a superintelligence deduce general relativity from a falling apple? An investigation
titotal
23 Apr 2023 12:49 UTC
148
points
39
comments
9
min read
LW
link
Discussion: Challenges with Unsupervised LLM Knowledge Discovery
Seb Farquhar
,
Vikrant Varma
,
zac_kenton
,
gasteigerjo
,
Vlad Mikulik
and
Rohin Shah
18 Dec 2023 11:58 UTC
147
points
21
comments
10
min read
LW
link
6 non-obvious mental health issues specific to AI safety
Igor Ivanov
18 Aug 2023 15:46 UTC
147
points
24
comments
4
min read
LW
link
“Heretical Thoughts on AI” by Eli Dourado
DragonGod
19 Jan 2023 16:11 UTC
146
points
38
comments
3
min read
LW
link
(www.elidourado.com)
Does davidad’s uploading moonshot work?
Bird Concept
,
lisathiergart
,
Anders_Sandberg
,
davidad
and
Arenamontanus
3 Nov 2023 2:21 UTC
146
points
35
comments
25
min read
LW
link
Algorithmic Improvement Is Probably Faster Than Scaling Now
johnswentworth
6 Jun 2023 2:57 UTC
146
points
25
comments
2
min read
LW
link