Computational Mechanics Hackathon (June 1 & 2) · Adam Shai · May 24, 2024, 10:18 PM · 34 points · 5 comments · 1 min read · LW link
[Question] Request for comments/opinions/ideas on safety/ethics for use of tool AI in a large healthcare system. · bokov · May 24, 2024, 8:53 PM · 5 points · 2 comments · 1 min read · LW link
NYU Code Debates Update/Postmortem · David Rein · May 24, 2024, 4:08 PM · 27 points · 4 comments · 10 min read · LW link
AI companies aren’t really using external evaluators · Zach Stein-Perlman · May 24, 2024, 4:01 PM · 242 points · 15 comments · 4 min read · LW link
The Schumer Report on AI (RTFB) · Zvi · May 24, 2024, 3:10 PM · 34 points · 3 comments · 36 min read · LW link (thezvi.wordpress.com)
minutes from a human-alignment meeting · bhauth · May 24, 2024, 5:01 AM · 67 points · 4 comments · 2 min read · LW link
Talent Needs of Technical AI Safety Teams · yams, Carson Jones, McKennaFitzgerald and Ryan Kidd · May 24, 2024, 12:36 AM · 118 points · 65 comments · 14 min read · LW link
How to Give Coming AGI’s the Best Chance of Figuring Out Ethics for Us · sweenesm · May 23, 2024, 7:44 PM · 1 point · 2 comments · 10 min read · LW link
Mentorship in AGI Safety (MAGIS) call for mentors · Valentin2026 and Joe Rogero · May 23, 2024, 6:28 PM · 31 points · 3 comments · 2 min read · LW link
Quick Thoughts on Scaling Monosemanticity · Joel Burget · May 23, 2024, 4:22 PM · 28 points · 1 comment · 4 min read · LW link (transformer-circuits.pub)
The case for stopping AI safety research · catubc · May 23, 2024, 3:55 PM · 53 points · 38 comments · 1 min read · LW link
[Question] SAE sparse feature graph using only residual layers · Jaehyuk Lim · May 23, 2024, 1:32 PM · 0 points · 3 comments · 1 min read · LW link
[Question] Are most people deeply confused about “love”, or am I missing a human universal? · SpectrumDT · May 23, 2024, 1:22 PM · 13 points · 28 comments · 3 min read · LW link
Executive Dysfunction 101 · DaystarEld · May 23, 2024, 12:43 PM · 28 points · 1 comment · 3 min read · LW link (daystareld.com)
AI #65: I Spy With My AI · Zvi · May 23, 2024, 12:40 PM · 28 points · 7 comments · 43 min read · LW link (thezvi.wordpress.com)
What mistakes has the AI safety movement made? · EuanMcLean · May 23, 2024, 11:19 AM · 64 points · 29 comments · 12 min read · LW link
What should AI safety be trying to achieve? · EuanMcLean · May 23, 2024, 11:17 AM · 17 points · 1 comment · 13 min read · LW link
What will the first human-level AI look like, and how might things go wrong? · EuanMcLean · May 23, 2024, 11:17 AM · 20 points · 2 comments · 15 min read · LW link
Big Picture AI Safety: Introduction · EuanMcLean · May 23, 2024, 11:15 AM · 46 points · 7 comments · 5 min read · LW link
Paper in Science: Managing extreme AI risks amid rapid progress · JanB · May 23, 2024, 8:40 AM · 50 points · 2 comments · 1 min read · LW link
Power Law Policy · Ben Turtel · May 23, 2024, 5:28 AM · 4 points · 7 comments · 6 min read · LW link (bturtel.substack.com)
Why entropy means you might not have to worry as much about superintelligent AI · Ron J · May 23, 2024, 3:52 AM · −26 points · 1 comment · 2 min read · LW link
Quick Thoughts on Our First Sampling Run · jefftk · May 23, 2024, 12:20 AM · 29 points · 3 comments · 2 min read · LW link (www.jefftk.com)
AI Safety proposal—Influencing the superintelligence explosion · Morgan · May 22, 2024, 11:31 PM · 0 points · 2 comments · 7 min read · LW link
Implementing Asimov’s Laws of Robotics—How I imagine alignment working. · Joshua Clancy · May 22, 2024, 11:15 PM · 2 points · 0 comments · 11 min read · LW link
Higher-Order Forecasts · ozziegooen · May 22, 2024, 9:49 PM · 45 points · 1 comment · LW link
A Positive Double Standard—Self-Help Principles Work For Individuals Not Populations · James Stephen Brown · May 22, 2024, 9:37 PM · 8 points · 3 comments · 5 min read · LW link
A Bi-Modal Brain Model · Johannes C. Mayer · May 22, 2024, 8:10 PM · 12 points · 3 comments · 2 min read · LW link
Offering service as a sensayer for simulationist-adjacent beliefs. · mako yass · May 22, 2024, 6:52 PM · 22 points · 0 comments · 1 min read · LW link
Do Not Mess With Scarlett Johansson · Zvi · May 22, 2024, 3:10 PM · 65 points · 7 comments · 16 min read · LW link (thezvi.wordpress.com)
How Multiverse Theory dissolves Quantum inexplicability · mrdlm · May 22, 2024, 2:55 PM · 0 points · 0 comments · 1 min read · LW link
[Question] Should we be concerned about eating too much soy? · ChristianKl · May 22, 2024, 12:53 PM · 18 points · 3 comments · 1 min read · LW link
Procedural Executive Function, Part 3 · DaystarEld · May 22, 2024, 11:58 AM · 20 points · 4 comments · LW link
Cicadas, Anthropic, and the bilateral alignment problem · kromem · May 22, 2024, 11:09 AM · 28 points · 6 comments · 5 min read · LW link
Announcing Human-aligned AI Summer School · Jan_Kulveit and Tomáš Gavenčiak · May 22, 2024, 8:55 AM · 50 points · 0 comments · 1 min read · LW link (humanaligned.ai)
“Which chains-of-thought was that faster than?” · Emrik · May 22, 2024, 8:21 AM · 37 points · 4 comments · 4 min read · LW link
Each Llama3-8b text uses a different “random” subspace of the activation space · tailcalled · May 22, 2024, 7:31 AM · 3 points · 4 comments · 7 min read · LW link
ARIA’s Safeguarded AI grant program is accepting applications for Technical Area 1.1 until May 28th · Brendon_Wong · May 22, 2024, 6:54 AM · 11 points · 0 comments · 1 min read · LW link (www.aria.org.uk)
Anthropic announces interpretability advances. How much does this advance alignment? · Seth Herd · May 21, 2024, 10:30 PM · 49 points · 4 comments · 3 min read · LW link (www.anthropic.com)
[Question] What would stop you from paying for an LLM? · yanni kyriacos · May 21, 2024, 10:25 PM · 17 points · 15 comments · 1 min read · LW link
EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024 · scasper · May 21, 2024, 8:15 PM · 157 points · 16 comments · 3 min read · LW link
Mitigating extreme AI risks amid rapid progress [Linkpost] · Orpheus16 · May 21, 2024, 7:59 PM · 21 points · 7 comments · 4 min read · LW link
The problem with rationality · David Loomis · May 21, 2024, 6:49 PM · −17 points · 1 comment · 6 min read · LW link
rough draft on what happens in the brain when you have an insight · Emrik · 21 May 2024 18:02 UTC · 11 points · 2 comments · 1 min read · LW link
On Dwarkesh’s Podcast with OpenAI’s John Schulman · Zvi · 21 May 2024 17:30 UTC · 73 points · 4 comments · 20 min read · LW link (thezvi.wordpress.com)
[Question] Is deleting capabilities still a relevant research question? · tailcalled · 21 May 2024 13:24 UTC · 15 points · 1 comment · 1 min read · LW link
New voluntary commitments (AI Seoul Summit) · Zach Stein-Perlman · 21 May 2024 11:00 UTC · 81 points · 17 comments · 7 min read · LW link (www.gov.uk)
ACX/LW/EA/* Meetup Bremen · RasmusHB · 21 May 2024 5:42 UTC · 2 points · 0 comments · 1 min read · LW link
My Dating Heuristic · Declan Molony · 21 May 2024 5:28 UTC · 26 points · 4 comments · 2 min read · LW link
Scorable Functions: A Format for Algorithmic Forecasting · ozziegooen · 21 May 2024 4:14 UTC · 29 points · 0 comments · LW link