Meta-Technicalities: Safeguarding Values in Formal Systems
LTM · Apr 30, 2025, 11:43 PM · 2 points · 0 comments · 3 min read · (routecause.substack.com)
Obstacles in ARC’s agenda: Finding explanations
David Matolcsi · Apr 30, 2025, 11:03 PM · 122 points · 10 comments · 17 min read
GPT-4o Responds to Negative Feedback
Zvi · Apr 30, 2025, 8:20 PM · 45 points · 2 comments · 18 min read · (thezvi.wordpress.com)
State of play of AI progress (and related brakes on an intelligence explosion) [Linkpost]
Noosphere89 · Apr 30, 2025, 7:58 PM · 7 points · 0 comments · 5 min read · (www.interconnects.ai)
Don’t accuse your interlocutor of being insufficiently truth-seeking
TFD · Apr 30, 2025, 7:38 PM · 30 points · 15 comments · 2 min read · (www.thefloatingdroid.com)
How can we solve diffuse threats like research sabotage with AI control?
Vivek Hebbar · Apr 30, 2025, 7:23 PM · 52 points · 1 comment · 8 min read
[Question]
Can Narrowing One’s Reference Class Undermine the Doomsday Argument?
Iannoose n. · Apr 30, 2025, 6:24 PM · 2 points · 1 comment · 1 min read
[Question]
Does there exist an interactive reasoning map tool that lets users visually lay out claims, assign probabilities and confidence levels, and dynamically adjust their beliefs based on weighted influences between connected assertions?
Zack Friedman · Apr 30, 2025, 6:22 PM · 5 points · 4 comments · 1 min read
Distilling the Internal Model Principle part II
JoseFaustino · Apr 30, 2025, 5:56 PM · 15 points · 0 comments · 19 min read
Research Priorities for Hardware-Enabled Mechanisms (HEMs)
aog · Apr 30, 2025, 5:43 PM · 17 points · 2 comments · 15 min read · (www.longview.org)
Video and transcript of talk on automating alignment research
Joe Carlsmith · Apr 30, 2025, 5:43 PM · 21 points · 0 comments · 24 min read · (joecarlsmith.com)
Can we safely automate alignment research?
Joe Carlsmith · Apr 30, 2025, 5:37 PM · 54 points · 29 comments · 48 min read · (joecarlsmith.com)
Investigating task-specific prompts and sparse autoencoders for activation monitoring
Henk Tillman · Apr 30, 2025, 5:09 PM · 23 points · 0 comments · 1 min read · (arxiv.org)
European Links (30.04.25)
Martin Sustrik · Apr 30, 2025, 3:40 PM · 15 points · 1 comment · 8 min read · (250bpm.substack.com)
Scaling Laws for Scalable Oversight
Subhash Kantamneni, Josh Engels, David Baek and Max Tegmark · Apr 30, 2025, 12:13 PM · 30 points · 0 comments · 9 min read
Early Chinese Language Media Coverage of the AI 2027 Report: A Qualitative Analysis
jeanne_ and eeeee · Apr 30, 2025, 11:06 AM · 211 points · 11 comments · 11 min read
[Paper] Automated Feature Labeling with Token-Space Gradient Descent
Wuschel Schulz · Apr 30, 2025, 10:22 AM · 4 points · 0 comments · 4 min read
A single principle related to many Alignment subproblems?
Q Home · Apr 30, 2025, 9:49 AM · 37 points · 33 comments · 17 min read
What if Brain Computer Interfaces went exponential?
Stephen Martin · Apr 30, 2025, 5:07 AM · −1 points · 0 comments · 12 min read
Interpreting the METR Time Horizons Post
snewman · Apr 30, 2025, 3:03 AM · 66 points · 12 comments · 10 min read · (amistrongeryet.substack.com)
Should we expect the future to be good?
Neil Crawford · Apr 30, 2025, 12:36 AM · 15 points · 0 comments · 14 min read
Judging types of consequentialism by influence and normativity
Cole Wyeth · Apr 29, 2025, 11:25 PM · 20 points · 1 comment · 2 min read
Bandwidth Rules Everything Around Me: Oliver Habryka on OpenPhil and GoodVentures
Elizabeth · Apr 29, 2025, 8:40 PM · 79 points · 15 comments · 1 min read · (acesounderglass.com)
The Grand Encyclopedia of Eponymous Laws
rogersbacon · Apr 29, 2025, 7:30 PM · 27 points · 5 comments · 16 min read · (www.secretorum.life)
Misrepresentation as a Barrier for Interp (Part I)
johnswentworth and Steve Petersen · Apr 29, 2025, 5:07 PM · 113 points · 11 comments · 7 min read
AISN #53: An Open Letter Attempts to Block OpenAI Restructuring
Corin Katzke and Dan H · Apr 29, 2025, 4:13 PM · 5 points · 0 comments · 4 min read
What could Alphafold 4 look like?
Abhishaike Mahajan · Apr 29, 2025, 3:45 PM · 8 points · 0 comments · 1 min read
Sealed Computation: Towards Low-Friction Proof of Locality
Paul Bricman · Apr 29, 2025, 3:26 PM · 4 points · 0 comments · 10 min read · (noemaresearch.com)
Dating Roundup #4: An App for That
Zvi · Apr 29, 2025, 1:10 PM · 17 points · 5 comments · 16 min read · (thezvi.wordpress.com)
Talk on letters to AI (London)
ukc10014 · Apr 29, 2025, 9:50 AM · 3 points · 0 comments · 1 min read
Memory Decoding Journal Club: “Motor learning selectively strengthens cortical and striatal synapses of motor engram neurons”
Devin Ward · Apr 29, 2025, 2:26 AM · 1 point · 0 comments · 1 min read
D&D.Sci Tax Day: Adventurers and Assessments Evaluation & Ruleset
aphyer · Apr 29, 2025, 2:00 AM · 28 points · 10 comments · 5 min read
How to Build a Third Place on Focusmate
Parker Conley · Apr 28, 2025, 11:46 PM · 96 points · 10 comments · 5 min read · (parconley.com)
Methods of defense against AGI manipulation
MarkelKori · Apr 28, 2025, 9:03 PM · 1 point · 0 comments · 2 min read
China’s Petition System: It Looks Like Democracy — But It Isn’t
Hu Yichao · Apr 28, 2025, 8:56 PM · 0 points · 4 comments · 2 min read
Fundamentals of Safe AI (Phase 1) – Applications Open for the Global Cohort
rajsecrets · Apr 28, 2025, 8:52 PM · 9 points · 0 comments · 2 min read
Proceedings of ILIAD: Lessons and Progress
Alexander Gietelink Oldenziel and JessRiedel · Apr 28, 2025, 7:04 PM · 77 points · 5 comments · 8 min read
GPT-4o Is An Absurd Sycophant
Zvi · Apr 28, 2025, 7:00 PM · 80 points · 7 comments · 19 min read · (thezvi.wordpress.com)
[Question]
What are the best standardised, repeatable bets?
kave · Apr 28, 2025, 6:45 PM · 31 points · 10 comments · 1 min read
7+ tractable directions in AI control
Julian Stastny and ryan_greenblatt · Apr 28, 2025, 5:12 PM · 86 points · 1 comment · 13 min read
“A victory for the natural order”
Mati_Roy · Apr 28, 2025, 3:33 PM · 11 points · 3 comments · 1 min read · (preservinghope.substack.com)
Why giving workers stocks isn’t enough — and what co-ops get right
B Jacobs · Apr 28, 2025, 2:19 PM · 6 points · 9 comments · 2 min read · (bobjacobs.substack.com)
Keltham on Becoming more Truth-Oriented
Towards_Keeperhood · Apr 28, 2025, 12:58 PM · 15 points · 2 comments · 19 min read
Therapist in the Weights: Risks of Hyper-Introspection in Future AI Systems
Davidmanheim · Apr 28, 2025, 6:42 AM · 15 points · 1 comment · 5 min read
In Darkness They Assembled
Charlie Sanders · Apr 28, 2025, 3:44 AM · 2 points · 0 comments · 3 min read
Seeking advice on careers in AI Safety
nem · Apr 27, 2025, 11:59 PM · 8 points · 2 comments · 1 min read
Thin Alignment Can’t Solve Thick Problems
Daan Henselmans · Apr 27, 2025, 10:42 PM · 11 points · 2 comments · 9 min read
The Way You Go Depends A Good Deal On Where You Want To Get: FEP minimizes surprise about actions using preferences about the future as *evidence*
Christopher King · Apr 27, 2025, 9:55 PM · 9 points · 5 comments · 5 min read
How people use LLMs
Elizabeth · Apr 27, 2025, 9:48 PM · 78 points · 6 comments · 1 min read · (www.gleech.org)
Luna Lovegood and the Chamber of Secrets, Part 6 (Луна Лавгуд и Комната Тайн, Часть 6)
Kongo Landwalker and lsusr · Apr 27, 2025, 8:26 PM · 3 points · 0 comments · 2 min read