Formalizing Embeddedness Failures in Universal Artificial Intelligence
Cole Wyeth · May 26, 2025, 12:36 PM · 39 points · 0 comments · 1 min read · LW link · (arxiv.org)
Reward button alignment
Steven Byrnes · May 22, 2025, 5:36 PM · 50 points · 15 comments · 12 min read · LW link
Unexploitable search: blocking malicious use of free parameters
Jacob Pfau and Geoffrey Irving · May 21, 2025, 5:23 PM · 34 points · 16 comments · 6 min read · LW link
Modeling versus Implementation
Cole Wyeth · May 18, 2025, 1:38 PM · 27 points · 10 comments · 3 min read · LW link
Proof Section to an Introduction to Reinforcement Learning for Understanding Infra-Bayesianism
Brittany Gelb · May 17, 2025, 2:36 AM · 3 points · 0 comments · 9 min read · LW link
An Introduction to Reinforcement Learning for Understanding Infra-Bayesianism
Brittany Gelb · May 17, 2025, 2:34 AM · 11 points · 0 comments · 20 min read · LW link
It Is Untenable That Near-Future AI Scenario Models Like “AI 2027” Don’t Include Open Source AI
Andrew Dickson · May 16, 2025, 2:20 AM · 36 points · 17 comments · 5 min read · LW link
Problems with instruction-following as an alignment target
Seth Herd · May 15, 2025, 3:41 PM · 48 points · 14 comments · 10 min read · LW link
What if Agent-4 breaks out?
Alvin Ånestrand · May 15, 2025, 9:15 AM · 10 points · 0 comments · 6 min read · LW link
Dodging systematic human errors in scalable oversight
Geoffrey Irving · May 14, 2025, 3:19 PM · 33 points · 3 comments · 4 min read · LW link
Working through a small tiling result
James Payor · May 13, 2025, 8:28 PM · 66 points · 9 comments · 5 min read · LW link
Measuring Schelling Coordination—Reflections on Subversion Strategy Eval
Graeme Ford · May 12, 2025, 7:06 PM · 5 points · 0 comments · 8 min read · LW link
Political sycophancy as a model organism of scheming
Alex Mallen and Vivek Hebbar · May 12, 2025, 5:49 PM · 39 points · 0 comments · 14 min read · LW link
AIs at the current capability level may be important for future safety work
ryan_greenblatt · May 12, 2025, 2:06 PM · 81 points · 2 comments · 4 min read · LW link
Highly Opinionated Advice on How to Write ML Papers
Neel Nanda · May 12, 2025, 1:59 AM · 60 points · 4 comments · 32 min read · LW link
Absolute Zero: Alpha Zero for LLM
alapmi · May 11, 2025, 8:42 PM · 23 points · 16 comments · 1 min read · LW link
Glass box learners want to be black box
Cole Wyeth · May 10, 2025, 11:05 AM · 46 points · 10 comments · 4 min read · LW link
Mind the Coherence Gap: Lessons from Steering Llama with Goodfire
eitan sprejer · May 9, 2025, 9:29 PM · 4 points · 1 comment · 6 min read · LW link
Slow corporations as an intuition pump for AI R&D automation
ryan_greenblatt and elifland · May 9, 2025, 2:49 PM · 91 points · 23 comments · 9 min read · LW link
Video & transcript: Challenges for Safe & Beneficial Brain-Like AGI
Steven Byrnes
8 May 2025 21:11 UTC
24
points
0
comments
18
min read
LW
link