mattmacdermott
Karma: 1,184

Is instrumental convergence a thing for virtue-driven agents?
mattmacdermott · Apr 2, 2025, 3:59 AM · 33 points · 37 comments · 2 min read

Validating against a misalignment detector is very different to training against one
mattmacdermott · Mar 4, 2025, 3:41 PM · 29 points · 4 comments · 4 min read

Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?
Yoshua Bengio, Jesse Richardson, dwk and mattmacdermott · Feb 24, 2025, 6:31 PM · 44 points · 15 comments · 11 min read

Context-dependent consequentialism
Jeremy Gillen and mattmacdermott · Nov 4, 2024, 9:29 AM · 31 points · 6 comments · 27 min read

Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024) (yoshuabengio.org)
mattmacdermott · Sep 1, 2024, 7:46 AM · 26 points · 0 comments · 5 min read

Bengio’s Alignment Proposal: “Towards a Cautious Scientist AI with Convergent Safety Bounds” (yoshuabengio.org)
mattmacdermott · Feb 29, 2024, 1:59 PM · 76 points · 19 comments · 14 min read

mattmacdermott’s Shortform
mattmacdermott · Jan 3, 2024, 9:08 AM · 4 points · 32 comments

What’s next for the field of Agent Foundations?
Nora_Ammann, Alexander Gietelink Oldenziel and mattmacdermott · Nov 30, 2023, 5:55 PM · 59 points · 23 comments · 10 min read

Optimisation Measures: Desiderata, Impossibility, Proposals
mattmacdermott and Alexander Gietelink Oldenziel · Aug 7, 2023, 3:52 PM · 36 points · 9 comments · 1 min read

Reward Hacking from a Causal Perspective
tom4everitt, Francis Rhys Ward, sbenthall, James Fox, mattmacdermott and RyanCarey · Jul 21, 2023, 6:27 PM · 29 points · 6 comments · 7 min read

Incentives from a causal perspective
tom4everitt, James Fox, RyanCarey, mattmacdermott, sbenthall and Jonathan Richens · Jul 10, 2023, 5:16 PM · 27 points · 0 comments · 6 min read

Agency from a causal perspective
tom4everitt, mattmacdermott, James Fox, Francis Rhys Ward and Jonathan Richens · Jun 30, 2023, 5:37 PM · 40 points · 5 comments · 6 min read

Introduction to Towards Causal Foundations of Safe AGI
tom4everitt, Lewis Hammond, Francis Rhys Ward, RyanCarey, James Fox, mattmacdermott and sbenthall · Jun 12, 2023, 5:55 PM · 67 points · 6 comments · 4 min read

Some Summaries of Agent Foundations Work
mattmacdermott · May 15, 2023, 4:09 PM · 62 points · 1 comment · 13 min read

Towards Measures of Optimisation
mattmacdermott and Alexander Gietelink Oldenziel · May 12, 2023, 3:29 PM · 53 points · 37 comments · 4 min read

Normative vs Descriptive Models of Agency
mattmacdermott · Feb 2, 2023, 8:28 PM · 26 points · 5 comments · 4 min read