Mark Xu
Karma: 4,484
I do alignment research at the Alignment Research Center. Learn more about me at markxu.com/about
Estimating Tail Risk in Neural Networks
By Mark Xu. Sep 13, 2024, 8:00 PM. 68 points, 9 comments, 23 min read. Link: www.alignment.org
Backdoors as an analogy for deceptive alignment
By Jacob_Hilton and Mark Xu. Sep 6, 2024, 3:30 PM. 104 points, 2 comments, 8 min read. Link: www.alignment.org
If you weren’t such an idiot...
By kave and Mark Xu. Mar 2, 2024, 12:01 AM. 157 points, 74 comments, 2 min read. Link: markxu.com
ARC is hiring theoretical researchers
By paulfchristiano, Jacob_Hilton and Mark Xu. Jun 12, 2023, 6:50 PM. 126 points, 12 comments, 4 min read. Link: www.alignment.org
How to do theoretical research, a personal perspective
By Mark Xu. Aug 19, 2022, 7:41 PM. 91 points, 6 comments, 15 min read.
ELK prize results
By paulfchristiano and Mark Xu. Mar 9, 2022, 12:01 AM. 138 points, 50 comments, 21 min read.
ELK First Round Contest Winners
By Mark Xu and paulfchristiano. Jan 26, 2022, 2:56 AM. 65 points, 6 comments, 1 min read.
ARC’s first technical report: Eliciting Latent Knowledge
By paulfchristiano, Mark Xu and Ajeya Cotra. Dec 14, 2021, 8:09 PM. 228 points, 90 comments, 1 min read, 3 reviews. Link: docs.google.com
ARC is hiring!
By paulfchristiano and Mark Xu. Dec 14, 2021, 8:09 PM. 64 points, 2 comments, 1 min read.
Your Time Might Be More Valuable Than You Think
By Mark Xu. Oct 18, 2021, 12:55 AM. 57 points, 10 comments, 6 min read. Link: markxu.com
The Simulation Hypothesis Undercuts the SIA/Great Filter Doomsday Argument
By Mark Xu and CarlShulman. Oct 1, 2021, 10:23 PM. 43 points, 11 comments, 7 min read.
Fractional progress estimates for AI timelines and implied resource requirements
By Mark Xu and CarlShulman. Jul 15, 2021, 6:43 PM. 55 points, 6 comments, 7 min read.
Intermittent Distillations #4: Semiconductors, Economics, Intelligence, and Technological Progress.
By Mark Xu. Jul 8, 2021, 10:14 PM. 81 points, 9 comments, 10 min read.
Anthropic Effects in Estimating Evolution Difficulty
By Mark Xu. Jul 5, 2021, 4:02 AM. 13 points, 2 comments, 3 min read.
An Intuitive Guide to Garrabrant Induction
By Mark Xu. Jun 3, 2021, 10:21 PM. 149 points, 20 comments, 24 min read.
Rogue AGI Embodies Valuable Intellectual Property
By Mark Xu and CarlShulman. Jun 3, 2021, 8:37 PM. 71 points, 9 comments, 3 min read.
Intermittent Distillations #3
By Mark Xu. May 15, 2021, 7:13 AM. 21 points, 1 comment, 11 min read.
Pre-Training + Fine-Tuning Favors Deception
By Mark Xu. May 8, 2021, 6:36 PM. 27 points, 3 comments, 3 min read.
Less Realistic Tales of Doom
By Mark Xu. May 6, 2021, 11:01 PM. 113 points, 13 comments, 4 min read.
Agents Over Cartesian World Models
By Mark Xu and evhub. Apr 27, 2021, 2:06 AM. 67 points, 4 comments, 27 min read.