Mark Xu

Karma: 3,700

I do alignment research at the Alignment Research Center. Learn more about me at markxu.com/about

Strong Evidence is Common

Mark Xu 13 Mar 2021 22:04 UTC

244 points

49 comments 1 min read LW link 4 reviews

(markxu.com)

The Solomonoff Prior is Malign

Mark Xu 14 Oct 2020 1:33 UTC

168 points

52 comments 16 min read LW link 3 reviews

An Intuitive Guide to Garrabrant Induction

Mark Xu 3 Jun 2021 22:21 UTC

138 points

20 comments 24 min read LW link

The First Sample Gives the Most Information

Mark Xu 24 Dec 2020 20:39 UTC

133 points

16 comments 1 min read LW link 1 review

(markxu.com)

[Question] What are your greatest one-shot life improvements?

Mark Xu 16 May 2020 16:53 UTC

114 points

166 comments 1 min read LW link

Less Realistic Tales of Doom

Mark Xu 6 May 2021 23:01 UTC

113 points

13 comments 4 min read LW link

Does SGD Produce Deceptive Alignment?

Mark Xu 6 Nov 2020 23:48 UTC

96 points

9 comments 16 min read LW link

Mark Xu 25 Jun 2022 20:15 UTC
90 points
37
on: Conversation with Eliezer: What do you want the system to do?
Here’s a conversation that I think is vaguely analogous:

Alice: Suppose we had a one-way function, then we could make passwords better by...

Bob: What do you want your system to do?

Alice: Well, I want passwords to be more robust to...

Bob: Don’t tell me about the mechanics of the system. Tell me what you want the system to do.

Alice: I want people to be able to authenticate their identity more securely?

Bob: But what will they do with this authentication? Will they do good things? Will they do bad things?

Alice: IDK I just think the world is likely to be generically a better place if we can better authenticate users.

Bob: Oh OK, we’re just going to create this user authentication technology and hope people use it for good?

Alice: Yes? And that seems totally reasonable?

It seems to me like you don’t actually have to have a specific story about what you want your AI to do in order for alignment work to be helpful. People in general do not want to die, so probably generic work on being able to more precisely specify what you want out of your AIs, e.g. for them not to be mesa-optimizers, is likely to be helpful.

This is related to complaints I have with [pivotal-act based] framings, but probably that’s a longer post.

How to do theoretical research, a personal perspective

Mark Xu 19 Aug 2022 19:41 UTC

87 points

6 comments 15 min read LW link

Intermittent Distillations #4: Semiconductors, Economics, Intelligence, and Technological Progress.

Mark Xu 8 Jul 2021 22:14 UTC

81 points

9 comments 10 min read LW link

Rogue AGI Embodies Valuable Intellectual Property

Mark Xu and CarlShulman

3 Jun 2021 20:37 UTC

71 points

9 comments 3 min read LW link

Agents Over Cartesian World Models

Mark Xu and evhub

27 Apr 2021 2:06 UTC

66 points

4 comments 27 min read LW link

Mark Xu 21 Apr 2023 17:52 UTC
66 points
37
on: Should we publish mechanistic interpretability research?
Naively, there are so few people working on interp, and so many people working on capabilities, that publishing is very good for relative progress. So you need a pretty strong argument that interp in particular is good for capabilities, which isn’t borne out empirically and also doesn’t seem that strong.

In general, this post feels like it’s listing a bunch of considerations that are pretty small, and the 1st order consideration is just like “do you want people to know about this interpretability work”, which seems like a relatively straightforward “yes”.

I also separately think that LW tends to reward people for being “capabilities cautious” more than is reasonable, and once you’ve made the decision to not specifically work towards advancing capabilities, then the capabilities externalities of your research probably don’t matter ex ante.

Open Problems with Myopia

Mark Xu and evhub

10 Mar 2021 18:38 UTC

65 points

16 comments 8 min read LW link

ELK First Round Contest Winners

Mark Xu and paulfchristiano

26 Jan 2022 2:56 UTC

65 points

6 comments 1 min read LW link

Your Time Might Be More Valuable Than You Think

Mark Xu 18 Oct 2021 0:55 UTC

56 points

10 comments 6 min read LW link

(markxu.com)

Fractional progress estimates for AI timelines and implied resource requirements

Mark Xu and CarlShulman

15 Jul 2021 18:43 UTC

55 points

6 comments 7 min read LW link

Defusing AGI Danger

Mark Xu 24 Dec 2020 22:58 UTC

48 points

9 comments 9 min read LW link

[Question] What posts do you want written?

Mark Xu 19 Oct 2020 3:00 UTC

47 points

41 comments 1 min read LW link

Training Regime Day 19: Hamming Questions for Potted Plants

Mark Xu 23 Apr 2020 16:00 UTC

47 points

1 comment 3 min read LW link