Software developer at Spark Wave, working on GuidedTrack.
rmoehn (Richard Möhn)
I’m leaving AI alignment – you better stay
Please give your links speaking names!
Predicted AI alignment event/meeting calendar
A cognitive intervention for wrist pain
Which of these five AI alignment research project ideas are no good?
Twenty-three AI alignment research project definitions
Looking for remote writing partners (for AI alignment research)
[Question] How to deal with a misleading conference talk about AI risk?
How I understand the main point:
The goal is to get superhuman performance aligned with human values. How might we achieve this? By learning the human values. Then we can use a perfect planner to find the best actions to align the world with those values. This will have superhuman performance, because humans’ planning algorithms are not perfect: they don’t always find the best actions to align the world with their values.
How do we learn the human values? By observing human behaviour, i.e. people’s actions in each circumstance. This is modelled as the human policy $\pi$.
Behaviour is the known outside view of a human, and values + planner is the unknown inside view. We need to learn both the values $R$ and the planner $p$ such that $p(R) = \pi$.
Unfortunately, this equation is underdetermined. We only know $\pi$; $p$ and $R$ can vary independently.
Are there differences among the candidate $(p, R)$ pairs? One thing we could look at is their Kolmogorov complexity. Maybe the true pair has the lowest complexity. But according to the article, this is not the case.
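To make the underdetermination concrete, here is a minimal toy sketch (my own construction, not from the article): two (planner, values) pairs that compose to the exact same observable policy, so no amount of behavioural data can tell them apart.

```python
# Toy illustration (hypothetical, not from the article): two
# (planner, values) pairs that compose to the same observable policy.

ACTIONS = ["help", "harm"]

def rational_planner(R):
    """Pick the action with the highest value."""
    return max(ACTIONS, key=lambda a: R[a])

def antirational_planner(R):
    """Pick the action with the lowest value."""
    return min(ACTIONS, key=lambda a: R[a])

values = {"help": 1.0, "harm": -1.0}
inverted_values = {"help": -1.0, "harm": 1.0}

# pi = p(R): both pairs yield the same policy, so observing behaviour
# alone cannot distinguish (rational_planner, values) from
# (antirational_planner, inverted_values).
assert rational_planner(values) == antirational_planner(inverted_values) == "help"
```

Note that the two candidates are also about equally simple to write down, which gives some intuition for why a simplicity prior doesn’t obviously pick out the intended pair.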
Makeshift face touch warner
Factored Cognition with Reflection
Oliver from LessWrong just helped me point the accusatory finger at myself. – The plugin Privacy Badger was blocking dropbox.com, so the images couldn’t be loaded.
Earning money with/for work in AI safety
Usable implementation of IDA available
Man, I’m reading the first volume of The GULAG Archipelago and that talk about murder is just sickening.
Remote AI alignment writing group seeking new members
My (less eloquent and less informed) take:
Dear Ms. Tam,
I’m one of the readers of Scott Alexander’s blog and I kindly ask you not to publish his real name. He has laid out his rationale in his only remaining blog post, and Zvi Mowshowitz has already sent you a much more eloquent appeal than the one I’m writing. No doubt, many other readers of Scott’s blog have sent you their – hopefully polite – opinion about the matter.
I have little to add but the reminder that becoming a public figure makes life difficult. Tim Ferriss wrote about this recently:
https://tim.blog/2020/02/02/reasons-to-not-become-famous/
You ought not to force this on people who neither deserve it (through evil deeds) nor want it.
Scott is an honest blogger who wants to keep his peace. Please don’t take it away from him.
Respectfully,
Richard Möhn
Good point about the misaligned skillset.
Relationships to results can take many forms.
Joint works and collaborations, as you say.
Feedback on work products, which you can use to improve them.
Discussion/feedback on research direction.
Moral support and cheering in general.
Or someone who lights a fire under your bum, if that’s what you need.
Access to computing resources if you have a good relationship with a university.
Mentoring.
Quick answers to technical questions if you have access to an expert.
Probably more.
This only lists the receiving side, whereas every good relationship is based on give-and-take. Some people get almost all their results by leveraging their network. Not in a parasitic way – they provide a lot of value by connecting others.
Other people—especially women—love me when I’m a cocky arrogant megalomaniac.
Maybe it just divides people? Average behaviour doesn’t move the liking scale. Cocky, arrogant, megalomaniac behaviour makes the liking scale swing positive for some people and negative for others. And since you’re in a cocky, arrogant mode, you only notice those who like you.
The airplane example illustrates this, too. I bet a good share of the passengers thought, ‘Which ****er is delaying the airplane now?’, whereas another share smiled at Gates’ nerve.
If you get things done by making enemies, in the end you don’t get much (good) done. Cf. many of the people you listed.
I’ve added specifics. I hope this improves things. If not, feel free to edit it out.
Thanks for pointing out the problems with my question. I see now that I was wrong to combine strong language with no specifics and a concrete target. I would amend it, but then the context for the discussion would be gone.