Software developer at Spark Wave, working on GuidedTrack.
rmoehn (Richard Möhn)
I’m leaving AI alignment – you better stay
Please give your links speaking names!
Predicted AI alignment event/meeting calendar
A cognitive intervention for wrist pain
Which of these five AI alignment research project ideas are no good?
Twenty-three AI alignment research project definitions
Looking for remote writing partners (for AI alignment research)
[Question] How to deal with a misleading conference talk about AI risk?
How I understand the main point:
The goal is to get superhuman performance aligned with human values. How might we achieve this? By learning the human values. Then we can use a perfect planner to find the best actions to align the world with those values. This will have superhuman performance, because humans’ planning algorithms are not perfect: they don’t always find the best actions to align the world with their values.
How do we learn the human values? By observing human behaviour, i.e. people’s actions in each circumstance. This is modelled as the human policy $\pi$.
Behaviour is the known outside view of a human, and values + planner is the unknown inside view. We need to learn both the values $R$ and the planner $p$ such that $p(R) = \pi$.
Unfortunately, this equation is underdetermined. We only know $\pi$; $p$ and $R$ can vary independently.
Are there differences among the candidate $(p, R)$ pairs? One thing we could look at is their Kolmogorov complexity. Maybe the true pair has the lowest complexity. But according to the article, this is not the case.
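To make the underdetermination concrete, here is a minimal toy sketch (my own construction, not from the article): two (planner, values) pairs that compose to the exact same observable policy, so no amount of behavioural data can tell them apart.

```python
# Toy illustration (hypothetical, not from the article): two
# (planner, values) pairs that compose to the same observable policy.

ACTIONS = ["help", "harm"]

def rational_planner(R):
    """Pick the action with the highest value."""
    return max(ACTIONS, key=lambda a: R[a])

def antirational_planner(R):
    """Pick the action with the lowest value."""
    return min(ACTIONS, key=lambda a: R[a])

values = {"help": 1.0, "harm": -1.0}
inverted_values = {"help": -1.0, "harm": 1.0}

# pi = p(R): both pairs yield the same policy, so observing behaviour
# alone cannot distinguish (rational_planner, values) from
# (antirational_planner, inverted_values).
assert rational_planner(values) == antirational_planner(inverted_values) == "help"
```

Note that the two candidates are also about equally simple to write down, which gives some intuition for why a simplicity prior doesn’t obviously pick out the intended pair.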
Makeshift face touch warner
Factored Cognition with Reflection
Oliver from LessWrong just helped me point the accusatory finger at myself. – The plugin Privacy Badger was blocking dropbox.com, so the images couldn’t be loaded.
Earning money with/for work in AI safety
Usable implementation of IDA available
Man, I’m reading the first volume of The GULAG Archipelago and that talk about murder is just sickening.
Remote AI alignment writing group seeking new members
My (less eloquent and less informed) take:
Dear Ms. Tam,
I’m one of the readers of Scott Alexander’s blog and I kindly ask you not to publish his real name. He has laid out his rationale in his only remaining blog post, and Zvi Mowshowitz has already sent you a much more eloquent appeal than the one I’m writing. No doubt, many other readers of Scott’s blog have sent you their – hopefully polite – opinion about the matter.
I have little to add but the reminder that becoming a public figure makes life difficult. Tim Ferriss wrote about this recently:
https://tim.blog/2020/02/02/reasons-to-not-become-famous/
You ought not to force this on people who neither deserve it (through evil deeds) nor want it.
Scott is an honest blogger who wants to keep his peace. Please don’t take it away from him.
Respectfully,
Richard Möhn
Good point about the misaligned skillset.
Relationships to results can take many forms.
Joint works and collaborations, as you say.
Feedback on work products, which you can use to improve them.
Discussion/feedback on research direction.
Moral support and cheering in general.
Or someone who lights a fire under your bum, if that’s what you need.
Access to computing resources if you have a good relationship with a university.
Mentoring.
Quick answers to technical questions if you have access to an expert.
Probably more.
This only lists the receiving side, whereas every good relationship is based on give-and-take. Some people get almost all their results by leveraging their network. Not in a parasitic way – they provide a lot of value by connecting others.
Other people—especially women—love me when I’m a cocky arrogant megalomaniac.
Maybe it just divides people? Average behaviour doesn’t move the liking scale. Cocky, arrogant, megalomaniac behaviour makes the liking scale swing positive for some people and negative for others. And since you’re in a cocky, arrogant mode, you only notice those who like you.
The airplane example illustrates this, too. I bet a good share of the passengers thought, ‘Which ****er is delaying the airplane now?’, whereas another share smiled at Gates’ nerve.
If you get things done by making enemies, in the end you don’t get much (good) done. Cf. many of the people you listed.
I’ve added specifics. I hope this improves things. If not, feel free to edit it out.
Thanks for pointing out the problems with my question. I see now that I was wrong to combine strong language with no specifics and a concrete target. I would amend it, but then the context for the discussion would be gone.