Software developer at Spark Wave, working on GuidedTrack.
rmoehn (Richard Möhn)
How I understand the main point:
The goal is to get superhuman performance aligned with human values. How might we achieve this? By learning the human values. Then we can use a perfect planner to find the best actions to align the world with the human values. This will have superhuman performance, because humans’ planning algorithms are not perfect. They don’t always find the best actions to align the world with their values.
How do we learn the human values? By observing human behaviour, i.e. their actions in each circumstance. This is modelled as the human policy π.
Behaviour is the known outside view of a human, and values+planner is the unknown inside view. We need to learn both the values R and the planner p such that p(R) = π.
Unfortunately, this equation is underdetermined. We only know π; p and R can vary independently.
Are there differences among the candidate (p, R) pairs? One thing we could look at is their Kolmogorov complexity. Maybe the true pair has the lowest complexity. But this is not the case, according to the article.
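To make the underdetermination concrete, here is a standard illustration (my own sketch in the notation above, not quoted from the article): a rational planner paired with R and an anti-rational planner paired with −R produce exactly the same policy.

```latex
% Sketch: two incompatible (planner, values) pairs fit the same policy pi.
% p_rat maximizes its reward function; p_anti minimizes it.
\[
  p_{\mathrm{rat}}(R)(s)  = \arg\max_a R(s, a), \qquad
  p_{\mathrm{anti}}(R)(s) = \arg\min_a R(s, a).
\]
\[
  p_{\mathrm{anti}}(-R)(s) = \arg\min_a \bigl( -R(s, a) \bigr)
                           = \arg\max_a R(s, a)
                           = p_{\mathrm{rat}}(R)(s) = \pi(s).
\]
```

So observing π alone cannot distinguish ‘rational agent with values R’ from ‘anti-rational agent with values −R’.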
Oliver from LessWrong just helped me point the accusatory finger at myself. – The plugin Privacy Badger was blocking dropbox.com, so the images couldn’t be loaded.
Man, I’m reading the first volume of The GULAG Archipelago and that talk about murder is just sickening.
My (less eloquent and less informed) take:
Dear Ms. Tam,
I’m one of the readers of Scott Alexander’s blog and I kindly ask you not to publish his real name. He has laid out his rationale in his only remaining blog post, and Zvi Mowshowitz has already sent you a much more eloquent appeal than the one I’m writing. No doubt, many other readers of Scott’s blog have sent you their – hopefully polite – opinion about the matter.
I have little to add but the reminder that becoming a public figure makes life difficult. Tim Ferriss wrote about this recently:
https://tim.blog/2020/02/02/reasons-to-not-become-famous/
You ought not to force this on people who neither deserve it (through evil deeds) nor want it.
Scott is an honest blogger who wants to keep his peace. Please don’t take it away from him.
Respectfully,
Richard Möhn
Good point about the misaligned skillset.
Relationships to results can take many forms.
- Joint works and collaborations, as you say.
- Receiving feedback on work products and using it to improve them.
- Discussion/feedback on research direction.
- Moral support and cheering in general.
- Or someone who lights a fire under your bum, if that’s what you need.
- Access to computing resources if you have a good relationship with a university.
- Mentoring.
- Quick answers to technical questions if you have access to an expert.
- Probably more.
This only lists the receiving side, whereas every good relationship is based on give-and-take. Some people get almost all their results by leveraging their network. Not in a parasitic way – they provide a lot of value by connecting others.
Other people—especially women—love me when I’m a cocky arrogant megalomaniac.
Maybe it just divides people? Average behaviour doesn’t move the liking scale. Cocky arrogant megalomaniac behaviour makes the liking scale swing positive for some people, negative for others. And since you’re in a cocky, arrogant mode, you only notice those who like you.
The airplane example illustrates it, too. I bet a good share of passengers thought, ‘What ****er is delaying the airplane now?’, whereas another share smiled at Gates’ nerve.
If you get things done by making enemies, in the end you don’t get much (good) done. Cf. many of the people you listed.
Studying this with Anki is a waste of time in my opinion. Just execute the instructions three times and you’re good to go. Physical skills are best learned physically.
Aside from that: Strong upvote!
Strange thing about the WHO guide: The nail area/tip of the thumb doesn’t get much friction. Step 7 appears to address the space under the fingernails (which should be short anyway). But at least the lateral half of my thumb tip doesn’t touch anything when I try it. Hm, maybe in step 6.
I think the whole sequence will be very useful for me. And I recommend mentioning in section 1 that we will enter the bugs in a spreadsheet later. If I had known that from section 1, I would have typed the bug list into the computer right away, rather than hand-writing it first and then typing it up. Not a good use of my time…
Stopped updating the Predicted AI alignment event/meeting calendar, as apparently there is no interest in it and I’ve left AI alignment.
It’s a good illustration if you’re optimizing by pushing one variable – running harder. It’s not a good illustration in general. Consider my case, which is analogous to yours:
My overall goals are health, fitness and wellbeing (as I assume yours are). And I lift weights, for example doing Turkish Get-Ups with a kettlebell. I started this with no weights, then increased to 12 kg, 16 kg, 20 kg. So my metric is weight × sets. Whenever I increased the weight, I got injuries/pains – first an irritated muscle under my shoulder blade, then pains around my thoracic spine, then a muscle on the outside of my shoulder that felt like it was getting pulled.
I could have said that I’m pushing myself too hard and decided to stay at the old weight. Instead I turned other knobs: I improved my warm-up, did some mobilizations and fixed my form, even booking a personal trainer/physiotherapist a few times. Similar things happened with every weighted exercise I’m doing. It’s just hard to move correctly. Increasing the load (weight, speed etc.) exposes your faults. Then you fix them.
So optimizing my metric first brought me towards my overall goals; continuing to optimize started to pull me away from them; but continuing to optimize even further (by taking different actions) brought me closer to them again: greater load plus more correct movement are part of health, fitness and wellbeing.
The Wirecutter guides, although also affiliate-financed, are good, too:
In the pseudocode, it would make more sense to initialize `A <- Distill(H)`, wouldn’t it? Otherwise, running `Amplify` with the randomly initialized `A` in the next step wouldn’t be helpful.
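For concreteness, a minimal sketch of the loop as I read it (my paraphrase, with stub stand-ins for the post’s `Distill` and `Amplify`, not their actual definitions):

```python
# Sketch of the IDA loop with the suggested initialization. distill and
# amplify are stubs standing in for the post's Distill and Amplify.

def distill(overseer):
    """Stub: train a fast agent that imitates the (slow) overseer."""
    return lambda task: overseer(task)

def amplify(overseer, agent):
    """Stub: the overseer solves tasks with the help of the agent."""
    return lambda task: overseer((task, agent(task)))

def ida(overseer, n_iterations):
    agent = distill(overseer)  # suggested: start from a distillation of H,
                               # not from a randomly initialized agent
    for _ in range(n_iterations):
        agent = distill(amplify(overseer, agent))
    return agent
```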
These are great suggestions for the thinking part of doing research.
For people who have difficulty with the first part – finding a good problem – I recommend the classic The Craft of Research. It also has practical guidance about writing down your results.
Same with my comment. :-/ Maybe the downvoters want to point out the risk of this turning into some denunciation/witch-hunting/revolution-eating-its-own-children/cancel-culture scenario. I’m worried about these dangers too (which is why I mentioned autoimmune disorders), but I didn’t want to turn my comment into an essay exploring pros and cons and risks and benefits and negative attractor states and ways to avoid them.
Of course, I would appreciate some explanation from the downvoters. My policy is to only downvote if I also take the time to comment.
Like ProgramCrafter, I neither downvoted nor upvoted your comment.
Updated the Predicted AI alignment event/meeting calendar.
Main changes:
- Events in the first half of this year moved to cyberspace due to COVID-19, including the Web Technical AI Safety Unconference.
- The paper submission deadline for WAISE is now public.
Cardboard and plastic: Tottori Prefecture goes low-tech to protect officials from COVID-19
This made my day.
Cheap, low-tech prevention measures.
‘“I hope this system will send out a message that even Tottori, where no infections have been reported yet, is being very vigilant.”’ – Yes!
I want Tottori spirit everywhere.
How about an app that trains you not to touch your face?
Point your phone’s camera or a webcam at yourself while you’re working. The app produces a beep whenever you move your hand near your face.
Technically feasible, I’d say. Someone familiar with iOS/Android computer vision APIs should be able to put it together in a few days.
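As a rough sketch of how this could work on a desktop (my illustration; assumes a webcam rather than a phone, uses OpenCV plus MediaPipe, and a crude bounding-box overlap as the ‘hand near face’ test):

```python
# Beep whenever a detected hand's bounding box overlaps the face's.
# The overlap margin and the terminal bell are illustrative choices.

import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=2)
faces = mp.solutions.face_detection.FaceDetection()

def boxes_overlap(a, b, margin=0.1):
    """True if boxes (xmin, ymin, xmax, ymax) overlap, padded by a margin."""
    return not (a[2] + margin < b[0] or b[2] + margin < a[0]
                or a[3] + margin < b[1] or b[3] + margin < a[1])

capture = cv2.VideoCapture(0)
while True:
    ok, frame = capture.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    face_result = faces.process(rgb)
    hand_result = hands.process(rgb)
    if face_result.detections and hand_result.multi_hand_landmarks:
        box = face_result.detections[0].location_data.relative_bounding_box
        face_box = (box.xmin, box.ymin,
                    box.xmin + box.width, box.ymin + box.height)
        for hand in hand_result.multi_hand_landmarks:
            xs = [lm.x for lm in hand.landmark]
            ys = [lm.y for lm in hand.landmark]
            if boxes_overlap(face_box, (min(xs), min(ys), max(xs), max(ys))):
                print('\a')  # terminal bell as a stand-in for a proper beep
capture.release()
```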
Paul R. Cohen: Empirical Methods for Artificial Intelligence (non-fiction) – Great if you want to experiment with ML, but don’t have a supervisor to tell you how to do it.
Svend Åge Madsen: Sæt verden er til (fiction) – To keep my Danish alive. It’s the third Madsen book I’m reading and I like all of them.
The Wall Street Journal (news)
(Added 2019-12-30) Dan Carlin: The End Is Always Near. Apocalyptic Moments, from the Bronze Age Collapse to Nuclear Near Misses (audiobook, non-fiction) – Dan Carlin makes Hardcore History, my favourite podcast. In this book he illustrates perspectives on existential risk in his usual style of telling stories from history.
I’m studying Bayesian machine learning because I want to understand how to make ML systems that notice when they are confused – in order to help my reader understand how to make ML systems that ask the overseer for input when doing otherwise would lead to failure. More a study project than a research project.
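As a toy sketch of the ‘notice confusion, then ask’ idea (my illustration; using predictive entropy as the confusion signal and an arbitrary threshold, neither of which is from the study project itself):

```python
# Act on the model's prediction when it is confident; defer to the
# overseer when predictive entropy signals confusion.

import numpy as np

def predictive_entropy(probs):
    """Entropy of a predictive distribution; higher means more confused."""
    probs = np.clip(probs, 1e-12, 1.0)
    return float(-np.sum(probs * np.log(probs)))

def act_or_ask(probs, overseer, threshold=0.5):
    if predictive_entropy(probs) > threshold:
        return overseer()              # confused: ask the overseer for input
    return int(np.argmax(probs))       # confident: act autonomously

# A confident prediction acts; a near-uniform one defers.
print(act_or_ask(np.array([0.95, 0.03, 0.02]), lambda: 'ask overseer'))  # 0
print(act_or_ask(np.array([0.40, 0.35, 0.25]), lambda: 'ask overseer'))  # 'ask overseer'
```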
I’ve added specifics. I hope this improves things. If not, feel free to edit it out.
Thanks for pointing out the problems with my question. I see now that I was wrong to combine strong language with no specifics and a concrete target. I would amend it, but then the context for the discussion would be gone.